Re: Warden: staging error when pushing app


Alexander Lomov <alexander.lomov@...>
 

Hey, Kyle.

Just a quick tip I use to debug failed containers: comment out the body of the do_destroy method in the /var/vcap/jobs/dea_next/packages/warden/warden/lib/warden/container/linux.rb file so that warden stops removing containers altogether, then run "sudo /var/vcap/bosh/bin/monit -I restart warden". This change can lead to dangerous behavior, so don't do it on anything resembling production.

You can probably find something useful in my blog post [1].

[1] http://blog.altoros.com/creating-a-custom-cloud-foundry-buildpack-from-scratch-whats-under-the-hood.html
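For illustration, the pattern described above can be sketched on a stand-in class (the real method is Warden::Container::Linux#do_destroy in linux.rb; FakeContainer here is purely hypothetical):

```ruby
# Stand-in for Warden::Container::Linux; illustrative only.
class FakeContainer
  attr_reader :destroyed

  def initialize
    @destroyed = false
  end

  # Normally tears the container down.
  def do_destroy
    @destroyed = true
  end
end

# Reopen the class and empty the method body, just as commenting out
# do_destroy's body in linux.rb keeps failed containers around for inspection.
class FakeContainer
  def do_destroy
    # intentionally a no-op while debugging
  end
end

c = FakeContainer.new
c.do_destroy
puts c.destroyed  # => false; the container survives
```

On a real deployment, warden then has to be restarted (the monit command above) for the edited file to be reloaded.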

On Sep 12, 2015, at 12:44 AM, kyle havlovitz <kylehav(a)gmail.com> wrote:

I built all the components on my own and have them running on one VM, with the config changed from the bosh defaults as needed (I already had this working with an older CF version, v203, and now I'm updating).

It seems like the app and buildpack are getting placed in the container successfully, but the push never gets past the '-----> Downloaded app package (460K)'.
I also tried creating a container with the warden client; it says it was created successfully, then warden deletes it about 30 seconds later. I have 'container_grace_time' set to 300, so I expected it to live longer.

On Fri, Sep 11, 2015 at 5:07 PM, CF Runtime <cfruntime(a)gmail.com> wrote:
Hi Kyle,

I'm not sure what you mean by "running it locally". Can you explain in more detail how you've deployed your CF installation?

Zak + Dan

On Fri, Sep 11, 2015 at 1:12 PM, kyle havlovitz <kylehav(a)gmail.com> wrote:
I also see this error in the DEA logs: {"timestamp":1441999712.400334,"message":"Error copying files out of container: command exited with failure","log_level":"warn","source":"Staging","data":{"app_guid":"207dd67c-415a-4108-a31d-7e82590b1e28","task_id":"01db0928ddf14426b47e21459bf1e57a"},"thread_id":70184860078900,"fiber_id":70184883843040,"process_id":23523,"file":"/opt/cloudfoundry/dea_next/lib/dea/task.rb","lineno":126,"method":"rescue in copy_out_request"}
{"timestamp":1441999712.4004574,"message":"staging.task-log.copying-out","log_level":"info","source":"Staging","data":{"app_guid":"207dd67c-415a-4108-a31d-7e82590b1e28","task_id":"01db0928ddf14426b47e21459bf1e57a","source":"/tmp/staged/logs/staging_task.log","destination":"/tmp/dea_ng/staging/d20150911-23523-ms2lax/staging_task.log"},"thread_id":70184860078900,"fiber_id":70184883400540,"process_id":23523,"file":"/opt/cloudfoundry/dea_next/lib/dea/staging/staging_task.rb","lineno":258,"method":"block in promise_task_log"}

Could it be something wrong with the DEA's directory server? I don't see anything in its logs after it starts, so maybe it isn't being used.

On Fri, Sep 11, 2015 at 1:55 PM, kyle havlovitz <kylehav(a)gmail.com> wrote:
I'm not using bosh, just trying to run it locally. I can post the dea and warden configs though:

dea.yml:

base_dir: /tmp/dea_ng

domain: local.example.com

logging:
  file: /opt/cloudfoundry/logs/dea_ng.log
  level: debug

loggregator:
  router: "127.0.0.1:3456"
  shared_secret: "secret"

resources:
  memory_mb: 8000
  memory_overcommit_factor: 1
  disk_mb: 40960
  disk_overcommit_factor: 1

nats_servers:
  - nats://127.0.0.1:4222

pid_filename: /tmp/dea_ng.pid

warden_socket: /tmp/warden.sock

evacuation_delay_secs: 10

default_health_check_timeout: 120

index: 0

intervals:
  heartbeat: 10
  advertise: 5
  router_register_in_seconds: 20

staging:
  enabled: true
  memory_limit_mb: 4096
  disk_limit_mb: 6144
  disk_inode_limit: 200000
  cpu_limit_shares: 512
  max_staging_duration: 900

instance:
  disk_inode_limit: 200000
  memory_to_cpu_share_ratio: 8
  max_cpu_share_limit: 256
  min_cpu_share_limit: 1

dea_ruby: /usr/bin/ruby

# For Go-based directory server
directory_server:
  protocol: 'http'
  v1_port: 4385
  v2_port: 5678
  file_api_port: 1234
  streaming_timeout: 10
  logging:
    file: /opt/cloudfoundry/logs/dea_dirserver.log
    level: debug

stacks:
  - name: cflinuxfs2
    package_path: /var/warden/rootfs

placement_properties:
  zone: "zone"

warden test_vm.yml:

server:
  container_klass: Warden::Container::Linux

  # Wait this long before destroying a container, after the last client
  # referencing it disconnected. The timer is cancelled when during this
  # period, another client references the container.
  #
  # Clients can be forced to specify this setting by setting the
  # server-wide variable to an invalid value:
  # container_grace_time: invalid
  #
  # The grace time can be disabled by setting it to nil:
  # container_grace_time: ~
  #
  container_grace_time: 300

  unix_domain_permissions: 0777
  unix_domain_path: /tmp/warden.sock

  # Specifies the path to the base chroot used as the read-only root
  # filesystem
  container_rootfs_path: /var/warden/rootfs

  # Specifies the path to the parent directory under which all containers
  # will live.
  container_depot_path: /var/warden/containers

  # See getrlimit(2) for details. Integer values are passed verbatim.
  container_rlimits:
    core: 0

  quota:
    disk_quota_enabled: false

  allow_nested_warden: false

health_check_server:
  port: 2345

logging:
  file: /opt/cloudfoundry/logs/warden.log
  level: debug2

network:
  # Use this /30 network as offset for the network pool.
  pool_start_address: 10.254.0.0

  # Pool this many /30 networks.
  pool_size: 256

  # Interface MTU size
  # (for OpenStack use 1454 to avoid problems with rubygems with GRE tunneling)
  mtu: 1500

user:
  pool_start_uid: 11000
  pool_size: 256
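Given the roughly 30-second container lifetime reported earlier despite container_grace_time: 300, one cheap sanity check (a sketch, not warden's actual config loader) is to confirm the value really sits nested under the server key, which is where warden's server reads it from; a value accidentally left at the top level would not be picked up:

```ruby
require 'yaml'

# Parse a minimal config nested the way the file above is, and check
# where container_grace_time ends up.
config = YAML.safe_load(<<~YML)
  server:
    container_grace_time: 300
YML

puts config.dig('server', 'container_grace_time')  # => 300
puts config['container_grace_time'].inspect        # => nil (not at top level)
```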

This is all using the latest CF, v217.

On Fri, Sep 11, 2015 at 1:31 PM, CF Runtime <cfruntime(a)gmail.com> wrote:
Hey Kyle,

Can we take a look at your deployment manifest (with all the secrets redacted)?

Zak + Dan, CF OSS Integration team

On Fri, Sep 11, 2015 at 8:55 AM, kyle havlovitz <kylehav(a)gmail.com> wrote:
I'm getting an error pushing any app during the staging step. cf logs returns only this:
2015-09-11T15:24:24.33+0000 [API] OUT Created app with guid 47efa472-e2d5-400b-9135-b1a1dbe3ba88
2015-09-11T15:24:24.41+0000 [API] OUT Updated app with guid 47efa472-e2d5-400b-9135-b1a1dbe3ba88 ({"route"=>"5737c5f5-b017-43da-9013-2b6fe7db03f7"})
2015-09-11T15:24:29.54+0000 [DEA/0] OUT Got staging request for app with id 47efa472-e2d5-400b-9135-b1a1dbe3ba88
2015-09-11T15:24:30.71+0000 [API] OUT Updated app with guid 47efa472-e2d5-400b-9135-b1a1dbe3ba88 ({"state"=>"STARTED"})
2015-09-11T15:24:30.76+0000 [STG/0] OUT -----> Downloaded app package (4.0K)
2015-09-11T15:25:06.00+0000 [API] ERR encountered error: Staging error: failed to stage application:
2015-09-11T15:25:06.00+0000 [API] ERR Script exited with status 1

In the warden logs, there are a few suspect messages:
{
"timestamp": 1441985105.8883495,
"message": "Exited with status 1 (35.120s): [[\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\", \"/opt/cloudfoundry/warden/warden/src/closefds/closefds\"], \"/var/warden/containers/18vf956il5v/bin/iomux-link\", \"-w\", \"/var/warden/containers/18vf956il5v/jobs/8/cursors\", \"/var/warden/containers/18vf956il5v/jobs/8\"]",
"log_level": "warn",
"source": "Warden::Container::Linux",
"data": {
"handle": "18vf956il5v",
"stdout": "",
"stderr": ""
},
"thread_id": 69890836968240,
"fiber_id": 69890849112480,
"process_id": 17063,
"file": "/opt/cloudfoundry/warden/warden/lib/warden/container/spawn.rb",
"lineno": 135,
"method": "set_deferred_success"
}

{
"timestamp": 1441985105.94083,
"message": "Exited with status 23 (0.023s): [[\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\", \"/opt/cloudfoundry/warden/warden/src/closefds/closefds\"], \"rsync\", \"-e\", \"/var/warden/containers/18vf956il5v/bin/wsh --socket /var/warden/containers/18vf956il5v/run/wshd.sock --rsh\", \"-r\", \"-p\", \"--links\", \"vcap(a)container:/tmp/staged/staging_info.yml\", \"/tmp/dea_ng/staging/d20150911-17093-1amg6y8\"]",
"log_level": "warn",
"source": "Warden::Container::Linux",
"data": {
"handle": "18vf956il5v",
"stdout": "",
"stderr": "rsync: link_stat \"/tmp/staged/staging_info.yml\" failed: No such file or directory (2)\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.0]\nrsync: [Receiver] write error: Broken pipe (32)\n"
},
"thread_id": 69890836968240,
"fiber_id": 69890849112480,
"process_id": 17063,
"file": "/opt/cloudfoundry/warden/warden/lib/warden/container/spawn.rb",
"lineno": 135,
"method": "set_deferred_success"
}

{
"timestamp": 1441985106.0086887,
"message": "Killing oom-notifier process",
"log_level": "debug",
"source": "Warden::Container::Features::MemLimit::OomNotifier",
"data": {

},
"thread_id": 69890836968240,
"fiber_id": 69890848620580,
"process_id": 17063,
"file": "/opt/cloudfoundry/warden/warden/lib/warden/container/features/mem_limit.rb",
"lineno": 51,
"method": "kill"
}

{
"timestamp": 1441985106.0095143,
"message": "Exited with status 0 (35.427s): [[\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\", \"/opt/cloudfoundry/warden/warden/src/closefds/closefds\"], \"/opt/cloudfoundry/warden/warden/src/oom/oom\", \"/tmp/warden/cgroup/memory/instance-18vf956il5v\"]",
"log_level": "warn",
"source": "Warden::Container::Features::MemLimit::OomNotifier",
"data": {
"stdout": "",
"stderr": ""
},
"thread_id": 69890836968240,
"fiber_id": 69890849112480,
"process_id": 17063,
"file": "/opt/cloudfoundry/warden/warden/lib/warden/container/spawn.rb",
"lineno": 135,
"method": "set_deferred_success"
}


Obviously something is misconfigured, but I'm not sure what. I don't know why the out-of-memory notifier is firing, since the test app I've pushed is tiny (a 64M app with the staticfile buildpack), the dea config has resources.memory_mb set to 8 gigs, and staging.memory_limit_mb set to 1 gig. Is there some config I'm lacking that's causing this to fail?
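For what it's worth, when sifting warden logs like the ones quoted above, the nonzero exit statuses plus the data.stderr fields are usually the quickest signal; the rsync exit 23 ("link_stat ... staging_info.yml failed: No such file or directory") suggests the staging script died before writing its result file, which would also explain the copy-out failure in the DEA log. A small hypothetical helper to pull those entries out:

```ruby
require 'json'

# Scan JSON log lines and keep entries whose command exited nonzero,
# pairing the status message with any captured stderr.
def suspect_entries(lines)
  lines.map { |l| JSON.parse(l) rescue nil }
       .compact
       .select { |e| e['message'].to_s.match?(/Exited with status [1-9]/) }
       .map do |e|
         { message: e['message'][/Exited with status \d+/],
           stderr:  e.dig('data', 'stderr').to_s }
       end
end

# Abbreviated sample lines shaped like the warden log above.
sample = [
  '{"message":"Exited with status 23 (0.023s): [\"rsync\"]","data":{"stderr":"rsync: link_stat failed"}}',
  '{"message":"Exited with status 0 (35.427s): [\"oom\"]","data":{"stderr":""}}'
]

suspect_entries(sample).each { |e| puts "#{e[:message]} -> #{e[:stderr]}" }
# => Exited with status 23 -> rsync: link_stat failed
```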
