I think the DEA checks for warden containers every 30 seconds and reaps any it isn't tracking; the containers in a grace period are the ones it is still keeping track of.
We're not sure about the error you're getting. If you provide the warden and DEA logs, we might be able to trace through what is happening.
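If it helps, here is a rough sketch of poking at the grace-time behavior directly over warden's unix socket, using the warden-client and warden-protocol gems that ship with the warden repo. Treat the per-container grace_time on CreateRequest and the 60-second wait as assumptions on our part, not something taken from your logs:

require "warden/client"
require "warden/protocol"

# Connect to the same socket the DEA uses (warden_socket in dea.yml).
client = Warden::Client.new("/tmp/warden.sock")
client.connect

# Create a container with an explicit per-container grace time (seconds);
# if grace_time is omitted, the server-wide container_grace_time applies.
response = client.call(Warden::Protocol::CreateRequest.new(grace_time: 300))
handle = response.handle
puts "created container #{handle}"

# The grace timer only starts once the last client referencing the container
# disconnects, so drop the connection, wait, and then look it up again.
client.disconnect
sleep 60
client.connect

# If something other than the grace timer destroyed the container (for example
# the DEA's periodic reaping), this call should fail with an "unknown handle"
# error from the server instead of returning info.
info = client.call(Warden::Protocol::InfoRequest.new(handle: handle))
puts "container #{handle} is still around, state: #{info.state}"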
Joseph & Dan, OSS Release Integration Team
On Fri, Sep 11, 2015 at 2:44 PM, kyle havlovitz <kylehav(a)gmail.com> wrote:
I built all the components on my own and have them running on one VM, with the config changed from the bosh defaults as needed (I already had this working with an older CF version (203), and now I'm updating).
It seems like the app and buildpack are getting placed in the container successfully, but the push never gets past '-----> Downloaded app package (460K)'. I also tried creating a container with the warden client; it says it was created successfully, then warden deletes it soon afterward (about 30s). I have 'container_grace_time' set to 300, so I thought it would last longer.
On Fri, Sep 11, 2015 at 5:07 PM, CF Runtime <cfruntime(a)gmail.com> wrote:
Hi Kyle,
I'm not sure what you mean by "running it locally". Can you explain in more detail how you've deployed your CF installation?
Zak + Dan
On Fri, Sep 11, 2015 at 1:12 PM, kyle havlovitz <kylehav(a)gmail.com> wrote:
I also see this error in the DEA logs:
{"timestamp":1441999712.400334,"message":"Error copying files out of container: command exited with failure","log_level":"warn","source":"Staging","data":{"app_guid":"207dd67c-415a-4108-a31d-7e82590b1e28","task_id":"01db0928ddf14426b47e21459bf1e57a"},"thread_id":70184860078900,"fiber_id":70184883843040,"process_id":23523,"file":"/opt/cloudfoundry/dea_next/lib/dea/task.rb","lineno":126,"method":"rescue in copy_out_request"}
{"timestamp":1441999712.4004574,"message":"staging.task-log.copying-out","log_level":"info","source":"Staging","data":{"app_guid":"207dd67c-415a-4108-a31d-7e82590b1e28","task_id":"01db0928ddf14426b47e21459bf1e57a","source":"/tmp/staged/logs/staging_task.log","destination":"/tmp/dea_ng/staging/d20150911-23523-ms2lax/staging_task.log"},"thread_id":70184860078900,"fiber_id":70184883400540,"process_id":23523,"file":"/opt/cloudfoundry/dea_next/lib/dea/staging/staging_task.rb","lineno":258,"method":"block in promise_task_log"}
Could it be something wrong with the DEA's directory server? I don't see anything in its logs after it starts, so maybe it isn't being used.
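For reference: the copy-out failing here is the DEA pulling /tmp/staged/staging_info.yml out of the staging container; on the warden side it is the same kind of rsync-over-wsh command shown in the warden log from the first message, which failed there with "No such file or directory". A rough sketch of issuing the same request by hand (the CopyOutRequest field names and the scratch destination path are assumptions):

require "warden/client"
require "warden/protocol"

client = Warden::Client.new("/tmp/warden.sock")
client.connect

# Handle of the staging container, taken from the warden log in the first
# message; substitute the current one (e.g. via Warden::Protocol::ListRequest),
# since staging containers are short-lived.
handle = "18vf956il5v"

# Roughly what dea/task.rb's copy_out_request wraps: ask warden to copy the
# staging result out of the container. Warden implements this with rsync over
# the container's wsh socket.
client.call(Warden::Protocol::CopyOutRequest.new(
  handle:   handle,
  src_path: "/tmp/staged/staging_info.yml",
  dst_path: "/tmp/copy-out-test",   # hypothetical scratch path on the host
  owner:    "vcap"
))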
On Fri, Sep 11, 2015 at 1:55 PM, kyle havlovitz <kylehav(a)gmail.com> wrote:
I'm not using bosh, just trying to run it locally. I can post the dea and warden configs though:
dea.yml:
base_dir: /tmp/dea_ng
domain: local.example.com
logging:
  file: /opt/cloudfoundry/logs/dea_ng.log
  level: debug
loggregator:
  router: "127.0.0.1:3456"
  shared_secret: "secret"
resources:
  memory_mb: 8000
  memory_overcommit_factor: 1
  disk_mb: 40960
  disk_overcommit_factor: 1
nats_servers:
  - nats://127.0.0.1:4222
pid_filename: /tmp/dea_ng.pid
warden_socket: /tmp/warden.sock
evacuation_delay_secs: 10
default_health_check_timeout: 120
index: 0
intervals:
  heartbeat: 10
  advertise: 5
  router_register_in_seconds: 20
staging:
  enabled: true
  memory_limit_mb: 4096
  disk_limit_mb: 6144
  disk_inode_limit: 200000
  cpu_limit_shares: 512
  max_staging_duration: 900
instance:
  disk_inode_limit: 200000
  memory_to_cpu_share_ratio: 8
  max_cpu_share_limit: 256
  min_cpu_share_limit: 1
dea_ruby: /usr/bin/ruby
# For Go-based directory server
directory_server:
  protocol: 'http'
  v1_port: 4385
  v2_port: 5678
  file_api_port: 1234
  streaming_timeout: 10
  logging:
    file: /opt/cloudfoundry/logs/dea_dirserver.log
    level: debug
stacks:
  - name: cflinuxfs2
    package_path: /var/warden/rootfs
placement_properties:
  zone: "zone"
warden test_vm.yml:
server:
  container_klass: Warden::Container::Linux

  # Wait this long before destroying a container, after the last client
  # referencing it disconnected. The timer is cancelled when during this
  # period, another client references the container.
  #
  # Clients can be forced to specify this setting by setting the
  # server-wide variable to an invalid value:
  # container_grace_time: invalid
  #
  # The grace time can be disabled by setting it to nil:
  # container_grace_time: ~
  #
  container_grace_time: 300

  unix_domain_permissions: 0777
  unix_domain_path: /tmp/warden.sock

  # Specifies the path to the base chroot used as the read-only root
  # filesystem
  container_rootfs_path: /var/warden/rootfs

  # Specifies the path to the parent directory under which all containers
  # will live.
  container_depot_path: /var/warden/containers

  # See getrlimit(2) for details. Integer values are passed verbatim.
  container_rlimits:
    core: 0

  quota:
    disk_quota_enabled: false

  allow_nested_warden: false

health_check_server:
  port: 2345

logging:
  file: /opt/cloudfoundry/logs/warden.log
  level: debug2

network:
  # Use this /30 network as offset for the network pool.
  pool_start_address: 10.254.0.0

  # Pool this many /30 networks.
  pool_size: 256

  # Interface MTU size
  # (for OpenStack use 1454 to avoid problems with rubygems with GRE tunneling)
  mtu: 1500

user:
  pool_start_uid: 11000
  pool_size: 256
This is all using the latest CF, v217.
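Since indentation is easy to get wrong when hand-editing these files, a quick sanity check (a sketch; adjust the paths to wherever the files actually live) that the keys nest the way the DEA and warden expect:

require "yaml"

dea    = YAML.load_file("dea.yml")        # path is an assumption
warden = YAML.load_file("test_vm.yml")    # path is an assumption

puts dea["staging"]["memory_limit_mb"]          # expect 4096 per the config above
puts dea["warden_socket"]                       # expect /tmp/warden.sock
puts warden["server"]["container_grace_time"]   # expect 300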
On Fri, Sep 11, 2015 at 1:31 PM, CF Runtime <cfruntime(a)gmail.com> wrote:
Hey Kyle,
Can we take a look at your deployment manifest (with all the secrets redacted)?
Zak + Dan, CF OSS Integration team
On Fri, Sep 11, 2015 at 8:55 AM, kyle havlovitz <kylehav(a)gmail.com> wrote:
I'm getting an error pushing any app during the staging step; cf logs returns only this:
2015-09-11T15:24:24.33+0000 [API] OUT Created app with guid 47efa472-e2d5-400b-9135-b1a1dbe3ba88
2015-09-11T15:24:24.41+0000 [API] OUT Updated app with guid 47efa472-e2d5-400b-9135-b1a1dbe3ba88 ({"route"=>"5737c5f5-b017-43da-9013-2b6fe7db03f7"})
2015-09-11T15:24:29.54+0000 [DEA/0] OUT Got staging request for app with id 47efa472-e2d5-400b-9135-b1a1dbe3ba88
2015-09-11T15:24:30.71+0000 [API] OUT Updated app with guid 47efa472-e2d5-400b-9135-b1a1dbe3ba88 ({"state"=>"STARTED"})
2015-09-11T15:24:30.76+0000 [STG/0] OUT -----> Downloaded app package (4.0K)
2015-09-11T15:25:06.00+0000 [API] ERR encountered error: Staging error: failed to stage application:
2015-09-11T15:25:06.00+0000 [API] ERR Script exited with status 1

In the warden logs, there are a few suspect messages:
{ "timestamp": 1441985105.8883495,
"message": "Exited with status 1 (35.120s):
[[\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\", \"/opt/cloudfoundry/warden/warden/src/closefds/closefds\"], \"/var/warden/containers/18vf956il5v/bin/iomux-link\", \"-w\", \"/var/warden/containers/18vf956il5v/jobs/8/cursors\", \"/var/warden/containers/18vf956il5v/jobs/8\"]", "log_level": "warn",
"source": "Warden::Container::Linux",
"data": {
"handle": "18vf956il5v",
"stdout": "",
"stderr": ""
},
"thread_id": 69890836968240,
"fiber_id": 69890849112480,
"process_id": 17063,
"file":
"/opt/cloudfoundry/warden/warden/lib/warden/container/spawn.rb", "lineno": 135,
"method": "set_deferred_success"
}
{
"timestamp": 1441985105.94083,
"message": "Exited with status 23 (0.023s):
[[\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\", \"/opt/cloudfoundry/warden/warden/src/closefds/closefds\"], \"rsync\", \"-e\", \"/var/warden/containers/18vf956il5v/bin/wsh --socket /var/warden/containers/18vf956il5v/run/wshd.sock --rsh\", \"-r\", \"-p\", \"--links\", \"vcap(a)container:/tmp/staged/staging_info.yml\", \"/tmp/dea_ng/staging/d20150911-17093-1amg6y8\"]", "log_level": "warn",
"source": "Warden::Container::Linux",
"data": {
"handle": "18vf956il5v",
"stdout": "",
"stderr": "rsync: link_stat \"/tmp/staged/staging_info.yml\"
failed: No such file or directory (2)\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.0]\nrsync: [Receiver] write error: Broken pipe (32)\n" },
"thread_id": 69890836968240,
"fiber_id": 69890849112480,
"process_id": 17063,
"file":
"/opt/cloudfoundry/warden/warden/lib/warden/container/spawn.rb", "lineno": 135,
"method": "set_deferred_success"
}
{
"timestamp": 1441985106.0086887,
"message": "Killing oom-notifier process",
"log_level": "debug",
"source": "Warden::Container::Features::MemLimit::OomNotifier",
"data": {
},
"thread_id": 69890836968240,
"fiber_id": 69890848620580,
"process_id": 17063,
"file":
"/opt/cloudfoundry/warden/warden/lib/warden/container/features/mem_limit.rb", "lineno": 51,
"method": "kill"
}
{
"timestamp": 1441985106.0095143,
"message": "Exited with status 0 (35.427s):
[[\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\", \"/opt/cloudfoundry/warden/warden/src/closefds/closefds\"], \"/opt/cloudfoundry/warden/warden/src/oom/oom\", \"/tmp/warden/cgroup/memory/instance-18vf956il5v\"]", "log_level": "warn",
"source": "Warden::Container::Features::MemLimit::OomNotifier",
"data": {
"stdout": "",
"stderr": ""
},
"thread_id": 69890836968240,
"fiber_id": 69890849112480,
"process_id": 17063,
"file":
"/opt/cloudfoundry/warden/warden/lib/warden/container/spawn.rb", "lineno": 135,
"method": "set_deferred_success"
}
Obviously something is misconfigured, but I'm not sure what. I don't know why the out-of-memory notifier is showing up, since the memory used by the test app I've pushed is tiny (a 64M app with the staticfile buildpack), and the DEA config has resources.memory_mb set to 8 GB and staging.memory_limit_mb set to 1 GB. Is there some config I'm lacking that's causing this to fail?
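For reference, the memory limit warden actually applied can be read from the cgroup path in the oom-notifier entry above, while the container still exists. A rough sketch (the handle and cgroup mount point are taken from that log line and will differ for other containers):

# Read the memory limit warden set on the staging container's cgroup.
handle = "18vf956il5v"
limit_file = "/tmp/warden/cgroup/memory/instance-#{handle}/memory.limit_in_bytes"

limit_bytes = File.read(limit_file).to_i
puts "memory.limit_in_bytes for #{handle}: #{limit_bytes} (#{limit_bytes / (1024 * 1024)} MB)"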