Re: Warden: staging error when pushing app


kyle havlovitz <kylehav@...>
 

I'm not sure if the "rsync: link_stat \"/tmp/staged/staging_info.yml\"
failed: No such file or directory (2)" is the cause of the staging failure
or just a side effect of the staging not happening correctly. Does anyone
have any ideas on why this is happening?

On Fri, Sep 11, 2015 at 5:44 PM, kyle havlovitz <kylehav(a)gmail.com> wrote:

I built all the components on my own and have them running on one VM, with
config changed from the bosh defaults as needed (I had already had this
working, but with an older CF version (203) and now i'm updating).

It seems like the app and buildpack are getting placed in the container
successfully, but the push never gets past the '-----> Downloaded app
package (460K)'.
I also tried creating a container with the warden client and it says it
was created successfully, then gets deleted by warden soon (about 30s)
afterward. I have the 'container_grace_time' set to 300, so I thought it
would be longer.

On Fri, Sep 11, 2015 at 5:07 PM, CF Runtime <cfruntime(a)gmail.com> wrote:

Hi Kyle,

I'm not sure what mean by "running it locally". Can you explain in more
detail how you've deployed your CF installation?

Zak + Dan

On Fri, Sep 11, 2015 at 1:12 PM, kyle havlovitz <kylehav(a)gmail.com>
wrote:

I also see this error in the DEA logs:
{"timestamp":1441999712.400334,"message":"Error copying files out of
container: command exited with
failure","log_level":"warn","source":"Staging","data":{"app_guid":"207dd67c-415a-4108-a31d-7e82590b1e28","task_id":"01db0928ddf14426b47e21459bf1e57a"},"thread_id":70184860078900,"fiber_id":70184883843040,"process_id":23523,"file":"/opt/cloudfoundry/dea_next/lib/dea/task.rb","lineno":126,"method":"rescue
in copy_out_request"}
{"timestamp":1441999712.4004574,"message":"staging.task-log.copying-out","log_level":"info","source":"Staging","data":{"app_guid":"207dd67c-415a-4108-a31d-7e82590b1e28","task_id":"01db0928ddf14426b47e21459bf1e57a","source":"/tmp/staged/logs/staging_task.log","destination":"/tmp/dea_ng/staging/d20150911-23523-ms2lax/staging_task.log"},"thread_id":70184860078900,"fiber_id":70184883400540,"process_id":23523,"file":"/opt/cloudfoundry/dea_next/lib/dea/staging/staging_task.rb","lineno":258,"method":"block
in promise_task_log"}

Could it be something wrong with the DEA's directory server? I don't
anything in its logs after it starts so maybe it isn't being used.

On Fri, Sep 11, 2015 at 1:55 PM, kyle havlovitz <kylehav(a)gmail.com>
wrote:

I'm not using bosh, just trying to run it locally. I can post the dea
and warden configs though:

dea.yml:

base_dir: /tmp/dea_ng


domain: local.example.com

logging:
file: /opt/cloudfoundry/logs/dea_ng.log

level: debug


loggregator:
router: "127.0.0.1:3456"

shared_secret: "secret"


resources:
memory_mb: 8000

memory_overcommit_factor: 1

disk_mb: 40960

disk_overcommit_factor: 1


nats_servers:
- nats://127.0.0.1:4222


pid_filename: /tmp/dea_ng.pid

warden_socket: /tmp/warden.sock

evacuation_delay_secs: 10

default_health_check_timeout: 120

index: 0

intervals:
heartbeat: 10

advertise: 5

router_register_in_seconds: 20


staging:
enabled: true

memory_limit_mb: 4096

disk_limit_mb: 6144

disk_inode_limit: 200000

cpu_limit_shares: 512

max_staging_duration: 900


instance:
disk_inode_limit: 200000

memory_to_cpu_share_ratio: 8

max_cpu_share_limit: 256

min_cpu_share_limit: 1


dea_ruby: /usr/bin/ruby

# For Go-based directory server
directory_server:

protocol: 'http'

v1_port: 4385

v2_port: 5678

file_api_port: 1234

streaming_timeout: 10

logging:

file: /opt/cloudfoundry/logs/dea_dirserver.log

level: debug


stacks:
- name: cflinuxfs2

package_path: /var/warden/rootfs


placement_properties:
zone: "zone"

warden test_vm.yml:

server:

container_klass: Warden::Container::Linux


# Wait this long before destroying a container, after the last client
# referencing it disconnected. The timer is cancelled when during this

# period, another client references the container.

#

# Clients can be forced to specify this setting by setting the

# server-wide variable to an invalid value:

# container_grace_time: invalid

#

# The grace time can be disabled by setting it to nil:

# container_grace_time: ~

#

container_grace_time: 300


unix_domain_permissions: 0777
unix_domain_path: /tmp/warden.sock


# Specifies the path to the base chroot used as the read-only root
# filesystem

container_rootfs_path: /var/warden/rootfs


# Specifies the path to the parent directory under which all
containers
# will live.

container_depot_path: /var/warden/containers


# See getrlimit(2) for details. Integer values are passed verbatim.
container_rlimits:

core: 0


quota:
disk_quota_enabled: false


allow_nested_warden: false

health_check_server:
port: 2345


logging:
file: /opt/cloudfoundry/logs/warden.log

level: debug2


network:
# Use this /30 network as offset for the network pool.

pool_start_address: 10.254.0.0


# Pool this many /30 networks.
pool_size: 256


# Interface MTU size
# (for OpenStack use 1454 to avoid problems with rubygems with GRE
tunneling)
mtu: 1500


user:
pool_start_uid: 11000

pool_size: 256


This is all using the latest CFv217

On Fri, Sep 11, 2015 at 1:31 PM, CF Runtime <cfruntime(a)gmail.com>
wrote:

Hey Kyle,

Can we take a look at your deployment manifest (with all the secrets
redacted)?

Zak + Dan, CF OSS Integration team

On Fri, Sep 11, 2015 at 8:55 AM, kyle havlovitz <kylehav(a)gmail.com>
wrote:

I'm getting an error pushing any app during the staging step. The cf
logs returns only this:

2015-09-11T15:24:24.33+0000 [API] OUT Created app with guid
47efa472-e2d5-400b-9135-b1a1dbe3ba88
2015-09-11T15:24:24.41+0000 [API] OUT Updated app with guid
47efa472-e2d5-400b-9135-b1a1dbe3ba88
({"route"=>"5737c5f5-b017-43da-9013-2b6fe7db03f7"})
2015-09-11T15:24:29.54+0000 [DEA/0] OUT Got staging request for
app with id 47efa472-e2d5-400b-9135-b1a1dbe3ba88
2015-09-11T15:24:30.71+0000 [API] OUT Updated app with guid
47efa472-e2d5-400b-9135-b1a1dbe3ba88 ({"state"=>"STARTED"})
2015-09-11T15:24:30.76+0000 [STG/0] OUT -----> Downloaded app
package (4.0K)
2015-09-11T15:25:06.00+0000 [API] ERR encountered error:
Staging error: failed to stage application:
2015-09-11T15:25:06.00+0000 [API] ERR Script exited with
status 1
In the warden logs, there are a few suspect messages:

{
"timestamp": 1441985105.8883495,

"message": "Exited with status 1 (35.120s):
[[\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\",
\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\"],
\"/var/warden/containers/18vf956il5v/bin/iomux-link\", \"-w\",
\"/var/warden/containers/18vf956il5v/jobs/8/cursors\",
\"/var/warden/containers/18vf956il5v/jobs/8\"]",
"log_level": "warn",

"source": "Warden::Container::Linux",

"data": {

"handle": "18vf956il5v",

"stdout": "",

"stderr": ""

},

"thread_id": 69890836968240,

"fiber_id": 69890849112480,

"process_id": 17063,

"file":
"/opt/cloudfoundry/warden/warden/lib/warden/container/spawn.rb",
"lineno": 135,

"method": "set_deferred_success"

}



{

"timestamp": 1441985105.94083,

"message": "Exited with status 23 (0.023s):
[[\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\",
\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\"], \"rsync\",
\"-e\", \"/var/warden/containers/18vf956il5v/bin/wsh --socket
/var/warden/containers/18vf956il5v/run/wshd.sock --rsh\", \"-r\", \"-p\",
\"--links\", \"vcap(a)container:/tmp/staged/staging_info.yml\",
\"/tmp/dea_ng/staging/d20150911-17093-1amg6y8\"]",
"log_level": "warn",

"source": "Warden::Container::Linux",

"data": {

"handle": "18vf956il5v",

"stdout": "",

"stderr": "rsync: link_stat \"/tmp/staged/staging_info.yml\"
failed: No such file or directory (2)\nrsync error: some files/attrs were
not transferred (see previous errors) (code 23) at main.c(1655)
[Receiver=3.1.0]\nrsync: [Receiver] write error: Broken pipe (32)\n"
},

"thread_id": 69890836968240,

"fiber_id": 69890849112480,

"process_id": 17063,

"file":
"/opt/cloudfoundry/warden/warden/lib/warden/container/spawn.rb",
"lineno": 135,

"method": "set_deferred_success"

}



{

"timestamp": 1441985106.0086887,

"message": "Killing oom-notifier process",

"log_level": "debug",

"source": "Warden::Container::Features::MemLimit::OomNotifier",

"data": {



},

"thread_id": 69890836968240,

"fiber_id": 69890848620580,

"process_id": 17063,

"file":
"/opt/cloudfoundry/warden/warden/lib/warden/container/features/mem_limit.rb",
"lineno": 51,

"method": "kill"

}



{

"timestamp": 1441985106.0095143,

"message": "Exited with status 0 (35.427s):
[[\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\",
\"/opt/cloudfoundry/warden/warden/src/closefds/closefds\"],
\"/opt/cloudfoundry/warden/warden/src/oom/oom\",
\"/tmp/warden/cgroup/memory/instance-18vf956il5v\"]",
"log_level": "warn",

"source": "Warden::Container::Features::MemLimit::OomNotifier",

"data": {

"stdout": "",

"stderr": ""

},

"thread_id": 69890836968240,

"fiber_id": 69890849112480,

"process_id": 17063,

"file":
"/opt/cloudfoundry/warden/warden/lib/warden/container/spawn.rb",
"lineno": 135,

"method": "set_deferred_success"

}




Obviously something is misconfigured, but I'm not sure what. I don't
know why the out of memory thing is appearing as the memory being used by
the test app i've pushed is tiny (64M app with staticfile buildpack) and
the dea config has resource.memory_mb set to 8 gigs and
staging.memory_limit_mb set to 1 gigs. Is there some config I'm lacking
that's causing this to fail?

Join cf-dev@lists.cloudfoundry.org to automatically receive all group messages.