Issue with crashing Windows apps on Diego


Aaron Huber
 

We've started testing Windows apps on Diego in our lab and everything appears
to be working correctly except for occasional crashes of the .NET apps. The
frequency is irregular: sometimes I can go a day or more without any, and
then I'll get many in a single day. As far as I can tell from the logs, the
only issue is that the healthcheck in the lifecycle is timing out because it
exceeds the 1 second wait here:

https://github.com/cloudfoundry/windows_app_lifecycle/blob/master/Healthcheck/Program.cs#L29

Our test environment is definitely running on very slow storage, so it
doesn't surprise me that it gets a bit slow sometimes, but it still seems
unlikely that a simple HTTP request would take more than 1 second to
respond. I've looked through the logs and can't find any indication of the
root cause other than the healthcheck returning exit code 1 instead of 0:

{"timestamp":"1454113322.534542084","source":"garden-windows","message":"garden-windows.garden-server.run.spawned","log_level":1,"data":{"handle":"c41ecf17-6e8c-4b50-a103-4e32323ef53e-bdfa601f-0a44-48fd-8d05-e5551ac9af7a-3a193046-43ed-4811-7bc4-3595809a409c","id":"5920","session":"1.104644","spec":{"Path":"/tmp/lifecycle/healthcheck","Dir":"","User":"vcap","Limits":{"nofile":1024},"TTY":null}}}

{"timestamp":"1454113324.545698404","source":"garden-windows","message":"garden-windows.garden-server.run.exited","log_level":1,"data":{"handle":"c41ecf17-6e8c-4b50-a103-4e32323ef53e-bdfa601f-0a44-48fd-8d05-e5551ac9af7a-3a193046-43ed-4811-7bc4-3595809a409c","id":"5920","session":"1.104644","status":1}}

{"timestamp":"1454113324.987732887","source":"garden-windows","message":"garden-windows.garden-server.destroy.destroyed","log_level":1,"data":{"handle":"c41ecf17-6e8c-4b50-a103-4e32323ef53e-bdfa601f-0a44-48fd-8d05-e5551ac9af7a-3a193046-43ed-4811-7bc4-3595809a409c","session":"1.104647"}}
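One way to get a little more out of these entries is to measure how long each healthcheck process actually ran. The following is only a sketch in Python: the function name `run_durations` is made up for illustration, and it assumes one JSON object per line with the `timestamp`, `message`, and `data` fields seen in the entries above.

```python
import json
from decimal import Decimal


def run_durations(lines):
    """Map (handle, process id) -> (seconds between spawn and exit, exit status).

    Assumes garden-windows log lines shaped like the ones quoted above.
    Decimal keeps the nanosecond digits that a float would round away.
    """
    spawned = {}
    durations = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        data = entry.get("data", {})
        key = (data.get("handle"), data.get("id"))
        ts = Decimal(entry["timestamp"])
        if entry["message"].endswith("run.spawned"):
            spawned[key] = ts
        elif entry["message"].endswith("run.exited") and key in spawned:
            durations[key] = (ts - spawned.pop(key), data.get("status"))
    return durations
```

For the spawned/exited pair quoted above (process id 5920), this gives 2.011156320 seconds with status 1. That figure covers the whole process lifetime, including startup, so it is an upper bound on the HTTP wait rather than a direct measurement of it.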

There are no other event log messages at the same time to indicate anything
is wrong on the system. Theoretically I could just try increasing the wait
time on the healthcheck but I'd love to get some more data on exactly what's
going on. Anyone have any ideas?
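
To collect that kind of data without changing the lifecycle, a small probe could time the same sort of HTTP request with a much longer timeout and record how often it goes past 1 second. This is a rough sketch, not the lifecycle's own healthcheck: `probe`, `slow_samples`, and `run` are hypothetical names, and the URL has to be whatever address and port the app instance listens on.

```python
import time
import urllib.error
import urllib.request

HEALTHCHECK_LIMIT = 1.0  # seconds; the wait described in this post


def probe(url, timeout=10.0):
    """Issue one GET and return (elapsed_seconds, outcome)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            outcome = str(resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx status.
        outcome = str(exc.code)
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, DNS failure, and similar.
        outcome = f"error: {exc}"
    return time.monotonic() - start, outcome


def slow_samples(samples, limit=HEALTHCHECK_LIMIT):
    """Keep only the samples that took longer than the limit."""
    return [s for s in samples if s[0] > limit]


def run(url, count=100, interval=1.0):
    """Probe the URL `count` times, pausing `interval` seconds between tries."""
    samples = []
    for _ in range(count):
        samples.append(probe(url))
        time.sleep(interval)
    return samples


# Example (placeholder address, fill in the real one):
# samples = run("http://<app address>:<port>/")
# print(slow_samples(samples))
```

Running this next to the app on the cell and lining up the slow samples with the crash times would show whether the app itself is slow to answer or whether the delay is somewhere else, such as healthcheck process startup.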

Aaron Huber
Intel Corporation





