Re: Job is not running after update - agent/monit race issue?


Prysmakou Aliaksandr <aliaksandr.prysmakou@...>
 

Hi Danny, guys,
I can confirm the issue. We have run into it many times. I'm going to collect more info on the next occurrence.

On Thu, Jun 4, 2015 at 5:31 PM, Danny Berger <dpb587(a)gmail.com> wrote:
Frequently when doing a deploy (this happens in multiple deployments), a job will randomly fail with "job/0 is not running after update" for no logical reason. I can just rerun `bosh deploy` and it will succeed on that job and move on to the next job for update (which might also fail). Alternatively, I can SSH in and monit will show one or more processes as "not monitored", yet if I run `monit start all` it starts the remaining processes without fail. Looking into this behavior more today, I think it might be some strange interaction between bosh-agent and monit.
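
A minimal sketch for capturing which processes are left "not monitored" on the next occurrence. It assumes the text comes from `monit summary` and that each process line looks like `Process 'name'   status`; that line shape and the helper name `unmonitored_processes` are assumptions for illustration, not something confirmed in this thread.

```python
import re

# Assumed line shape from `monit summary`, e.g.:
#   Process 'director'   not monitored
PROCESS_LINE = re.compile(r"^Process\s+'(?P<name>[^']+)'\s+(?P<status>.+?)\s*$")


def unmonitored_processes(summary_text):
    """Return names of processes whose status starts with 'not monitored'."""
    names = []
    for line in summary_text.splitlines():
        match = PROCESS_LINE.match(line.strip())
        if match and match.group("status").lower().startswith("not monitored"):
            names.append(match.group("name"))
    return names
```

Saving the result together with a timestamp right after the failed update would show whether the same processes are skipped each time, before running `monit start all` to recover.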

Alex Prysmakou / Altoros
Tel: (617) 841-2121 ext. 5161 | Toll free: 855-ALTOROS
Skype: aliaksandr.prysmakou
www.altoros.com | blog.altoros.com | twitter.com/altoros
