Re: Resurrector meltdown causes problem on deployment with only one job


Danny Berger
 

If you really, really need the functionality, you could try setting
percent_threshold to a value greater than 1 (e.g. 2); from reading the code,
it looks like that hack might help. I wouldn't recommend it in any non-test
environment though: it effectively disables the meltdown safety net, which
might end up causing problems for your larger deployments.

Danny

On Tue, May 10, 2016 at 2:08 AM, Meng, Xiangyi <xiangyi.meng(a)emc.com> wrote:

Hi,



We enabled the BOSH Resurrector on a micro BOSH director, but we found that
the recreate action never happens for a deployment that has only one job.



I found the following description in the BOSH docs:



*BOSH uses the BOSH Resurrector to help it recover from many issues. The
Resurrector automatically instructs the BOSH Director to rebuild
unresponsive VMs unless the system is in meltdown.*



*Meltdown occurs when the number of unresponsive VM alerts within a
specified time period exceeds a specified threshold. This threshold is a
percentage of the total number of VMs in the deployment. You specify the
time_threshold and percent_threshold properties in your manifest.*



*For example, in a deployment with 40 VMs, percent_threshold set to 20%,
and time_threshold set to 60 seconds, automatic recovery fails if the
Resurrector receives eight or more unresponsive VM alerts within 60
seconds.*



But even when we set percent_threshold to 1, meltdown still occurred and the
unresponsive VM never got recreated.
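
To make the arithmetic concrete, here is a minimal Python sketch of the rule
as the quoted doc describes it. It assumes the threshold is expressed as a
fraction (0.2 for 20%, 1 for 100%) and that the comparison is "at or above".
The function name is_meltdown is hypothetical, and this is not the actual
Resurrector code.

```python
def is_meltdown(alerts: int, total_vms: int, percent_threshold: float) -> bool:
    """Return True if the share of unresponsive VMs reaches the threshold.

    Assumption: percent_threshold is a fraction (0.2 means 20%), and the
    check is "greater than or equal", matching "eight or more" in the docs.
    """
    if total_vms == 0:
        # No VMs to compare against, so treat it as no meltdown.
        return False
    return alerts / total_vms >= percent_threshold


# The documented example: 40 VMs with a 20% threshold.
print(is_meltdown(7, 40, 0.2))  # 7 of 40 is 17.5%, below the threshold
print(is_meltdown(8, 40, 0.2))  # 8 of 40 is exactly 20%

# A single-VM deployment: one alert is already 100% of the VMs.
print(is_meltdown(1, 1, 1))
print(is_meltdown(1, 1, 2))     # a threshold above 1 can never be reached
```

Under these assumptions, a deployment with one VM reaches any threshold up
to 1 with its first alert, which would be consistent with the recreate never
being triggered here.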



My micro bosh version is 1.2732.0.



Does anyone know how to solve this problem? Any help will be appreciated.



Regards,

Maggie