BOSH recreates VM


Stanley Shen <meteorping@...>
 

Hello all,

Each BOSH-managed VM runs a BOSH agent.
If BOSH cannot talk to the agent for some time, it considers the VM gone and recreates it based on the deployment files.

We are running a performance test on a VM, and the VM has been recreated several times during the test. The processes used in the test consume almost 100% CPU, so BOSH cannot reliably get the status from the agent.

The "recreation" is very helpful for letting BOSH manage VMs automatically, especially when a VM is really down.
But in my case it is inconvenient, because the VM is alive, just very slow.

Is there a way to configure the interval for checking the BOSH agent, and how many times to retry, to make it more intelligent?
In other words, how does BOSH handle the case where a VM is under heavy load?
Disabling the "recreation" would be one solution, but I don't think it is a good one, in case the VM really is gone.

Has anyone dealt with this before?
Thanks in advance.


Marco Voelz
 

Dear Stanley,

Some days ago I opened a PR on bosh-agent [1] to send two heartbeats per minute instead of only one. I agree that the resurrector kicking in after a single missed heartbeat might be a bit problematic. Maybe that already helps in your case.

Warm regards
Marco

[1] https://github.com/cloudfoundry/bosh-agent/pull/95
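
To illustrate why the heartbeat frequency matters, here is a minimal Python sketch. It assumes a simplified model that is not taken from the BOSH source: heartbeats arrive at a fixed interval, and an agent counts as timed out once the silence is strictly longer than a timeout. The function name and the 60-second timeout are illustrative assumptions.

```python
def tolerated_missed_heartbeats(heartbeat_interval_s: int, agent_timeout_s: int) -> int:
    """Consecutive heartbeats that may be lost before the silence exceeds the timeout.

    Simplified model (an assumption, not BOSH's actual logic): after k lost
    heartbeats, the gap between two received ones is (k + 1) * interval, and
    the agent is timed out once that gap is strictly greater than the timeout.
    """
    if heartbeat_interval_s <= 0 or agent_timeout_s <= 0:
        raise ValueError("interval and timeout must be positive")
    # Largest k with (k + 1) * interval <= timeout; never report a negative count.
    return max(agent_timeout_s // heartbeat_interval_s - 1, 0)


# Illustrative values: a 60-second timeout with one or two heartbeats per minute.
for interval in (60, 30):
    print(interval, tolerated_missed_heartbeats(interval, 60))
```

Under this model, with a 60-second timeout, one heartbeat per minute leaves no room for a lost heartbeat, while two per minute tolerate one.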



Ronak Banka
 

Stanley,

For now you can also turn a few knobs on the BOSH health monitor:

1. Increase the agent poll timeout (hm.intervals.agent_timeout):
<http://bosh.io/jobs/health_monitor?source=github.com/cloudfoundry/bosh&version=257.9#p=hm.intervals.agent_timeout>

2. Configure the minimum number of down jobs (hm.resurrector.minimum_down_jobs):
<http://bosh.io/jobs/health_monitor?source=github.com/cloudfoundry/bosh&version=257.9#p=hm.resurrector.minimum_down_jobs>
along with the threshold (hm.resurrector.percent_threshold):
<http://bosh.io/jobs/health_monitor?source=github.com/cloudfoundry/bosh&version=257.9#p=hm.resurrector.percent_threshold>
according to the size of your deployment.

Thanks
Ronak




Lukas Lehner <weblehner@...>
 

What is a knob, or knobs?

I assume you don't mean this:
http://www.urbandictionary.com/define.php?term=Knob

On Tue, Sep 13, 2016 at 2:10 PM, ronak banka <ronakbanka.cse(a)gmail.com>
wrote:

Stanley,

For now you can also turn a few knobs on the BOSH health monitor:

1. Increase the agent poll timeout (hm.intervals.agent_timeout):
<http://bosh.io/jobs/health_monitor?source=github.com/cloudfoundry/bosh&version=257.9#p=hm.intervals.agent_timeout>

2. Configure the minimum number of down jobs (hm.resurrector.minimum_down_jobs):
<http://bosh.io/jobs/health_monitor?source=github.com/cloudfoundry/bosh&version=257.9#p=hm.resurrector.minimum_down_jobs>
along with the threshold (hm.resurrector.percent_threshold):
<http://bosh.io/jobs/health_monitor?source=github.com/cloudfoundry/bosh&version=257.9#p=hm.resurrector.percent_threshold>
according to the size of your deployment.

Thanks
Ronak
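
The interplay of the two resurrector properties mentioned above can be sketched in Python. This is a simplified model based on the property descriptions, not the health monitor's actual code: it assumes a down VM is always recreated while fewer than minimum_down_jobs instances are down, and that recreation stops once the fraction of down instances reaches percent_threshold. The class and function names are hypothetical, the values are illustrative, and whether the real comparisons are strict is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResurrectorSettings:
    """Hypothetical container for the two hm.resurrector.* values."""
    minimum_down_jobs: int
    percent_threshold: float  # fraction between 0 and 1, e.g. 0.2 for 20%


def should_recreate(down: int, total: int, settings: ResurrectorSettings) -> bool:
    """Return True if, under this model, a down VM would still be recreated."""
    if total <= 0 or not 0 <= down <= total:
        raise ValueError("need total > 0 and 0 <= down <= total")
    # Below the minimum count, recreation is always requested.
    if down < settings.minimum_down_jobs:
        return True
    # Otherwise recreation stops once the down fraction reaches the threshold.
    return down / total < settings.percent_threshold


# Illustrative values only, not recommended settings.
settings = ResurrectorSettings(minimum_down_jobs=1, percent_threshold=0.2)
print(should_recreate(1, 4, settings))   # 1 of 4 down: 25% reaches the threshold
print(should_recreate(1, 10, settings))  # 1 of 10 down: 10% stays below it
```

In this model the same settings behave differently depending on deployment size, which is why the thresholds need to be chosen with the number of instances in mind.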





Ronak Banka
 

Hi Lukas,

Not that one 😅
This one: https://en.wikipedia.org/wiki/Control_knob





Stanley Shen <meteorping@...>
 

Thanks, all, for the information.