Deploying BOSH: New deployment spawned from child BOSH becomes unresponsive after some time


Subhankar Chattopadhyay <subho.atg@...>
 

Hi,

I have BOSH-Lite installed locally and I am trying to deploy BOSH on
BOSH-Lite. This may sound confusing, but I am just trying to create
another level in the BOSH hierarchy. Attached is the manifest that
I use; it deploys successfully.

I then target this new child BOSH director and try to deploy a
sample release, for example the redis release. I am able to upload
the stemcell and deploy the redis cluster successfully. But after a
few minutes, the nodes of this deployment become unresponsive.

vcap@agent-id-bosh-0:~/i068838/microbosh$ bosh vms
Deployment `redis-warden'

Director task 27

Task 27 done

+-------------------+---------+---------------+------------+
| Job/index         | State   | Resource Pool | IPs        |
+-------------------+---------+---------------+------------+
| redis_leader_z1/0 | running | small_z1      | 10.244.2.2 |
| redis_z1/0        | running | small_z1      | 10.244.1.2 |
| redis_z1/1        | running | small_z1      | 10.244.1.6 |
+-------------------+---------+---------------+------------+

VMs total: 3
vcap@agent-id-bosh-0:~/i068838/microbosh$ bosh vms
Deployment `redis-warden'

Director task 28

Task 28 done

+-------------------+--------------------+---------------+-----+
| Job/index         | State              | Resource Pool | IPs |
+-------------------+--------------------+---------------+-----+
| redis_leader_z1/0 | unresponsive agent | small_z1      |     |
| redis_z1/0        | unresponsive agent | small_z1      |     |
| redis_z1/1        | unresponsive agent | small_z1      |     |
+-------------------+--------------------+---------------+-----+


I searched the logs and found the following in the health monitor log
of the child BOSH.

vi /var/vcap/sys/log/health_monitor/health_monitor.log

I, [2016-01-29T10:03:04.343261 #496] INFO : Analyzing agents...
I, [2016-01-29T10:03:04.343621 #496] INFO : Analyzed 0 agents, took 8.0871e-05 seconds
E, [2016-01-29T10:03:34.402357 #496] ERROR : Cannot get deployments from director at https://10.244.9.2:25555/deployments: 401 Not authorized: '/deployments'

E, [2016-01-29T10:03:34.402539 #496] ERROR :
/var/vcap/packages/health_monitor/gem_home/ruby/2.1.0/gems/bosh-monitor-1.3169.0/lib/bosh/monitor/director.rb:16:in `get_deployments'
/var/vcap/packages/health_monitor/gem_home/ruby/2.1.0/gems/bosh-monitor-1.3169.0/lib/bosh/monitor/runner.rb:146:in `fetch_deployments'
/var/vcap/packages/health_monitor/gem_home/ruby/2.1.0/gems/bosh-monitor-1.3169.0/lib/bosh/monitor/runner.rb:97:in `block in poll_director'
I, [2016-01-29T10:03:34.402711 #496] INFO : [ALERT] Alert @ 2016-01-29 10:03:34 UTC, severity 3: Cannot get deployments from director at https://10.244.9.2:25555/deployments: 401 Not authorized: '/deployments'
...................
................
I, [2016-01-29T11:41:28.543206 #26013] INFO : Found deployment `redis-warden'
I, [2016-01-29T11:41:28.587300 #26013] INFO : Adding agent a613875d-cbd7-4450-bfde-39bdfe21f11f (redis_z1/0) to redis-warden...
I, [2016-01-29T11:41:28.587431 #26013] INFO : Adding agent 52bbce8a-fe26-47fc-9613-76a311949414 (redis_leader_z1/0) to redis-warden...
I, [2016-01-29T11:41:28.587505 #26013] INFO : Adding agent ee51f46c-9907-4293-b1cf-28f6be6ce87a (redis_z1/1) to redis-warden...
I, [2016-01-29T11:41:58.518624 #26013] INFO : Analyzing agents...
I, [2016-01-29T11:41:58.519463 #26013] INFO : Analyzed 3 agents, took 0.000134647 seconds
W, [2016-01-29T11:42:28.578004 #26013] WARN : Found stale deployment redis-warden, removing...
I, [2016-01-29T11:42:36.454647 #26013] INFO : [ALERT] Alert @ 2016-01-29 11:42:36 UTC, severity 4: Begin update deployment for 'redis-warden' against Director '4aa4c1d8-b5b1-4892-944d-d95d66f0529a'
W, [2016-01-29T11:42:36.454806 #26013] WARN : (Resurrector) event did not have deployment, job and index: Alert @ 2016-01-29 11:42:36 UTC, severity 4: Begin update deployment for 'redis-warden' against Director '4aa4c1d8-b5b1-4892-944d-d95d66f0529a'
W, [2016-01-29T11:42:41.868459 #26013] WARN : Received heartbeat from unmanaged agent: 0a509657-9f23-4f01-874e-5e98a53239e7
W, [2016-01-29T11:42:42.507345 #26013] WARN : Received heartbeat from unmanaged agent: fbb147b8-ae89-47ba-ac50-5392a22930fd
W, [2016-01-29T11:42:42.521077 #26013] WARN : Received heartbeat from unmanaged agent: 42243d80-2f80-47e3-9840-a58d66d0e784
I, [2016-01-29T11:42:51.304697 #26013] INFO : Agent `42243d80-2f80-47e3-9840-a58d66d0e784' shutting down...
I, [2016-01-29T11:42:51.305346 #26013] INFO : Removing agent 42243d80-2f80-47e3-9840-a58d66d0e784 from all deployments...
I, [2016-01-29T11:42:58.520078 #26013] INFO : Analyzing agents...
W, [2016-01-29T11:42:58.520301 #26013] WARN : Agent 0a509657-9f23-4f01-874e-5e98a53239e7 is not a part of any deployment
W, [2016-01-29T11:42:58.520415 #26013] WARN : Agent fbb147b8-ae89-47ba-ac50-5392a22930fd is not a part of any deployment
I, [2016-01-29T11:42:58.520508 #26013] INFO : Analyzed 2 agents, took 0.000273774 seconds
W, [2016-01-29T11:43:00.209452 #26013] WARN : Received alert from unmanaged agent: 42243d80-2f80-47e3-9840-a58d66d0e784
I, [2016-01-29T11:43:00.209909 #26013] INFO : [ALERT] Alert @ 2016-01-29 11:43:00 UTC, severity 1: process is not running
W, [2016-01-29T11:43:00.210027 #26013] WARN : (Resurrector) event did not have deployment, job and index: Alert @ 2016-01-29 11:43:00 UTC, severity 1: process is not running
I, [2016-01-29T11:43:07.190492 #26013] INFO : Agent `0a509657-9f23-4f01-874e-5e98a53239e7' shutting down...
I, [2016-01-29T11:43:07.191313 #26013] INFO : Removing agent 0a509657-9f23-4f01-874e-5e98a53239e7 from all deployments...
I, [2016-01-29T11:43:07.222311 #26013] INFO : Agent `fbb147b8-ae89-47ba-ac50-5392a22930fd' shutting down...
I, [2016-01-29T11:43:07.222733 #26013] INFO : Removing agent fbb147b8-ae89-47ba-ac50-5392a22930fd from all deployments...
W, [2016-01-29T11:43:15.782356 #26013] WARN : Received alert from unmanaged agent: fbb147b8-ae89-47ba-ac50-5392a22930fd
I, [2016-01-29T11:43:15.782627 #26013] INFO : [ALERT] Alert @ 2016-01-29 11:43:15 UTC, severity 1: process is not running
W, [2016-01-29T11:43:15.782714 #26013] WARN : (Resurrector) event did not have deployment, job and index: Alert @ 2016-01-29 11:43:15 UTC, severity 1: process is not running
W, [2016-01-29T11:43:15.832729 #26013] WARN : Received alert from unmanaged agent: 0a509657-9f23-4f01-874e-5e98a53239e7
I, [2016-01-29T11:43:15.833077 #26013] INFO : [ALERT] Alert @ 2016-01-29 11:43:15 UTC, severity 1: process is not running
W, [2016-01-29T11:43:15.833184 #26013] WARN : (Resurrector) event did not have deployment, job and index: Alert @ 2016-01-29 11:43:15 UTC, severity 1: process is not running
I, [2016-01-29T11:43:21.924108 #26013] INFO : [ALERT] Alert @ 2016-01-29 11:43:21 UTC, severity 4: Finish update deployment for 'redis-warden' against Director '4aa4c1d8-b5b1-4892-944d-d95d66f0529a'
W, [2016-01-29T11:43:21.924923 #26013] WARN : (Resurrector) event did not have deployment, job and index: Alert @ 2016-01-29 11:43:21 UTC, severity 4: Finish update deployment for 'redis-warden' against Director '4aa4c1d8-b5b1-4892-944d-d95d66f0529a'
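
To make the log excerpt easier to compare, here is a minimal Python
sketch that collects the agent IDs the health monitor added to
redis-warden and the IDs it later reported as unmanaged. It is not part
of BOSH: the function name summarize and the two regular expressions are
illustrative assumptions based only on the line formats shown in the log
above, and the sample is a few lines copied from it.

```python
import re

# Patterns derived from the log lines quoted above (an assumption, not a BOSH API).
ADDED = re.compile(r"Adding agent\s+([0-9a-f-]{36})\s+\(([^)]+)\)")
UNMANAGED = re.compile(r"unmanaged agent:\s+([0-9a-f-]{36})")


def summarize(log_text):
    """Return (managed, unmanaged).

    managed   maps agent ID -> job/index for every "Adding agent" entry.
    unmanaged is the set of agent IDs reported as "unmanaged agent".
    """
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    managed = {m.group(1): m.group(2) for m in ADDED.finditer(flat)}
    unmanaged = {m.group(1) for m in UNMANAGED.finditer(flat)}
    return managed, unmanaged


# A few entries copied from the health monitor log, wrapped as in the mail.
SAMPLE = """
I, [2016-01-29T11:41:28.587300 #26013] INFO : Adding agent
a613875d-cbd7-4450-bfde-39bdfe21f11f (redis_z1/0) to redis-warden...
I, [2016-01-29T11:41:28.587431 #26013] INFO : Adding agent
52bbce8a-fe26-47fc-9613-76a311949414 (redis_leader_z1/0) to
redis-warden...
I, [2016-01-29T11:41:28.587505 #26013] INFO : Adding agent
ee51f46c-9907-4293-b1cf-28f6be6ce87a (redis_z1/1) to redis-warden...
W, [2016-01-29T11:42:41.868459 #26013] WARN : Received heartbeat from
unmanaged agent: 0a509657-9f23-4f01-874e-5e98a53239e7
W, [2016-01-29T11:42:42.507345 #26013] WARN : Received heartbeat from
unmanaged agent: fbb147b8-ae89-47ba-ac50-5392a22930fd
W, [2016-01-29T11:42:42.521077 #26013] WARN : Received heartbeat from
unmanaged agent: 42243d80-2f80-47e3-9840-a58d66d0e784
"""

managed, unmanaged = summarize(SAMPLE)
for agent_id, job in sorted(managed.items(), key=lambda item: item[1]):
    print("managed  ", job, agent_id)
for agent_id in sorted(unmanaged):
    print("unmanaged", agent_id)
# IDs that appear in both groups (empty if the two groups are disjoint).
print("overlap  ", sorted(unmanaged & set(managed)))
```

In the excerpt above the two groups do not overlap: the three agent IDs
reported as unmanaged are different from the three IDs that were added
to redis-warden about a minute earlier.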


It looks like the health monitor is not working properly. Can someone
please help me with this?
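
The 401 entries in the log suggest that, at least at that point, the
director rejected the credentials the health monitor sent to
/deployments. One way to check this by hand is sketched below in Python,
under several assumptions: that the child director accepts HTTP basic
auth, that it uses a self-signed certificate (so verification is turned
off), and that the user name and password shown are placeholders for
whatever the manifest configures for the health monitor (commonly under
hm.director_account, which is an assumption here since the manifest is
only attached, not quoted). The helper names are hypothetical.

```python
import base64
import ssl
import urllib.error
import urllib.request


def build_request(director_url, user, password):
    """Build a GET request for <director_url>/deployments with basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(director_url.rstrip("/") + "/deployments")
    req.add_header("Authorization", "Basic " + token)
    return req


def check_credentials(director_url, user, password, timeout=10):
    """Return the HTTP status code the director answers with."""
    # Assumption: self-signed certificate, so verification is disabled.
    # Do not do this outside a local test setup.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    req = build_request(director_url, user, password)
    try:
        with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Rejected credentials surface here, e.g. with code 401.
        return err.code


# Example call (placeholder credentials; needs network access to the director):
#   status = check_credentials("https://10.244.9.2:25555", "hm", "hm-password")
#   print(status)
```

If the call returns 401 with the health monitor's configured credentials
but 200 with the admin credentials, the mismatch is in the manifest
rather than in the health monitor itself.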


Regards,
Subhankar
