Failed to deploy diego 0.1452.0 on openstack: database_z2/0 is not running after update
Yunata, Ricky <rickyy@...>
Hi,
I'm currently deploying Diego in my OpenStack environment, but I got an error while it was updating database_z2. Below is the error message from debug.log:

<Bosh::Director::AgentJobNotRunning: `database_z2/0 (16c88d30-fe70-4d42-8307-34cc85521ca7)' is not running after update. Review logs for failed jobs: etcd>

My environment is:

Stemcell     : ubuntu-trusty, version 3192
CF Release   : version 230
Diego        : version 0.1452.0
Etcd         : version 38
Garden-linux : version 0.334.0

I'm experiencing a similar error to this one, but the solution there didn't work for me:
https://github.com/cloudfoundry-incubator/diego-release/issues/119

This is what I'm seeing in the logs.

Monit summary:

Process 'etcd'            not monitored
Process 'bbs'             running
Process 'consul_agent'    running
Process 'metron_agent'    running
System 'system_localhost' running

etcd_ctl.err.log:

[2016-03-23 01:22:33+0000] + /var/vcap/packages/etcd/etcdctl -ca-file=/var/vcap/jobs/etcd/config/certs/server-ca.crt -cert-file=/var/vcap/jobs/etcd/config/certs/client.crt -key-file=/var/vcap/jobs/etcd/config/certs/client.key -C https://database-z2-0.etcd.service.cf.internal:4001 ls
[2016-03-23 01:22:33+0000] Error: cannot sync with the cluster using endpoints https://database-z2-0.etcd.service.cf.internal:4001

etcd.stderr.log:

2016/03/23 00:56:52 etcdmain: couldn't find local name "database-z2-0" in the initial cluster configuration

consul_agent.stdout.log:

2016/03/23 01:23:26 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:29 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:32 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:35 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:38 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:41 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:41 [WARN] dns: node 'database-z2-0' failing health check 'service:etcd: Service 'etcd' check', dropping from service 'etcd'
2016/03/23 01:23:41 [WARN] dns: node 'database-z2-0' failing health check 'service:etcd: Service 'etcd' check', dropping from service 'etcd'
2016/03/23 01:23:42 [WARN] dns: node 'database-z2-0' failing health check 'service:etcd: Service 'etcd' check', dropping from service 'etcd'
2016/03/23 01:23:42 [WARN] dns: node 'database-z2-0' failing health check 'service:etcd: Service 'etcd' check', dropping from service 'etcd'
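The etcd.stderr.log line suggests the node is being started with an initial cluster list that does not include its own advertised name ("database-z2-0"). This is roughly how I've been trying to cross-check that on the failing VM. Treat it as a sketch only: I'm not sure of the exact rendered file names under /var/vcap/jobs/etcd/config (hence the broad grep), and the database-z1-0 hostname is inferred by analogy with the z2 name above.

# On database_z2/0 (bosh ssh, then as root):

# 1. Inspect the rendered etcd job config for the local node name and the
#    initial/peer cluster list it starts with.
grep -riE 'name|cluster' /var/vcap/jobs/etcd/config/

# 2. Confirm the VM resolves the consul DNS names etcd advertises; the
#    etcdctl call in etcd_ctl.err.log relies on the same resolution.
nslookup database-z2-0.etcd.service.cf.internal
nslookup database-z1-0.etcd.service.cf.internal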
This is what I get when I run bosh instances --ps:

+------------------------------------------------------------+---------+-----+------------------+--------------+
| Instance                                                    | State   | AZ  | VM Type          | IPs          |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| access_z1/0 (598f16db-60c2-4c13-bcec-85ae2a38102d)*         | running | n/a | access_z1        | 192.168.3.44 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| access_z2/0 (a83d049d-6c95-417e-84f4-9aced8a9136f)*         | running | n/a | access_z2        | 192.168.4.56 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| brain_z1/0 (a95c56bb-a84d-41b4-91b1-ade57c773dbe)*          | running | n/a | brain_z1         | 192.168.3.40 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| brain_z2/0 (eb386b16-c8e4-4c04-9582-20f4161f6e03)*          | running | n/a | brain_z2         | 192.168.4.52 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| cc_bridge_z1/0 (b9870145-26d7-4e59-9358-97c43db6a110)*      | running | n/a | cc_bridge_z1     | 192.168.3.42 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| cc_bridge_z2/0 (7477b06f-e501-4757-abda-8e29c7c15464)*      | running | n/a | cc_bridge_z2     | 192.168.4.54 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| cell_z1/0 (a6ef0a8c-52c0-4bd2-abfb-2fcf0101dd24)*           | running | n/a | cell_z1          | 192.168.3.41 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| cell_z2/0 (36f012e3-2013-44aa-9a92-18161d6854ad)*           | running | n/a | cell_z2          | 192.168.4.53 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| database_z1/0 (5428cca8-9832-42f4-9b3a-a822eb6d7e96)*       | running | n/a | database_z1      | 192.168.3.39 |
|   etcd                                                      | running |     |                  |              |
|   bbs                                                       | running |     |                  |              |
|   consul_agent                                              | running |     |                  |              |
|   metron_agent                                              | running |     |                  |              |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| database_z2/0 (16c88d30-fe70-4d42-8307-34cc85521ca7)*       | failing | n/a | database_z2      | 192.168.4.51 |
|   etcd                                                      | unknown |     |                  |              |
|   bbs                                                       | running |     |                  |              |
|   consul_agent                                              | running |     |                  |              |
|   metron_agent                                              | running |     |                  |              |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| database_z3/0 (c802162f-0681-479e-bb9c-98dac7d78941)*       | running | n/a | database_z3      | 192.168.5.31 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| route_emitter_z1/0 (f7f7a8f3-9784-4b99-b0a5-6efb4d193cf5)*  | running | n/a | route_emitter_z1 | 192.168.3.43 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| route_emitter_z2/0 (7f4e7fb7-7986-432e-a2e3-b298d3070753)*  | running | n/a | route_emitter_z2 | 192.168.4.55 |
+------------------------------------------------------------+---------+-----+------------------+--------------+

I tried stopping all running etcds on database_z1 and database_z2, running `rm -rf /var/vcap/store/etcd/*` on both VMs, and then starting the etcd processes again with monit. It seems that only one etcd can run at a time: if I monit start etcd on database_z2 before database_z1, then database_z2 ends up running and database_z1 fails; but if I start it on database_z1 first, then database_z1 runs and database_z2 fails.
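For reference, the stop/wipe/restart cycle I ran on each of the two VMs was roughly the following. The monit path is the standard BOSH stemcell location; whether one node must fully settle before starting the next, and whether database_z3 needs the same treatment, are open questions on my side.

# As root on database_z1/0 and then database_z2/0 (bosh ssh <instance>):
/var/vcap/bosh/bin/monit stop etcd
rm -rf /var/vcap/store/etcd/*
/var/vcap/bosh/bin/monit start etcd
# Watch until the process settles (or fails) before moving to the next node:
watch /var/vcap/bosh/bin/monit summary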
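And once one node is up, this is how I'd expect cluster membership to be checked, mirroring the etcdctl invocation from etcd_ctl.err.log above (a sketch only: `member list` assumes the etcdctl v2 syntax bundled with this etcd release, and the z1 hostname is again inferred by analogy):

/var/vcap/packages/etcd/etcdctl \
  -ca-file=/var/vcap/jobs/etcd/config/certs/server-ca.crt \
  -cert-file=/var/vcap/jobs/etcd/config/certs/client.crt \
  -key-file=/var/vcap/jobs/etcd/config/certs/client.key \
  -C https://database-z1-0.etcd.service.cf.internal:4001 member list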
Does anyone have an idea how to solve this?

Thanks.

Regards,
Ricky