Failed to deploy diego 0.1452.0 on openstack: database_z2/0 is not running after update


Yunata, Ricky <rickyy@...>
 

Thank you very much George & others who have helped me. Really appreciate it!

Ricky Yunata

Please consider the environment before printing this email

-----Original Message-----
From: George Dean [mailto:gdean(a)pivotal.io]
Sent: Thursday, 7 April 2016 2:46 AM
To: cf-dev(a)lists.cloudfoundry.org
Subject: [cf-dev] Re: Re: Re: Re: Re: Re: Re: Re: Re: Re: Failed to deploy diego 0.1452.0 on openstack: database_z2/0 is not running after update

Hi Ricky,

Fair enough, that sounds like an alright plan for now. If you decide to reenable SSL in the future and run into these problems again, please don't hesitate to let us know and we can try to give you a hand.

Thanks,
George
Disclaimer

The information in this e-mail is confidential and may contain content that is subject to copyright and/or is commercial-in-confidence and is intended only for the use of the above named addressee. If you are not the intended recipient, you are hereby notified that dissemination, copying or use of the information is strictly prohibited. If you have received this e-mail in error, please telephone Fujitsu Australia Software Technology Pty Ltd on + 61 2 9452 9000 or by reply e-mail to the sender and delete the document and all copies thereof.


Whereas Fujitsu Australia Software Technology Pty Ltd would not knowingly transmit a virus within an email communication, it is the receiver’s responsibility to scan all communication and any files attached for computer viruses and other defects. Fujitsu Australia Software Technology Pty Ltd does not accept liability for any loss or damage (whether direct, indirect, consequential or economic) however caused, and whether by negligence or otherwise, which may result directly or indirectly from this communication or any files attached.


If you do not wish to receive commercial and/or marketing email messages from Fujitsu Australia Software Technology Pty Ltd, please email unsubscribe(a)fast.au.fujitsu.com


George Dean
 

Hi Ricky,

Fair enough, that sounds like an alright plan for now. If you decide to reenable SSL in the future and run into these problems again, please don't hesitate to let us know and we can try to give you a hand.

Thanks,
George


Yunata, Ricky <rickyy@...>
 

Hi Adrian,

I tried it again, but I still couldn't get it to work. I'm not sure what was wrong. I even regenerated all the certificates.
Anyway, I set the "require ssl" parameter to false and it works, so there is definitely something wrong with the certificates.
I will try again with the certificates later, but for now I can run Diego successfully. Thanks a lot for your help, and thanks to the people on the Pivotal team.

Ricky Yunata

-----Original Message-----
From: Adrian Zankich [mailto:azankich(a)pivotal.io]
Sent: Saturday, 2 April 2016 3:57 AM
To: cf-dev(a)lists.cloudfoundry.org
Subject: [cf-dev] Re: Re: Re: Re: Re: Re: Re: Re: Failed to deploy diego 0.1452.0 on openstack: database_z2/0 is not running after update

Hi Ricky,

We deconstructed the certs you provided in your manifest and think that you may have missed a step when you generated your peer SSL cert. Your peer cert is missing the DNS wildcard entry `*.etcd.service.cf.internal`. If you deconstruct your cert, the entry should show up like this:

X509v3 Subject Alternative Name:
DNS:*.etcd.service.cf.internal, DNS:etcd.service.cf.internal

Try regenerating your peer SSL cert with:

$ certstrap --depot-path peer request-cert --common-name "etcd.service.cf.internal" --domain "*.etcd.service.cf.internal,etcd.service.cf.internal"

This is detailed in https://github.com/cloudfoundry-incubator/diego-release#generating-tls-certificates step #8.

That should fix the SSL error you're experiencing.

- Adrian


Yunata, Ricky <rickyy@...>
 

Hi Adrian,

Thanks so much, I'll give it a try

Ricky Yunata


Adrian Zankich
 

Hi Ricky,

We deconstructed the certs you provided in your manifest and think that you may have missed a step when you generated your peer SSL cert. Your peer cert is missing the DNS wildcard entry `*.etcd.service.cf.internal`. If you deconstruct your cert, the entry should show up like this:

X509v3 Subject Alternative Name:
DNS:*.etcd.service.cf.internal, DNS:etcd.service.cf.internal

Try regenerating your peer SSL cert with:

$ certstrap --depot-path peer request-cert --common-name "etcd.service.cf.internal" --domain "*.etcd.service.cf.internal,etcd.service.cf.internal"

This is detailed in https://github.com/cloudfoundry-incubator/diego-release#generating-tls-certificates step #8.

That should fix the SSL error you're experiencing.
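
As a rough way to check a cert before redeploying, the sketch below (Python, not part of diego-release) takes the text printed by a command such as `openssl x509 -in <cert> -noout -text` and tests whether the Subject Alternative Name entries would cover an etcd node name like `database-z2-0.etcd.service.cf.internal`. The function names and the sample cert text are illustrative assumptions. The wildcard rule used is the common one where `*` stands for exactly one leftmost label.

```python
import re

def san_dns_names(cert_text):
    """Extract DNS entries from the SAN section of `openssl x509 -text` output."""
    lines = cert_text.splitlines()
    for i, line in enumerate(lines):
        # The DNS entries are printed on the line following the SAN header.
        if "Subject Alternative Name" in line and i + 1 < len(lines):
            return re.findall(r"DNS:([^,\s]+)", lines[i + 1])
    return []

def matches(pattern, hostname):
    """Match a hostname against one SAN entry; '*' covers exactly one leftmost label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels

def cert_covers(cert_text, hostname):
    """True if any SAN DNS entry in the cert text matches the hostname."""
    return any(matches(p, hostname) for p in san_dns_names(cert_text))

# Hypothetical sample text: a cert with the wildcard entry and one without it.
GOOD = """X509v3 Subject Alternative Name:
    DNS:*.etcd.service.cf.internal, DNS:etcd.service.cf.internal"""
BAD = """X509v3 Subject Alternative Name:
    DNS:etcd.service.cf.internal"""

node = "database-z2-0.etcd.service.cf.internal"
print(cert_covers(GOOD, node))
print(cert_covers(BAD, node))
```

A cert that lists only `etcd.service.cf.internal` would not cover the per-node names, which is consistent with the missing-wildcard diagnosis above, though this check alone does not prove that is the only problem.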

- Adrian


Yunata, Ricky <rickyy@...>
 

Hi Ryan,

Thanks for your e-mail. I'm following the instructions from https://github.com/cloudfoundry-incubator/diego-release
This is how I generated my certificates:

CA certificate
$ certstrap init --common-name "diegoCA"

Etcd server certificate
$ certstrap request-cert --common-name "etcd.service.cf.internal" --domain "*.etcd.service.cf.internal,etcd.service.cf.internal"
$ certstrap sign etcd.service.cf.internal --CA diegoCA

Etcd client certificate
$ certstrap request-cert --common-name "client"
$ certstrap sign client --CA diegoCA

Ricky Yunata


-----Original Message-----
From: Ryan Moran [mailto:rmoran(a)pivotal.io]
Sent: Thursday, 31 March 2016 9:01 AM
To: cf-dev(a)lists.cloudfoundry.org
Subject: [cf-dev] Re: Re: Re: Re: Re: Re: Failed to deploy diego 0.1452.0 on openstack: database_z2/0 is not running after update

Hi Ricky,

We think there might be an issue with the TLS certs that you provide in the etcd properties of the manifest. Can you tell us how you generated these certs? We also discovered an issue with the current behavior of the etcd_ctl start script. We will fix the bug, but we want to also help you get a working cluster.

Thanks,
Ryan


Ryan Moran
 

Hi Ricky,

We think there might be an issue with the TLS certs that you provide in the etcd properties of the manifest. Can you tell us how you generated these certs? We also discovered an issue with the current behavior of the etcd_ctl start script. We will fix the bug, but we want to also help you get a working cluster.

Thanks,
Ryan


Yunata, Ricky <rickyy@...>
 

Hi Adrian,

Thanks for your reply. The log file for the database is too big to attach to an e-mail, so I have uploaded it to Dropbox.
You can access it here:
https://www.dropbox.com/sh/kfuc0uxyxsvb551/AACxn1Ie2VeF_zp_cpJJL-uWa?dl=0

Ricky Yunata
Software & Solution Specialist

Fujitsu Australia Software Technology Pty Ltd
14 Rodborough Road, Frenchs Forest NSW 2086, Australia
T +61 2 9452 9128 M +61 433 977 739 F +61 2 9975 2899
rickyy(a)fast.au.fujitsu.com
fastware.com.au

-----Original Message-----
From: Adrian Zankich [mailto:azankich(a)pivotal.io]
Sent: Wednesday, 30 March 2016 3:39 AM
To: cf-dev(a)lists.cloudfoundry.org
Subject: [cf-dev] Re: Re: Re: Re: Failed to deploy diego 0.1452.0 on openstack: database_z2/0 is not running after update

Hi Ricky,

Thanks for the clarification. If you can give us the logs for all three etcd instances, we can help debug exactly what's going on. You can retrieve the logs from the etcd instances by running:
`bosh logs database_z1 0 && bosh logs database_z2 0 && bosh logs database_z3 0`

Thanks,

Adrian


Adrian Zankich
 

Hi Ricky,

Thanks for the clarification. If you can give us the logs for all three etcd instances, we can help debug exactly what's going on. You can retrieve the logs from the etcd instances by running:
`bosh logs database_z1 0 && bosh logs database_z2 0 && bosh logs database_z3 0`

Thanks,

Adrian


Yunata, Ricky <rickyy@...>
 

Hi Adrian,

Thanks for your comment. I do have consul server in my cf-release deployment.
Currently it’s database_z2/0 that is failing. However, if I stop all running etcd processes on database_z1 and database_z2 and then start etcd on database_z2 first, it works. After etcd on database_z2 is up, though, etcd on database_z1 won’t start, so it seems that only one etcd instance can run at a time.

+---------------------------------------------------------------------------+---------+-----+-----------+---------------+
| VM | State | AZ | VM Type | IPs |
+---------------------------------------------------------------------------+---------+-----+-----------+---------------+
| api_z1/0 (aef6e8d4-e088-420c-89f8-c74c4be0f3c6) | running | n/a | large_z1 | 192.168.1.6 |
| consul_z1/0 (8b1972db-1a24-414a-a40b-924f0d880fda) | running | n/a | small_z1 | 192.168.1.22 |
| doppler_z1/0 (1f044051-be96-4235-bd74-21093156136e) | running | n/a | medium_z1 | 192.168.1.31 |
| etcd_z1/0 (fde0895e-39dc-4826-8f29-be6ef5bb9ee5) | running | n/a | medium_z1 | 192.168.1.18 |
| ha_proxy_z1/0 (20d2a2b4-4c57-4fc4-8f99-0a365fbc8246) | running | n/a | router_z1 | 192.168.1.10 |
| | | | | 137.172.74.81 |
| hm9000_z1/0 (0f124588-7dd0-4351-b606-42630e8bc300) | running | n/a | medium_z1 | 192.168.1.7 |
| loggregator_trafficcontroller_z1/0 (e904e07f-2877-421d-8336-65f8422c4592) | running | n/a | small_z1 | 192.168.1.32 |
| loggregator_z1/0 (28a5d336-9f5f-45a8-b427-73be37a1f37d) | running | n/a | medium_z1 | 192.168.1.9 |
| nats_z1/0 (6a56ebca-a1bb-4192-beb7-86f4ac11b3ca) | running | n/a | medium_z1 | 192.168.1.12 |
| nfs_z1/0 (876a10c2-212e-49a5-913e-2fcce0c215a6) | running | n/a | medium_z1 | 192.168.1.13 |
| router_z1/0 (1fafd912-7357-4d12-8bbd-fedc10f47d40) | running | n/a | router_z1 | 192.168.1.15 |
| runner_z1/0 (d80862bb-f3b9-43da-ab55-fc0af3f5b569) | running | n/a | runner_z1 | 192.168.1.8 |
| stats_z1/0 (2ab128db-8efc-4c2b-98fb-673b0ebaaba4) | running | n/a | small_z1 | 192.168.1.4 |
| uaa_z1/0 (76800329-ad05-442a-a18d-79cb98abec27) | running | n/a | medium_z1 | 192.168.1.5 |
+---------------------------------------------------------------------------+---------+-----+-----------+---------------+

+-----------------------------------------------------------+---------+-----+------------------+--------------+
| VM | State | AZ | VM Type | IPs |
+-----------------------------------------------------------+---------+-----+------------------+--------------+
| access_z1/0 (598f16db-60c2-4c13-bcec-85ae2a38102d) | running | n/a | access_z1 | 192.168.3.44 |
| access_z2/0 (a83d049d-6c95-417e-84f4-9aced8a9136f) | running | n/a | access_z2 | 192.168.4.56 |
| brain_z1/0 (a95c56bb-a84d-41b4-91b1-ade57c773dbe) | running | n/a | brain_z1 | 192.168.3.40 |
| brain_z2/0 (eb386b16-c8e4-4c04-9582-20f4161f6e03) | running | n/a | brain_z2 | 192.168.4.52 |
| cc_bridge_z1/0 (b9870145-26d7-4e59-9358-97c43db6a110) | running | n/a | cc_bridge_z1 | 192.168.3.42 |
| cc_bridge_z2/0 (7477b06f-e501-4757-abda-8e29c7c15464) | running | n/a | cc_bridge_z2 | 192.168.4.54 |
| cell_z1/0 (a6ef0a8c-52c0-4bd2-abfb-2fcf0101dd24) | running | n/a | cell_z1 | 192.168.3.41 |
| cell_z2/0 (36f012e3-2013-44aa-9a92-18161d6854ad) | running | n/a | cell_z2 | 192.168.4.53 |
| database_z1/0 (5428cca8-9832-42f4-9b3a-a822eb6d7e96) | running | n/a | database_z1 | 192.168.3.39 |
| database_z2/0 (16c88d30-fe70-4d42-8307-34cc85521ca7) | failing | n/a | database_z2 | 192.168.4.51 |
| database_z3/0 (c802162f-0681-479e-bb9c-98dac7d78941) | running | n/a | database_z3 | 192.168.5.31 |
| route_emitter_z1/0 (f7f7a8f3-9784-4b99-b0a5-6efb4d193cf5) | running | n/a | route_emitter_z1 | 192.168.3.43 |
| route_emitter_z2/0 (7f4e7fb7-7986-432e-a2e3-b298d3070753) | running | n/a | route_emitter_z2 | 192.168.4.55 |
+-----------------------------------------------------------+---------+-----+------------------+--------------+

Regards,
Ricky



From: Amit Gupta [mailto:agupta(a)pivotal.io]
Sent: Tuesday, 29 March 2016 4:30 AM
To: Discussions about Cloud Foundry projects and the system overall.
Subject: [cf-dev] Re: Re: Failed to deploy diego 0.1452.0 on openstack: database_z2/0 is not running after update

The consul server cluster will be part of the cf-release deployment. This Diego deployment will be talking to the consul server cluster in the cf-release deployment.

On Mon, Mar 28, 2016 at 10:20 AM, Adrian Zankich <azankich(a)pivotal.io> wrote:
Hello Ricky,

I see that you're trying to run etcd in SSL mode, but I do not see a consul server instance in your instance list. Are you deploying a consul server job?

- Adrian



Amit Kumar Gupta
 

The consul server cluster will be part of the cf-release deployment. This
Diego deployment will be talking to the consul server cluster in the
cf-release deployment.

On Mon, Mar 28, 2016 at 10:20 AM, Adrian Zankich <azankich(a)pivotal.io>
wrote:

Hello Ricky,

I see that you're trying to run etcd in SSL mode, but I do not see a
consul server instance in your instance list. Are you deploying a consul
server job?

- Adrian


Adrian Zankich
 

Hello Ricky,

I see that you're trying to run etcd in SSL mode, but I do not see a consul server instance in your instance list. Are you deploying a consul server job?

- Adrian


Yunata, Ricky <rickyy@...>
 

Hi,

I'm currently deploying Diego on my OpenStack environment, but I got an error while it was updating database_z2.
Below is the error message from debug.log:
<Bosh::Director::AgentJobNotRunning: `database_z2/0 (16c88d30-fe70-4d42-8307-34cc85521ca7)' is not running after update. Review logs for failed jobs: etcd>

My environment is:
Stemcell : Ubuntu-trusty Version 3192
CF Release : Version 230
Diego : Version 0.1452.0
Etcd : Version 38
Garden-linux : Version 0.334.0

I'm seeing an error similar to the one in this issue, but the solution there didn't work for me:
https://github.com/cloudfoundry-incubator/diego-release/issues/119

This is what I'm seeing in my error logs:

Monit summary
Process 'etcd' not monitored
Process 'bbs' running
Process 'consul_agent' running
Process 'metron_agent' running
System 'system_localhost' running

etcd_ctl.err.log
[2016-03-23 01:22:33+0000] + /var/vcap/packages/etcd/etcdctl -ca-file=/var/vcap/jobs/etcd/config/certs/server-ca.crt -cert-file=/var/vcap/jobs/etcd/config/certs/client.crt -key-file=/var/vcap/jobs/etcd/config/certs/client.key -C https://database-z2-0.etcd.service.cf.internal:4001 ls
[2016-03-23 01:22:33+0000] Error: cannot sync with the cluster using endpoints https://database-z2-0.etcd.service.cf.internal:4001

etcd.stderr.log
2016/03/23 00:56:52 etcdmain: couldn't find local name "database-z2-0" in the initial cluster configuration
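
For context, etcd reports this error when the member's own `--name` does not appear among the `name=peer-url` pairs passed in `--initial-cluster`. The sketch below (Python, illustrative only) shows that check in isolation. The helper names are hypothetical, and the cluster string is a made-up value rather than the one generated for this deployment.

```python
def parse_initial_cluster(value):
    """Parse an etcd 'name=peer_url,name=peer_url' string into {name: [urls]}."""
    members = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, url = entry.partition("=")
        members.setdefault(name, []).append(url)
    return members

def local_name_present(local_name, initial_cluster):
    """True if the local member name is listed in the initial cluster string."""
    return local_name in parse_initial_cluster(initial_cluster)

# Hypothetical value: only database-z1-0 made it into the list.
cluster = "database-z1-0=https://database-z1-0.etcd.service.cf.internal:7001"
print(local_name_present("database-z1-0", cluster))
print(local_name_present("database-z2-0", cluster))
```

If the second check fails for a node, the list handed to that node at start-up is worth inspecting before looking at the certificates.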

consul_agent.stdout.log
2016/03/23 01:23:26 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:29 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:32 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:35 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:38 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:41 [WARN] agent: Check 'service:etcd' is now critical
2016/03/23 01:23:41 [WARN] dns: node 'database-z2-0' failing health check 'service:etcd: Service 'etcd' check', dropping from service 'etcd'
2016/03/23 01:23:41 [WARN] dns: node 'database-z2-0' failing health check 'service:etcd: Service 'etcd' check', dropping from service 'etcd'
2016/03/23 01:23:42 [WARN] dns: node 'database-z2-0' failing health check 'service:etcd: Service 'etcd' check', dropping from service 'etcd'
2016/03/23 01:23:42 [WARN] dns: node 'database-z2-0' failing health check 'service:etcd: Service 'etcd' check', dropping from service 'etcd'


This is the output of `bosh instances --ps`:
+------------------------------------------------------------+---------+-----+------------------+--------------+
| Instance | State | AZ | VM Type | IPs |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| access_z1/0 (598f16db-60c2-4c13-bcec-85ae2a38102d)* | running | n/a | access_z1 | 192.168.3.44 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| access_z2/0 (a83d049d-6c95-417e-84f4-9aced8a9136f)* | running | n/a | access_z2 | 192.168.4.56 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| brain_z1/0 (a95c56bb-a84d-41b4-91b1-ade57c773dbe)* | running | n/a | brain_z1 | 192.168.3.40 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| brain_z2/0 (eb386b16-c8e4-4c04-9582-20f4161f6e03)* | running | n/a | brain_z2 | 192.168.4.52 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| cc_bridge_z1/0 (b9870145-26d7-4e59-9358-97c43db6a110)* | running | n/a | cc_bridge_z1 | 192.168.3.42 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| cc_bridge_z2/0 (7477b06f-e501-4757-abda-8e29c7c15464)* | running | n/a | cc_bridge_z2 | 192.168.4.54 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| cell_z1/0 (a6ef0a8c-52c0-4bd2-abfb-2fcf0101dd24)* | running | n/a | cell_z1 | 192.168.3.41 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| cell_z2/0 (36f012e3-2013-44aa-9a92-18161d6854ad)* | running | n/a | cell_z2 | 192.168.4.53 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| database_z1/0 (5428cca8-9832-42f4-9b3a-a822eb6d7e96)* | running | n/a | database_z1 | 192.168.3.39 |
| etcd | running | | | |
| bbs | running | | | |
| consul_agent | running | | | |
| metron_agent | running | | | |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| database_z2/0 (16c88d30-fe70-4d42-8307-34cc85521ca7)* | failing | n/a | database_z2 | 192.168.4.51 |
| etcd | unknown | | | |
| bbs | running | | | |
| consul_agent | running | | | |
| metron_agent | running | | | |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| database_z3/0 (c802162f-0681-479e-bb9c-98dac7d78941)* | running | n/a | database_z3 | 192.168.5.31 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| route_emitter_z1/0 (f7f7a8f3-9784-4b99-b0a5-6efb4d193cf5)* | running | n/a | route_emitter_z1 | 192.168.3.43 |
+------------------------------------------------------------+---------+-----+------------------+--------------+
| route_emitter_z2/0 (7f4e7fb7-7986-432e-a2e3-b298d3070753)* | running | n/a | route_emitter_z2 | 192.168.4.55 |
+------------------------------------------------------------+---------+-----+------------------+--------------+

I tried stopping all running etcd processes on database_z1 and database_z2, running `rm -rf /var/vcap/store/etcd/*` on both VMs, and then starting etcd again with monit. It seems that only one etcd instance can run at a time. If I `monit start etcd` on database_z2 before database_z1, database_z2 runs and database_z1 fails. If I start database_z1 first, database_z1 runs and database_z2 fails.

Does anyone have an idea how to solve this? Thanks

Regards
Ricky