Re: 3 etcd nodes don't work well in single zone


Amit Kumar Gupta
 

Hi Tony,

The logs you've retrieved only go back to Jul 21, so I can't correlate
them with the "?/2" issues you were seeing. Could you record another batch
of occurrences of an app flapping between "2/2" and "?/2" (along with
datetime stamps), and then immediately get logs from *all* the HM and etcd
nodes? Note that `bosh logs` only gets logs from one node at a time. With
those I can try to dig in more. It's important to get the logs from the HM
and etcd VMs soon after recording the "?/2" events; otherwise BOSH may
rotate or archive the logs, making them harder to obtain.

Best,
Amit

On Tue, Jul 21, 2015 at 4:53 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

You should definitely not run etcd with 2 instances. You can read more
about recommended cluster sizes in the etcd docs:


https://github.com/coreos/etcd/blob/740187f199a12652ca1b7bddb7b3489160103d84/Documentation/admin_guide.md#fault-tolerance-table

I will look at the attached logs and get back to you, but wanted to make
sure to advise you to run either 1 or 3 nodes. With 2, you can wedge the
system, because it will need all nodes to be up to achieve quorum. If you
roll one of the two nodes, it will not be able to rejoin the cluster, and
the service will be stuck in an unavailable state.
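
To make the quorum arithmetic concrete, here is a minimal sketch, assuming
the usual majority rule (more than half of the members must be up). The
function names are made up for illustration and are not part of etcd or
BOSH.

```python
# Illustrative only: majority-quorum arithmetic for a cluster of n members.
# Function names are hypothetical, not an etcd or BOSH API.

def quorum(n: int) -> int:
    """Smallest number of members that forms a majority of n."""
    return n // 2 + 1


def failure_tolerance(n: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return n - quorum(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} node(s): quorum={quorum(n)}, "
              f"tolerates {failure_tolerance(n)} failure(s)")
```

Under that assumption, 2 nodes need both members up and tolerate no
failures, the same as 1 node, while 3 nodes tolerate one failure.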



-----
Amit, CF OSS Release Integration PM
Pivotal Software, Inc.