On 11 Apr 2016, at 22:40, Amit Gupta <agupta(a)pivotal.io> wrote:
Orchestrating a raft cluster in a way that requires no manual intervention is incredibly difficult. We write the PID file late for a specific reason:
For dealing with wedged states like the one you encountered, we have some recommendations in the documentation:
We have acceptance tests we run in CI that exercise rolling a 3 node cluster, so if you hit a failure it would be useful to get logs if you have any.
On Mon, Apr 11, 2016 at 9:38 AM, Benjamin Gandon <benjamin(a)gandon.org> wrote:
Actually, doing some further tests, I realize a mere 'join' is definitely not enough.
Instead, you need to restore the raft/peers.json on each one of the 3 consul server nodes:
monit stop consul_agent
echo '["10.244.0.58:8300","10.244.2.54:8300","10.244.0.54:8300"]' > /var/vcap/store/consul_agent/raft/peers.json
And make sure you start them all at roughly the same time with “monit start consul_agent”.
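For illustration, the recovery steps above can be sketched as a small script. This is not from the original thread: the server list and file path are just the example values quoted above, and the write to the real peers.json is left commented out so you can check the output first.

```shell
#!/bin/bash
# Sketch of the manual recovery described above (assumes a BOSH-deployed
# consul_agent job; addresses are the example values from this thread).
set -e

# 1. On each consul server node, stop the agent first:
#      monit stop consul_agent

# 2. Rebuild the raft peers list as a JSON array of "ip:port" strings.
SERVERS=("10.244.0.58:8300" "10.244.2.54:8300" "10.244.0.54:8300")
json=$(printf '"%s",' "${SERVERS[@]}")
json="[${json%,}]"
echo "$json"
# On the actual node, write it to the raft store instead of stdout:
#   echo "$json" > /var/vcap/store/consul_agent/raft/peers.json

# 3. Then start all three agents at roughly the same time:
#      monit start consul_agent
```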
So this argues strongly for setting skip_leave_on_interrupt=true and leave_on_terminate=false in confab, because losing the peers.json is really something we don't want in our CF deployments!
On 11 Apr 2016, at 18:15, Benjamin Gandon <benjamin(a)gandon.org> wrote:
Hi cf devs,
I’m running a CF deployment with redundancy, and I just experienced my consul servers not being able to elect any leader.
That’s a VERY frustrating situation that keeps the whole CF deployment down, until you get a deeper understanding of consul, and figure out they just need a silly manual 'join' so that they get back together.
But that was definitely not easy to nail down, because at first glance, all I could see was monit restarting the “agent_ctl” every 60 seconds because confab was not writing the damn PID file.
More specifically, the 3 consul servers (i.e. consul_z1/0, consul_z1/1 and consul_z2/0) had properly left one another upon a graceful shutdown. This state was persisted as a “null” value in /var/vcap/store/raft/peers.json on each one of them, so they would not get back together on restart. A manual 'join' was necessary. But it took me hours to get there because I’m no expert with consul.
And until the 'join' was made, VerifySynced() kept failing in confab, and monit was constantly starting and stopping it every 60 seconds. But once you step back, you realize confab was actually waiting for a new leader to be elected before writing the PID file. Which is questionable.
So, I’m asking 3 questions here:
1. Does writing the PID file that late in confab really make sense?
2. Could someone please write some minimal documentation about confab, at least to tell what it is supposed to do?
3. Wouldn’t it be wiser for the cluster to just become unhealthy whenever any of the consul servers is down?
By this 3rd question, I mean that even on a graceful TERM or INT, no consul server should perform any graceful 'leave'. With this different approach, they would properly come back up even after a complete graceful restart of the cluster.
This can be done with those extra configs from the “confab” wrapper:
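For illustration, the two settings mentioned earlier in the thread correspond to the following keys in a raw Consul agent configuration file (a sketch only; how confab and the consul BOSH job spec actually expose them as properties may differ):

```json
{
  "skip_leave_on_interrupt": true,
  "leave_on_terminate": false
}
```

With both set this way, a server that receives SIGINT or SIGTERM shuts down without gracefully leaving the cluster, so the persisted raft peer set stays intact across a full rolling restart.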
What do you guys think of it?