Orchestrating a raft cluster in a way that requires no manual intervention
is incredibly difficult. We write the PID file late for a specific reason:https://www.pivotaltracker.com/story/show/112018069
For dealing with wedged states like the one you encountered, we have some
recommendations in the documentation:https://github.com/cloudfoundry-incubator/consul-release/#disaster-recovery
We have acceptance tests we run in CI that exercise rolling a 3 node
cluster, so if you hit a failure it would be useful to get logs if you have
On Mon, Apr 11, 2016 at 9:38 AM, Benjamin Gandon <benjamin(a)gandon.org>
Actually, doing some further tests, I realize a mere 'join' is definitely
Instead, you need to restore the raft/peers.json on each one of the 3
consul server nodes:
monit stop consul_agent
echo '["10.244.0.58:8300","10.244.2.54:8300","10.244.0.54:8300"]' >
And make sure you start them quite at the same time with “monit start
So this advocates a strongly for setting *skip_leave_on_interrupt=true*
and *leave_on_terminate=false* in confab, because loosing the peers.json
is really something we don't want in our CF deployments!
Le 11 avr. 2016 à 18:15, Benjamin Gandon <benjamin(a)gandon.org> a écrit :
Hi cf devs,
I’m running a CF deployment with redundancy, and I just experienced my
consul servers not being able to elect any leader.
That’s a VERY frustrating situation that keeps the whole CF deployment
down, until you get a deeper understanding of consul, and figure out they
just need a silly manual 'join' so that they get back together.
But that was definitely not easy to nail down because at first look, I
could just see monit restarting the “agent_ctl” every 60 seconds because
confab was not writing the damn PID file.
More specifically, the 3 consul servers (i.e. consul_z1/0, consul_z1/1 and
consul_z2/0) had properly left oneanother uppon a graceful shutdown. This
state was persisted in /var/vcap/store/raft/peers.json being “null” on each
one of them, so they would not get back together on restart. A manual
'join' was necessary. But it took me hours to get there because I’m no
expert with consul.
And until the 'join' is made, VerifySynced() was negative in confab, and
monit was constantly starting and stopping it every 60 seconds. But once
you step back, you realize confab was actually waiting for the new leader
to be elected before it writes the PID file. Which is questionable.
So, I’m asking 3 questions here:
1. Does writing the PID file in confab *that* late really makes sense?
2. Could someone please write some minimal documentation about confab, at
least to tell what it is supposed to do?
3. Wouldn’t it be wiser that whenever any of the consul servers is not
here, then the cluster gets unhealthy?
With this 3rd question, I mean that even on a graceful TERM or INT, no
consul server should not perform any graceful 'leave'. With this different
approach, then they would properly be back up even when performing a
complete graceful restart of the cluster.
This can be done with those extra configs from the “confab” wrapper:
What do you guys think of it?