Re: consul_z1/0 is failing after update

Sylvain Gibier


Ok - after running the recovery scenario, the cluster came back, and I finally
pinpointed the root cause.

The reason: on 2 Diego cells, the consul_agent (client) is running out of
disk space to write the keys. My understanding is that the cluster
server node will swap if any errors occur on nodes during the

How can I map /var/vcap/store onto the ephemeral disk rather than the root
partition when not using a persistent disk in a Diego deployment?
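
To confirm which filesystem a path actually lives on before changing the
deployment, a small check along these lines may help. It is only a sketch:
the function names are made up for illustration, and /var/vcap/data is
assumed to be where BOSH mounts the ephemeral disk on the cell.

```python
import os
import shutil


def same_filesystem(path_a, path_b):
    """Return True when both paths sit on the same device (same mount)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


if __name__ == "__main__":
    # Paths are assumptions: /var/vcap/store comes from this thread,
    # /var/vcap/data is assumed to be the BOSH ephemeral-disk mount point.
    for path in ("/", "/var/vcap/store", "/var/vcap/data"):
        if not os.path.exists(path):
            print(f"{path}: not present on this machine")
            continue
        on_root = same_filesystem(path, "/")
        print(f"{path}: on root partition={on_root}, free={free_fraction(path):.0%}")
```

If /var/vcap/store reports "on root partition=True" with little free space,
that would be consistent with the consul_agent failing to write its keys.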


On Tue, Dec 20, 2016 at 8:39 AM, Etourneau Gwenn <gwenn.etourneau(a)>


You can check the recovery scenario here


2016-12-20 16:12 GMT+09:00 Sylvain Gibier <sylvain(a)>:


Any hint on how to fix it? From a network topology standpoint nothing changed,
and I can't find anything useful in the consul documentation for reforming my
cluster. Currently the second consul server node is also experiencing the
issue, so I'm running on one consul node (server and leader)...

From a CF perspective, how can I reinitialize the consul cluster, and what
is the impact on the other components? I'm starting to see failing routing
requests at this stage.


On Tue, Dec 20, 2016 at 2:40 AM, Yitao Jiang <>

We once had the same issue, which was caused by a network problem: the consul
server follower couldn't connect to the leader. The difference is that
we are running on OpenStack.

On Tue, Dec 20, 2016 at 12:32 AM, Sylvain Gibier <
sylvain(a)> wrote:


Diego has been the default in my CF installation (HA over 3 AZs). Today,
while attempting a simple BOSH update of the CF stemcell, consul_z1/0
keeps "failing after update".

If I look in the log file - I can see the following:

++ logger -p -t vcap.consul-agent
++ tee -a /var/vcap/sys/log/consul_agent/consul_agent.stdout.log
error during start: 2/30 nodes reported failure
2016/12/19 14:49:50 [ERR] agent.client: Failed to decode response header: EOF
2016/12/19 14:49:50 [ERR] agent.client: Failed to decode response header: EOF

Also it seems that I have a bunch of errors:

2016/12/19 13:54:32 [INFO] consul: adding server consul-z3-0 (Addr: (DC: dc1)
2016/12/19 13:54:32 [INFO] consul: adding server consul-z2-0 (Addr: (DC: dc1)
2016/12/19 13:54:32 [ERR] agent: failed to sync remote state: No cluster leader
2016/12/19 13:54:32 [INFO] agent: Joining cluster...
2016/12/19 13:54:32 [INFO] agent: (LAN) joining: []
2016/12/19 13:54:32 [INFO] agent: (LAN) joined: 3 Err: <nil>
2016/12/19 13:54:32 [INFO] agent: Join completed. Synced with 3 initial agents
2016/12/19 13:54:32 [WARN] raft: Failed to get previous log: 503710 log not found (last: 503708)
2016/12/19 13:54:32 [INFO] raft: Removed ourself, transitioning to

I can definitely confirm that, in my case, consul_z3 is the Leader (via
consul info) in my current setup.
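
For anyone sifting through a similar log, a rough way to see which
subsystems are complaining is to count the [ERR] and [WARN] lines per
source. This is a sketch only: the `summarize` helper and the regular
expression are illustrative, not part of consul, and they assume the
timestamp/level/source layout seen in the excerpts above with each entry on
a single line.

```python
import re
from collections import Counter

# Matches lines such as "2016/12/19 13:54:32 [ERR] agent: failed to sync ..."
LINE_RE = re.compile(
    r"^\s*\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} "
    r"\[(?P<level>[A-Z]+)\] (?P<source>[\w.]+): (?P<msg>.*)$"
)


def summarize(lines, levels=("ERR", "WARN")):
    """Count (level, source) pairs for the given levels; skip other lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("source"))] += 1
    return counts


if __name__ == "__main__":
    # Log path taken from the trace quoted in this thread.
    log_path = "/var/vcap/sys/log/consul_agent/consul_agent.stdout.log"
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for (level, source), count in summarize(handle).most_common():
            print(f"{count:6d}  [{level}] {source}")
```

A spike in raft warnings next to "No cluster leader" errors would point at
the servers rather than at the client agents.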

Any help or pointers on how to fix that?

Releases: CF: v234, Diego: 0.1467.0



