Re: consul_z1/0 is failing after update


Sylvain Gibier
 

Hi,

Any hint on how to fix it? From a network topology standpoint nothing has
changed, and I can't find anything useful in the consul documentation about
re-forming my cluster. Currently the second consul server node is also
experiencing the issue, so I am running on one consul node (server and leader)...

From a CF perspective, how can I reinitialize the consul cluster, and what
is the impact on the other components? I'm starting to see failing routing
requests at this stage.

Sylvain

On Tue, Dec 20, 2016 at 2:40 AM, Yitao Jiang <jiangyt.cn(a)gmail.com> wrote:

We once had the same issue, which was caused by a network problem: the consul
server follower couldn't connect to the leader. The difference is that
we are running on OpenStack.

On Tue, Dec 20, 2016 at 12:32 AM, Sylvain Gibier <
sylvain(a)munichconsulting.de> wrote:

Hi,

Diego has been the default in my CF installation (H/A over 3 AZs). Today,
while trying a simple BOSH CF update of a stemcell, consul_z1/0 keeps
on "failing after update".

If I look in the log file - I can see the following:

"
++ logger -p user.info -t vcap.consul-agent
++ tee -a /var/vcap/sys/log/consul_agent/consul_agent.stdout.log
error during start: 2/30 nodes reported failure
2016/12/19 14:49:50 [ERR] agent.client: Failed to decode response header: EOF
2016/12/19 14:49:50 [ERR] agent.client: Failed to decode response header: EOF
"
Also it seems that I have a bunch of errors:

"
2016/12/19 13:54:32 [INFO] consul: adding server consul-z3-0 (Addr: 10.10.30.37:8300) (DC: dc1)
2016/12/19 13:54:32 [INFO] consul: adding server consul-z2-0 (Addr: 10.10.20.37:8300) (DC: dc1)
2016/12/19 13:54:32 [ERR] agent: failed to sync remote state: No cluster leader
2016/12/19 13:54:32 [INFO] agent: Joining cluster...
2016/12/19 13:54:32 [INFO] agent: (LAN) joining: [10.10.10.37 10.10.20.37 10.10.30.37]
2016/12/19 13:54:32 [INFO] agent: (LAN) joined: 3 Err: <nil>
2016/12/19 13:54:32 [INFO] agent: Join completed. Synced with 3 initial agents
2016/12/19 13:54:32 [WARN] raft: Failed to get previous log: 503710 log not found (last: 503708)
2016/12/19 13:54:32 [INFO] raft: Removed ourself, transitioning to follower
"
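
The raft warning in the excerpt above carries two indexes: the entry the
leader expected this node to already hold (503710) and the last entry the
node actually has (503708). A minimal Python sketch, assuming log lines in
the format quoted above, that pulls those numbers and the advertised server
addresses out of a consul_agent log. The names summarize and SAMPLE are
hypothetical, chosen only for illustration.

```python
import re

# Patterns for the log format quoted in this thread (assumed, not a stable API).
SERVER_RE = re.compile(r"adding server (\S+) \(Addr: ([\d.]+:\d+)\)")
RAFT_GAP_RE = re.compile(
    r"Failed to get previous log: (\d+) log not found \(last: (\d+)\)"
)


def summarize(log_text):
    """Return known servers, leader status and raft index gaps in log_text."""
    # Collapse all whitespace so entries wrapped by the mail client still match.
    flat = " ".join(log_text.split())
    servers = dict(SERVER_RE.findall(flat))
    gaps = [(int(want), int(last)) for want, last in RAFT_GAP_RE.findall(flat)]
    return {
        "servers": servers,                       # name -> "ip:port"
        "no_leader": "No cluster leader" in flat,  # True if the error appears
        "raft_gaps": gaps,                        # (requested, last local) pairs
    }


# Lines copied from the excerpt above, including the mail line wrapping.
SAMPLE = """
2016/12/19 13:54:32 [INFO] consul: adding server consul-z3-0 (Addr:
10.10.30.37:8300) (DC: dc1)
2016/12/19 13:54:32 [INFO] consul: adding server consul-z2-0 (Addr:
10.10.20.37:8300) (DC: dc1)
2016/12/19 13:54:32 [ERR] agent: failed to sync remote state: No
cluster leader
2016/12/19 13:54:32 [WARN] raft: Failed to get previous log: 503710
log not found (last: 503708)
"""

print(summarize(SAMPLE))
```

Running it over the full consul_agent log on each server VM would show
whether the gap between the two indexes closes over time or keeps recurring.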
I can definitely confirm that, in my current setup, consul_z3 is the leader
(via consul info).
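
Besides consul info, the same check can be scripted against Consul's HTTP
status endpoints (/v1/status/leader and /v1/status/peers). A minimal Python
sketch, assuming the agent's HTTP API is reachable at 127.0.0.1:8500 on the
VM (Consul's default port; this deployment may bind it differently). The
helper names status and describe are hypothetical.

```python
import json
import urllib.request


def status(endpoint, base="http://127.0.0.1:8500"):
    """GET /v1/status/<endpoint> from the local agent and decode the JSON body.

    The base URL is an assumption; adjust it to where the agent listens.
    """
    url = f"{base}/v1/status/{endpoint}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def describe(leader, peers):
    """Summarize a leader address and a list of raft peer addresses."""
    if not leader:
        # An empty leader value is treated here as "no leader elected".
        return f"no leader; {len(peers)} raft peer(s): {', '.join(peers)}"
    followers = [p for p in peers if p != leader]
    return f"leader {leader}; {len(followers)} other raft peer(s)"


# Example with the addresses from the log excerpt in this thread:
print(describe("10.10.30.37:8300",
               ["10.10.10.37:8300", "10.10.20.37:8300", "10.10.30.37:8300"]))

# On a consul server VM one would call, for instance:
#   print(describe(status("leader"), status("peers")))
```

Comparing the output from each of the three server VMs shows whether they
agree on the leader and on the raft peer set.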

Any help or pointers on how to fix that?


Releases: CF: v234, Diego: 0.1467.0
IaaS: AWS



--

Regards,

Yitao
