Re: Wiping out data for consul/etcd with bosh drain script?
Amit Kumar Gupta
Great question. I would also prefer disaster recovery to be possible via
"bosh stop" then "bosh start". This is almost possible in etcd, except for
the conditional you found. The reason for the conditional is that it
allows rolling a 1-node cluster without data loss. I'd like to entertain
the idea that etcd-release's SLA (currently embodied in the acceptance
tests) should drop the requirement of maintaining data for a 1-node cluster roll.
That reduced SLA will probably be fine with the community, and the improved
disaster recovery experience would be worth it, but I haven't validated that assumption yet.
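To make the trade-off concrete, here is a sketch of the kind of drain conditional described above. This is hypothetical, not the actual etcd-release drain script: the idea is to wipe the data dir only when other peers exist to restore the data, which is exactly what makes a 1-node roll keep its data.

```shell
# Hypothetical sketch of the drain conditional; names are illustrative.
cluster_size=3                 # the real script would read this from job config
data_dir=$(mktemp -d)
touch "$data_dir/member-data"  # stand-in for etcd's on-disk state

if [ "$cluster_size" -gt 1 ]; then
  rm -rf "${data_dir:?}"/*     # safe to wipe: the remaining peers hold the data
fi

ls -A "$data_dir" | wc -l      # 0 when the data dir was wiped
```

Dropping the 1-node requirement would mean removing the conditional and wiping unconditionally, which is what makes a plain "bosh stop" / "bosh start" recovery path possible.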
consul-release is a little further removed from this possibility because
consul requires different orchestration logic, and currently the
implementation doesn't wipe out the data as aggressively. We already have a story [
1 <https://www.pivotaltracker.com/story/show/120648349>] to explore
whether we could do that without reducing the SLA [2].
By the way, have you tried this? If you have a healthy 3-node etcd cluster
and you bosh stop then bosh start, does the cluster recover? If you
have a 3-node cluster with data and you do the same, does the cluster recover
(with data loss, which is acceptable in this case)? Even more interesting
would be to see what happens if you have a cluster that is actually out of sync and
try this. This would be helpful input to have before we get a chance
to prioritize the implementation.
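For reference, a hypothetical walkthrough of the experiment suggested above. The deployment is assumed to already be targeted; the DRY_RUN guard only prints the commands, so the sketch is safe to run without a director.

```shell
# Hypothetical recovery experiment; DRY_RUN=1 prints commands instead of running them.
DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run bosh stop    # stop all VMs in the deployment; drain scripts run here
run bosh start   # bring the VMs back; does the etcd cluster re-form?

# With etcd v2, cluster health could then be checked from a node with:
run etcdctl cluster-health
```

Comparing the cluster membership and key data before and after the stop/start would show whether recovery happened with or without data loss.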
On Wed, Jun 15, 2016 at 10:00 PM, Tomoe Sugihara <tsugihara(a)pivotal.io> wrote: