Update: Locks & Service Discovery in CF Runtime


Evan Farrar <evanfarrar@...>
 

We decided to move off of Consul, but why? This is a fair question, and I'm
sorry for the slow response. I hope to answer extensively and transparently
as the project lead of the Infrastructure team maintaining consul-release.

There is not a single, definitive reason, so I think it is best to provide
as much context as possible to understand the motivations. I have written a
document about how the Consul and Cloud Foundry integration has gone, and
the thought process involved in our decision to stop that integration in
the future. It includes as much raw data as I could find.

https://docs.google.com/document/d/1qdLNIWQQzluXw5rnc39raAYOnnSdDUjhUOrovUE0NJI/edit?usp=sharing

Please comment on the doc, reply on this thread, or share your thoughts in
#infrastructure on Slack [1].

[1] http://slack.cloudfoundry.org/

On Sun, May 7, 2017 at 3:08 PM, Benjamin Gandon <benjamin(a)gandon.org> wrote:

The road off Consul looks long but necessary. Consul looks like a SPOF in
CF, given how much the platform depends on it and the reports that even
plain upgrades sometimes break it badly.

Plus, the amount of logic in the confab wrapper around Consul shows how
hard Consul is to manage and keep running properly.

Don't forget that PCF recently benefited from a shared CRE (SRE-type)
review from Google.
Don't forget either that converging evidence suggests Google stays away
from etcd for its hosted K8s on GCP.

My guess is that, internally, Google SREs might have evidence at scale
that systems like etcd or Consul should be avoided, and that this
understanding is being carried over to CF through the CRE program.


Also, moving away from Consul is like choosing to build Diego instead of
building on top of K8s. Controlling the agenda is important: I mean not
being forced to chase a project that has its own agenda. Deciding which
value is put into the product, and keeping that value consistent with the
rest of the platform, is also important.


These are just thoughts. I would love to read more precise info about the
why of this "away-from-Consul" move. Guys?


On 26 Apr 2017, at 18:48, Voelz, Marco <marco.voelz(a)sap.com> wrote:

Dear Luan,

Maybe that's a stupid question which has already been answered, but
doesn't consul release 0.8 address most of the criticism from the CF
community?

I now see big efforts on all sides (CF and BOSH teams) invested in
building our own solution to a problem which seems to be pretty generic.

Do we think that's something we should spend engineering resources on, and
that others (e.g. HashiCorp in this case) cannot solve the problem in a way
that meets our needs? At least to me it seems that they are trying to move
in the right direction.

Maybe my perspective on this is just too generic and I'm not deep enough
in the technical details.

What do you think?

Thanks and warm regards
Marco



On 24. Apr 2017, at 20:35, Luan Santos <lsantos(a)pivotal.io> wrote:

Hi all,

We have been working on the milestones proposed before in order to lessen
and remove our dependencies on Consul.

Please see the updated Locks & Service Discovery in CF Runtime
<https://docs.google.com/document/d/1zw2tQtpBqYol9usIuK_3VKmXHCMW6J9Dupzjr16J-TY/edit>
document for more details and discussion.

Thanks,

Luan
Software Engineer, Cloud Foundry @ Pivotal


Benjamin Gandon
 

The road off Consul looks long but necessary. Consul looks like a SPOF in CF, given how much the platform depends on it and the reports that even plain upgrades sometimes break it badly.

Plus, the amount of logic in the confab wrapper around Consul shows how hard Consul is to manage and keep running properly.

Don't forget that PCF recently benefited from a shared CRE (SRE-type) review from Google.
Don't forget either that converging evidence suggests Google stays away from etcd for its hosted K8s on GCP.

My guess is that, internally, Google SREs might have evidence at scale that systems like etcd or Consul should be avoided, and that this understanding is being carried over to CF through the CRE program.


Also, moving away from Consul is like choosing to build Diego instead of building on top of K8s. Controlling the agenda is important: I mean not being forced to chase a project that has its own agenda. Deciding which value is put into the product, and keeping that value consistent with the rest of the platform, is also important.


These are just thoughts. I would love to read more precise info about the why of this "away-from-Consul" move. Guys?


Marco Voelz
 

Dear Luan,

Maybe that's a stupid question which has already been answered, but doesn't consul release 0.8 address most of the criticism from the CF community?

I now see big efforts on all sides (CF and BOSH teams) invested in building our own solution to a problem which seems to be pretty generic.

Do we think that's something we should spend engineering resources on, and that others (e.g. HashiCorp in this case) cannot solve the problem in a way that meets our needs? At least to me it seems that they are trying to move in the right direction.

Maybe my perspective on this is just too generic and I'm not deep enough in the technical details.

What do you think?

Thanks and warm regards
Marco


Luan Santos
 

Hi all,

We have been working on the milestones proposed before in order to lessen
and remove our dependencies on Consul.

Please see the updated Locks & Service Discovery in CF Runtime
<https://docs.google.com/document/d/1zw2tQtpBqYol9usIuK_3VKmXHCMW6J9Dupzjr16J-TY/edit>
document for more details and discussion.

Thanks,

Luan
Software Engineer, Cloud Foundry @ Pivotal