Scaling Down etcd


Suren R
 

Hi Cloud Foundry!
We are trying to split the Cloud Foundry deployment into multiple
deployments, so that each CF component has its own deployment manifest.
We are doing this on an existing CF. We have moved all components
except nats and etcd into the new deployments. The original single
deployment now has just these two jobs.

The existing deployment has 3 etcd machines. The migration idea
is to bring 4 new etcd machines into the cluster through the new deployment,
point all other components to these four etcd machines, and delete the
existing 3 nodes.

However, if we delete the existing 3 nodes and do an update to form a 4
node cluster, the cluster breaks and, as a result, all running apps go
down. (The canary job brings one node down for the update, so the failure
tolerance is breached.)
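
For reference, the tolerance mentioned here follows from majority quorum: a Raft-based cluster such as etcd needs floor(n/2) + 1 of its n members up to keep working. Below is a minimal Python sketch of that arithmetic only. The function names are illustrative and are not part of etcd or BOSH.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can be down while a majority is still up."""
    return members - quorum(members)


if __name__ == "__main__":
    # Sizes that appear in this thread: old cluster, new cluster, both combined.
    for size in (3, 4, 7):
        print(f"{size} members: quorum={quorum(size)}, "
              f"tolerates {fault_tolerance(size)} down")
```

By this arithmetic a 4-node cluster tolerates one member down, the same as a 3-node cluster, so the fourth node adds no extra headroom during a rolling update.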

We also tried removing these three nodes from the cluster using the etcdctl
command and then applying the deletion to the new deployment through bosh.
This also makes the bosh deployment fail (the etcd job fails, saying
"unequal number of nodes").

In this situation, what would be the best way to reduce the nodes in the
etcd cluster?

regards,
Surendhar


Amit Kumar Gupta
 

Hi Surendhar,

May I ask why you want to split the deployment into multiple deployments?
What problem are you having that you're trying to solve by doing this?

Best,
Amit

On Mon, Feb 15, 2016 at 9:34 AM, Suren R <suren.devices(a)gmail.com> wrote:



Lingesh Mouleeshwaran
 

Hi Amit
The main advantage we are targeting is reduced deployment time for
any change to Cloud Foundry. The advantages include, but are not limited to:
* Targeting specific components for changes
* Deployment time
* Addressing specific components for patch updates
* Easier deployment
* Easier maintenance, etc.

Regards
Lingesh M

On Wed, Feb 17, 2016 at 4:03 AM, Amit Gupta <agupta(a)pivotal.io> wrote:



Amit Kumar Gupta
 

Hi Lingesh,

I don't think easier deployment and maintenance is that simple. Each
manifest may become smaller, but now you have to maintain multiple small
manifests. And keep them in sync. And make sure that they are all
compatible. There are pros and cons to any sort of decomposition like this.

With regards to targeting specific components for change, I think what
will really solve your problem is having a single CF deployment composed of
multiple releases. E.g. uaa as its own separate release within a single CF
deployment. If you wanted to, you could update the uaa release itself
instead of having to update all the jobs. You still have the problem of,
if you only update one component, how you know it's compatible with all the
things you don't upgrade, but it sounds like you're already willing to take
on that complexity.

This decomposition of cf-release into multiple releases (composed into a
single deployment) is currently underway.

With regards to scaling down etcd, I wasn't able to understand the problem
you're hitting. Can you provide more details about exactly what you did,
in what order?

Best,
Amit

On Tue, Feb 16, 2016 at 8:50 PM, Lingesh Mouleeshwaran <
lingeshmouleeshwaran(a)gmail.com> wrote:



Lingesh Mouleeshwaran
 

Hi Amit,

Thanks for taking the time to respond. We maintain deployment
manifest templates and generate each component's manifest from them
using spruce. That is how we control the deviations.

Now coming to the etcd problem:

The old manifest has 3 members and the new manifest has 4 members.
All 7 members joined together and formed a single etcd cluster. Now we need
to remove the existing 3 members from the cluster and delete the old
deployment.

The problem is that when we remove these 3 members from
properties.etcd.machines in the new manifest and do a bosh deploy, the job
fails during the update and does not come up. The exact error in the etcd
job logs is *'the member count is unequal'*
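
One plausible reading of this error, offered as an assumption rather than a description of the etcd job's actual code, is that the configured member list is compared against the membership the running cluster still holds. The Python sketch below only illustrates that kind of mismatch. The function name and IP addresses are hypothetical.

```python
from typing import Iterable


def check_member_counts(configured: Iterable[str],
                        registered: Iterable[str]) -> None:
    """Raise if the configured list and the cluster's own list differ in size."""
    configured_list = list(configured)
    registered_list = list(registered)
    if len(configured_list) != len(registered_list):
        raise ValueError(
            f"member count is unequal: {len(configured_list)} configured, "
            f"{len(registered_list)} registered"
        )


if __name__ == "__main__":
    old_nodes = [f"10.0.0.{i}" for i in (1, 2, 3)]     # hypothetical addresses
    new_nodes = [f"10.0.1.{i}" for i in (1, 2, 3, 4)]  # hypothetical addresses
    try:
        # The manifest lists only the 4 new nodes, while the cluster
        # still has all 7 members registered.
        check_member_counts(configured=new_nodes,
                            registered=old_nodes + new_nodes)
    except ValueError as err:
        print(err)
```

If that reading is right, editing the manifest alone would not be enough: the cluster's own membership would also have to shrink to match it.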

Regards
Lingesh M

On Wed, Feb 17, 2016 at 10:30 AM, Amit Gupta <agupta(a)pivotal.io> wrote:



Amit Kumar Gupta
 

Orchestrating the etcd cluster is fairly complex, and what you're
describing is not a recommended usage. I'm not sure why you need a new 4
node cluster (why not just use the existing 3-node cluster? why the number
4?), but if you do, the simplest thing is to delete the old cluster, deploy
the new cluster, and then regenerate and redeploy *all* of your small
manifests to reflect the updated properties.etcd.machines.

On Tue, Feb 16, 2016 at 9:44 PM, Lingesh Mouleeshwaran <
lingeshmouleeshwaran(a)gmail.com> wrote:



Lingesh Mouleeshwaran
 

Thanks Amit,

To have an odd cluster size, we added 4 new members in the new
deployment. The plan now is to remove the old 3 members plus 1 member from
the new deployment. But while doing this, the cluster size does not reduce,
and the cluster breaks when 4 machines are down, which makes all apps
restage and causes significant downtime.
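
The break with 4 machines down is consistent with majority quorum: if all 7 members stay registered, 4 must be healthy, and only 3 remain. The Python sketch below models only that arithmetic, assuming members are deregistered one at a time so that the registered size shrinks along with the healthy count. It does not describe how BOSH or the etcd job orchestrates membership, and the function names are illustrative.

```python
from typing import List, Tuple


def has_quorum(registered: int, healthy: int) -> bool:
    """True if `healthy` members form a majority of `registered` members."""
    return healthy >= registered // 2 + 1


def shrink_one_at_a_time(start: int, target: int) -> List[Tuple[int, bool]]:
    """Deregister one member per step, from `start` down to `target`.

    For each resulting size, record whether the cluster would still have
    quorum with one remaining member temporarily down (for example,
    while it is being restarted).
    """
    steps = []
    for size in range(start - 1, target - 1, -1):
        steps.append((size, has_quorum(size, size - 1)))
    return steps


if __name__ == "__main__":
    # Four of seven registered members stopped at once: 3 healthy, quorum is 4.
    print(has_quorum(registered=7, healthy=3))
    # Shrinking 7 -> 3 one member at a time, checking each intermediate size.
    print(shrink_one_at_a_time(start=7, target=3))
```

Under these assumptions, every intermediate size from 6 down to 3 still holds a majority with one member restarting, whereas stopping 4 of 7 registered members at once does not.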


Regards
Lingesh M,

On Wed, Feb 17, 2016 at 11:21 AM, Amit Gupta <agupta(a)pivotal.io> wrote:
