Hi Cloud Foundry! We are trying to split our Cloud Foundry deployment into multiple deployments, so that each CF component has its own deployment manifest. We are doing this on an existing CF installation. We have moved all components except nats and etcd into the new deployments, so the original single deployment now has just these two jobs.
The existing deployment has 3 etcd machines. The migration plan is to bring 4 new etcd machines into the cluster through a new deployment, point all other components at these four etcd machines, and then delete the existing 3 nodes.
However, if we delete the existing 3 nodes and do an update to form a 4-node cluster, the cluster breaks and, as a result, all running apps go down. (The canary job brings one node down for the update, so the cluster's failure tolerance is breached.)
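The tolerance breach described above follows from majority-quorum arithmetic. Here is a minimal sketch, assuming etcd's usual rule that a cluster of N registered members needs floor(N/2) + 1 healthy members, and assuming the 3 deleted VMs are still counted as registered members. The function names and numbers are illustrative only and are not part of any etcd or BOSH API.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` registered members."""
    return members // 2 + 1


def has_quorum(members: int, live: int) -> bool:
    """True if `live` healthy members still form a majority."""
    return live >= quorum(members)


# Hypothetical numbers taken from the scenario in this thread.
registered = 7         # 3 old + 4 new members joined into one cluster
live_after_delete = 4  # assumption: the 3 old VMs are gone but still registered

print(quorum(registered))                             # majority needed out of 7
print(has_quorum(registered, live_after_delete))      # 4 live: no spare member
print(has_quorum(registered, live_after_delete - 1))  # canary takes one more down
```

Under these assumptions the 4 remaining nodes are exactly a majority of 7, so the single node taken down by the canary update is enough to lose quorum.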
We also tried removing these three nodes from the cluster using the etcdctl command and then applying the removal to the new deployment through bosh. This also makes the bosh deployment fail (the etcd job fails, saying "unequal number of nodes").
In this situation, what would be the best way to reduce the number of nodes in the etcd cluster?
regards, Surendhar
|
|
Hi Surendhar,
May I ask why you want to split the deployment into multiple deployments? What problem are you having that you're trying to solve by doing this?
Best, Amit
|
|
Hi Amit,
The main advantage we are targeting is reduced deployment time for any change to Cloud Foundry. The advantages include, but are not limited to:
* Targeting specific components for changes
* Deployment time
* Addressing specific components for patch updates
* Easier deployment
* Easier maintenance, etc.
Regards Lingesh M
|
|
Hi Lingesh,
I don't think easier deployment and maintenance is that simple. Each manifest may become smaller, but now you have to maintain multiple small manifests, keep them in sync, and make sure that they are all compatible. There are pros and cons to any sort of decomposition like this.
With regard to targeting specific components for change, I think what will really solve your problem is having a single CF deployment composed of multiple releases, e.g. uaa as its own separate release within a single CF deployment. If you wanted to, you could update the uaa release itself instead of having to update all the jobs. You still have the problem of knowing, if you only update one component, whether it's compatible with all the things you don't upgrade, but it sounds like you're already willing to take on that complexity.
This decomposition of cf-release into multiple releases (composed into a single deployment) is currently underway.
With regard to scaling down etcd, I wasn't able to understand the problem you're hitting. Can you provide more details about exactly what you did, in what order?
Best, Amit
|
|
Hi Amit,
Thanks for taking the time to respond. We actually maintain deployment manifest templates and generate each component's manifest from them using spruce, and that is how we control the deviations.
Now coming to the etcd problem:
The old manifest has 3 members and the new manifest has 4 members. All 7 members joined together and formed a single etcd cluster. Now we need to remove the existing 3 members from the cluster and delete the old deployment.
The problem is that when we remove these 3 members from properties.etcd.machines in the new manifest and do a bosh deploy, the job fails during the update and does not come up. The exact error in the etcd job logs is *'the member count is unequal'*
Regards Lingesh M
|
|
Orchestrating the etcd cluster is fairly complex, and what you're describing is not a recommended usage. I'm not sure why you need a new 4-node cluster (why not just use the existing 3-node cluster? why the number 4?), but if you do, the simplest thing is to delete the old cluster, deploy the new cluster, and then regenerate and redeploy *all* of your small manifests to reflect the updated properties.etcd.machines.
|
|
Thanks Amit,
To get an odd cluster size, we added 4 new members in the new deployment; the plan now is to remove the old 3 members plus 1 member from the new deployment. But while doing this, the cluster size does not reduce, and the cluster breaks when 4 machines are down, which makes all apps restage and causes significant downtime.
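As a rough illustration of why taking 4 machines down at once breaks a 7-member cluster, while shrinking the registered membership one member at a time would not, here is a small sketch of the majority arithmetic only. It assumes the standard floor(N/2) + 1 quorum rule and that every remaining member is healthy at each step. It does not model the etcd job's own member-count check that produced the error quoted earlier in the thread, and the function names are hypothetical.

```python
from typing import List, Tuple


def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` registered members."""
    return members // 2 + 1


def survives_stopping(members: int, stopped: int) -> bool:
    """Stop `stopped` machines while the registered membership stays unchanged."""
    return members - stopped >= quorum(members)


def shrink_one_at_a_time(members: int, target: int) -> List[Tuple[int, int]]:
    """Deregister one member per step; return (cluster size, quorum) after each step.

    Assumption: each removal is committed before the next one starts,
    and all members that remain registered are healthy.
    """
    steps = []
    while members > target:
        members -= 1
        steps.append((members, quorum(members)))
    return steps


# Hypothetical numbers from this thread: 7 members, goal of 3.
print(survives_stopping(7, 4))     # 3 live members against a quorum of 4
print(shrink_one_at_a_time(7, 3))  # cluster size and quorum after each removal
```

In this model the required majority shrinks along with the membership, so the healthy members always outnumber the quorum threshold, whereas stopping 4 of 7 registered members leaves only 3 against a threshold of 4.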
Regards Lingesh M,
|
|