Update Parallelization in Cloud Foundry
Omar Elazhary <omazhary@...>
Hello everyone,

I know it is possible to update and redeploy components in parallel in Cloud Foundry by setting the "serial" property in the deployment manifest to "false". However, is this recommended? Are there particular job dependencies that I need to pay attention to?

Regards,
Omar
|
Amit Kumar Gupta
Hey Omar,

You can set the "serial" property at the global level of a deployment (you can think of it as setting a default for all jobs), and then override it at the individual job level.

You will want the consul server jobs to be deployed first, with serial: true and max_in_flight: 1. The important thing here is that if you have more than one server in your consul cluster, they need to come up one at a time so that cluster orchestration goes smoothly. The same is true if your etcd cluster has more than one server in it.

If you're using the postgres job for CCDB and/or UAADB (instead of an external database), then you will want the postgres job to come up before CC and/or UAA. Similarly, if you're using the provided blobstore job instead of an external blobstore, you'll want it up before CC comes up.

You might be able to get away with parallelizing some of the things above. E.g. if you bring CC and the blobstore up at the same time, CC might fail to start for a while until the blobstore comes up, and then start successfully. Monit also generally keeps retrying even after BOSH gives up, so your deploy might fail, but later on you might see everything up and running.

Cheers,
Amit
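To make the "global default, per-job override" idea concrete, here is a minimal Python sketch. It is only an illustration: the manifest is written as a plain dict rather than YAML, the `update` block layout (`serial`, `max_in_flight`) reflects my understanding of BOSH deployment manifests, and the job names, instance counts, and the `effective_update` helper are all made up for the example. It mimics the override behaviour described above and is not how BOSH itself is implemented.

```python
# Hypothetical manifest fragment expressed as a Python dict (a real manifest is YAML).
# Assumption: update settings live in an `update` block, both globally and per job.
manifest = {
    "update": {"serial": False, "max_in_flight": 4},  # global default: parallel
    "jobs": [
        # consul servers: one job at a time, one instance at a time
        {"name": "consul", "instances": 3,
         "update": {"serial": True, "max_in_flight": 1}},
        # postgres: overrides only `serial`, inherits max_in_flight
        {"name": "postgres", "instances": 1, "update": {"serial": True}},
        # no override: inherits the global defaults
        {"name": "api", "instances": 2},
        {"name": "router", "instances": 2},
    ],
}


def effective_update(manifest, job):
    """Return the global update settings with the job's own overrides applied."""
    settings = dict(manifest.get("update", {}))
    settings.update(job.get("update", {}))
    return settings


for job in manifest["jobs"]:
    print(job["name"], effective_update(manifest, job))
```

In this sketch, consul ends up with serial set to True and max_in_flight set to 1, while api and router simply inherit the global values.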
|
Marco Voelz
Does NATS also need to come up before any of the other components?
|
Amit Kumar Gupta
You can probably try to start everything in parallel, and either set very long update timeouts or allow the deployment to fail with the expectation that it will eventually correct itself. Alternatively, you can start things in a strict order, which puts stronger constraints on the possible failure scenarios and makes it easier to debug the root cause of a failure.

Certain things do depend on NATS, and thus won't work until NATS is up. The main one I can currently think of is registering routes with gorouter, which is done both for apps and for system components (e.g. the route-registrar registers api.SYSTEM_DOMAIN on behalf of the CC).

Best,
Amit
|
Marco Voelz
Thanks for clarifying this for me, Amit.

Warm regards,
Marco
|
Dieu Cao <dcao@...>
It should also be considered that the serial deployment order, as recommended, is usually the most tested one when it comes to ensuring backwards compatibility of code changes during deployment.

For example, a new endpoint might be added to Cloud Controller for use by DEAs/CELLs. Because of the serial deployment order, it is assumed that all Cloud Controllers will have finished updating, and thus that the new endpoint is available, before the DEAs/CELLs update. Code changes to DEAs/CELLs can then simply switch over to the new endpoint as they update, and there is no need to keep the code on DEAs/CELLs that used the older endpoint.

-Dieu
CF Runtime PMC Lead
|
Omar Elazhary <omazhary@...>
Thanks everyone. What I understood from Amit's response is that I can parallelize certain components. What I also understood from both Amit's and Dieu's responses is that some components have hard dependencies, others have only soft ones, and some have no dependencies at all.

My question is: how can I figure out these dependencies? Are they listed somewhere? The Cloud Foundry docs do a great job of describing each component separately, but they do not explain which should be up before which. That is what I need in order to work out an execution plan that minimizes update time while keeping CF 100% available.

Regards,
Omar
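One way to turn ordering constraints like the ones mentioned in this thread into an execution plan is to group jobs into "waves": everything in a wave can be updated in parallel, and a wave starts only after the previous one has finished. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The dependency map is an assumption pieced together from the replies above, not an official list, and the job names and the `deployment_waves` helper are illustrative.

```python
from graphlib import TopologicalSorter

# Assumed map: job -> jobs that should be up before it.
# Read off this thread (consul first, postgres before CC/UAA, blobstore
# before CC, NATS before route registration); not an authoritative list.
depends_on = {
    "consul": set(),
    "etcd": {"consul"},
    "nats": {"consul"},
    "postgres": {"consul"},
    "blobstore": {"consul"},
    "uaa": {"postgres"},
    "cloud_controller": {"postgres", "blobstore"},
    "route_registrar": {"nats"},
}


def deployment_waves(graph):
    """Group jobs into waves; jobs within one wave have no ordering between them."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the map contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every job whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(deployment_waves(depends_on), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Note that this only orders jobs relative to each other. Within a job, clustered servers such as consul or etcd would still need to roll one instance at a time, as described earlier in the thread.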
|
Amit Kumar Gupta
If by "hard dependency" you mean something that has to be up strictly before another thing for a deploy to possibly succeed, I'm not sure there are any such hard dependencies. PCFDev (formerly MicroPCF) brings up all the components simultaneously on a single VM [1]. Some processes will flap until other ones are up, but they eventually do all come up.

There probably isn't a single solution to minimizing update time while guaranteeing 100% uptime, as the answer will depend on a lot of different things. Are you running DEA and/or Diego? External database and/or external blobstore? Are you just talking about uptime of apps, or also of the platform API? What about services as well?

If you find a colocation/update strategy that works for you, I think the community would really appreciate hearing about it.

(Just for fun, there's also nanocf [2], which is a Docker image with all of CF in it, and a bunch of videos where I run nanocf in nanocf in BOSH-Lite CF [3].)

[1] https://github.com/pivotal-cf/micropcf
[2] https://github.com/sclevine/nanocf
[3] https://www.youtube.com/watch?v=oMUGjaWg_Hk&list=PLdgSOpBLY_uFbzo1f1prmjW0hf4z1rWdm

Cheers,
Amit
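The "processes flap until the others are up" behaviour can be illustrated with a toy simulation. This is a deliberately simplified model, not how monit or BOSH actually schedule restarts: every process is started in every round, and a start attempt succeeds only if all of its dependencies were already up at the beginning of that round. The dependency map, the process names, and the `simulate_parallel_start` helper are assumptions made for the example.

```python
# Assumed soft dependencies (process -> processes it needs before it can start).
depends_on = {
    "nats": set(),
    "postgres": set(),
    "blobstore": set(),
    "cloud_controller": {"postgres", "blobstore"},
    "route_registrar": {"nats"},
}


def simulate_parallel_start(graph, max_rounds=10):
    """Start everything at once; return {process: failed attempts before it came up}."""
    up = set()
    flaps = {name: 0 for name in graph}
    for _ in range(max_rounds):
        # A process comes up this round only if its dependencies were already up.
        started = {name for name, deps in graph.items()
                   if name not in up and deps <= up}
        # Everything else that is still down counts one failed attempt (a "flap").
        for name in graph:
            if name not in up and name not in started:
                flaps[name] += 1
        up |= started
        if len(up) == len(graph):
            return flaps
    missing = sorted(set(graph) - up)
    raise RuntimeError(f"processes never came up: {missing}")


print(simulate_parallel_start(depends_on))
```

In this model, cloud_controller and route_registrar each fail once and then come up, while a circular dependency would never converge and raises an error. That is roughly the difference between a soft dependency and a truly hard one.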
|