Re: Asynchronous route-mapping and blue-green deployment


Shannon Coen
 

Hello Jens,

I understand your concern. There isn't a simple solution at the moment but
I have recorded your feedback.

We are aware that there is more CF could do with regard to upgrading apps
without downtime. For quite some time we have discussed offering automated
cutover between versions of an app as a built-in feature, so app developers
wouldn't have to use these blue-green scripts. Priorities are hard.
However, even if this feature were offered there may have to be significant
changes to the routing architecture to guarantee that routes have been
mapped before calling the operation a success.

Considering the routing tier is horizontally scalable, what would your
success criteria be for "a route is mapped to an app"? Would it be that the
change must be applied to the routing tables of all routers or only some
percentage?
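
To make the question concrete, here is a minimal sketch of the two candidate criteria as one function. It assumes a per-router view of the routing tables were available, which this thread says Cloud Controller does not have today; `router_has_route` and `required_fraction` are hypothetical names used for illustration only.

```python
from typing import Mapping


def route_is_mapped(router_has_route: Mapping[str, bool],
                    required_fraction: float = 1.0) -> bool:
    """Decide whether a route counts as mapped.

    router_has_route: hypothetical per-router view, router id -> whether
        that router's table already contains the route.
    required_fraction: 1.0 means "all routers"; a smaller value means
        "only some percentage" of routers must have applied the change.
    """
    if not 0.0 < required_fraction <= 1.0:
        raise ValueError("required_fraction must be in (0, 1]")
    if not router_has_route:
        # No routers reporting: nothing confirms the mapping.
        return False
    applied = sum(1 for has_route in router_has_route.values() if has_route)
    return applied / len(router_has_route) >= required_fraction
```

With two of three routers updated, `required_fraction=0.5` would report success while the default of `1.0` would not, which is exactly the trade-off asked about above.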

"add a wait that is hopefully long enough and cross fingers"


Yes, or check that the new version of your app is responding before
unmapping the old version from the route. Maybe expose a version endpoint?
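
A minimal sketch of such a check, assuming the app exposes a plain-text version endpoint on the shared route. The endpoint URL, the version strings, and the `required_hits` threshold are all assumptions, not anything CF provides. Because old and new instances share the route, a single 200 proves nothing; the script instead waits until the new version string has been seen several times.

```python
import time
import urllib.request
from typing import Callable


def fetch_version(url: str, timeout: float = 5.0) -> str:
    """Return the body of the (assumed) version endpoint as a string."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def wait_for_version(url: str,
                     expected: str,
                     fetch: Callable[[str], str] = fetch_version,
                     required_hits: int = 3,
                     attempts: int = 60,
                     delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until `expected` has been returned `required_hits` times.

    Responses from the old version and transient errors are ignored,
    since both are normal while both versions share the route.
    Returns False if the attempts run out first.
    """
    hits = 0
    for _ in range(attempts):
        try:
            if fetch(url) == expected:
                hits += 1
                if hits >= required_hits:
                    return True
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError.
            pass
        sleep(delay)
    return False
```

A deploy script would call something like `wait_for_version("https://myapp.example.com/version", "v2")` and unmap the old version only when it returns True. Note the limit of this approach: seeing the new version shows that at least one router forwards to it, not that every router has applied the change.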

Shannon Coen
Product Manager, Cloud Foundry
Pivotal, Inc.

On Thu, Feb 9, 2017 at 2:01 AM, Jens Keller <jens.keller(a)sap.com> wrote:

Hi Shannon,
thanks for your feedback - comments inline.
Best regards
Jens


Currently Cloud Controller has no way of knowing whether a route is ever
registered with a router, whether the app is on DEAs or Diego. We could
consider how to provide such a guarantee; e.g. CC could poll routes until
an app returns a 200 (several issues with this). Also, if we were to add
this feature we couldn't change the response in v2 from 201 to 202 as that
would not be backwards compatible; we could consider this for the v3 CC
API.

Not sure if my understanding is correct. First of all, to avoid
misunderstandings, it'd be totally fine if we had an asynchronous
mechanism. We don't need the operation to be synchronous. Second, if that's
a new API, that's absolutely fine as well, we understand & agree that
incompatible changes are not the way to go.

But the point here is, whether the return code is correct or not is not so
much our issue: the issue is that we currently do not see any reliable way
of knowing whether the route mapping worked or not, or is still in
progress. So how do we know when we can disconnect the old version? See
also the next comment below. So I agree: just polling the route until the
app returns a 200 is not the way to go (also see below).


We recommend adding a short wait to your blue-green deploy script. The
blue-green-deploy CLI plugin effectively does this, by running a few other
commands (renaming apps) between mapping the route to one app and unmapping
it from another.

Given we have a lot of instances of the old version, and just created one
or a few instances of the new version on the same route - how do we even
know that the route mapping worked at all?

Even when waiting for 5 minutes, it could be we get a 200, just because
the response came from the old version, so we think it works, but when we
now disconnect the old version it'd be fatal. And if we get a 404, how do
we know whether we need to wait for 5 minutes, or whether the mapping just
failed entirely and there's no point in waiting any longer at all?

To me that would feel a bit like "add a wait that is hopefully long enough
and cross fingers" - I'm afraid such an approach is not really acceptable
for enterprise applications with business-critical processes. But not sure
if I got you wrong.

I'm not sure which changes to the router would be possible and which won't
- but I assume if it is possible to tell the router "please map X", it
should be possible to add a feature, with which one could ask the router
"what's the status of the mapping operation of X"?

The cloud controller could then just provide a new API that exposes this,
so that a deployment script can poll this information – or am I missing
something?
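
A sketch of what a deployment script could do with such an API, if it existed. No such status endpoint is described in this thread; the `get_status` callable and the status values `"pending"`, `"mapped"` and `"failed"` are purely hypothetical. The point is that an explicit status lets the script tell "still in progress" apart from "failed", which a fixed wait cannot.

```python
import time
from typing import Callable


def wait_for_mapping(get_status: Callable[[], str],
                     attempts: int = 30,
                     delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll a hypothetical route-mapping status until it settles.

    get_status: callable returning "pending", "mapped" or "failed"
        (assumed values; a real API might differ).
    Returns True once the mapping is reported as done, raises
    RuntimeError on failure and TimeoutError if it stays pending.
    """
    for _ in range(attempts):
        status = get_status()
        if status == "mapped":
            return True
        if status == "failed":
            # No point in waiting any longer: fail the deployment now.
            raise RuntimeError("route mapping failed")
        sleep(delay)
    raise TimeoutError("route mapping still pending after all attempts")
```

Only after this returns True would the script unmap the old version; on either exception it would leave the old version in place.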
