Re: Garden Port Assignment Story


Mike Youngstrom

Yes Will, that summary is essentially correct. But, for even more clarity,
let me restate the complete story and the reason I want 92085170 to work
across stemcell upgrades. :)

Today, if NATS goes down, after 2 minutes the routers will drop their
routing tables and my entire CF deployment goes down. The routers behave
this way because of an experience Dieu had [0]. I don't like this; I would
prefer that routers not drop their routing tables when they cannot connect
to NATS. Therefore, the routing team is adding 'prune_on_config_unavailable'.
I plan to set this to false to make my deployment less sensitive to NATS
failure. In doing so I am incurring more risk of misrouted stale routes.
I am hoping that 92085170 will help reduce some of that risk. Since one of
the times I personally experienced stale-route misrouting was during a
deploy, I hope that Garden will consider a port selection technique that
helps ensure uniqueness across stemcell upgrades, something we frequently
do as part of a deploy.

Consequently, a stateless solution like random assignment or a consistent
hash would work across stemcell upgrades.
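
To be concrete about the random option, here's a rough sketch of what I
mean by a stateless assignment. The package, names, and port range
constants are mine, not anything Garden actually does:

    package ports

    import (
        "errors"
        "math/rand"
    )

    const (
        portRangeStart = 60000 // assumed start of the host port range
        portRangeSize  = 5000  // roughly the range size discussed in this thread
    )

    // pickRandomPort draws random ports until it finds one not already
    // allocated on this cell. Nothing is persisted, so the behaviour is
    // the same before and after a stemcell upgrade.
    func pickRandomPort(inUse map[int]bool) (int, error) {
        for attempts := 0; attempts < 10*portRangeSize; attempts++ {
            port := portRangeStart + rand.Intn(portRangeSize)
            if !inUse[port] {
                return port, nil
            }
        }
        return 0, errors.New("no free port found in range")
    }

The hash variant I describe further down the thread has the same
stateless property but is deterministic per app instance.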

Thanks,
Mike

[0]
https://groups.google.com/a/cloudfoundry.org/d/msg/vcap-dev/yuVYCZkMLG8/7t8FHnFzWEsJ

On Tue, Nov 24, 2015 at 3:44 AM, Will Pragnell <wpragnell(a)pivotal.io> wrote:

Hi Mike,

What I think you're saying is that once the new
`prune_on_config_unavailable` property is available in the router, and if
it's set to `false`, there's a case when NATS is not reachable from the
router in which potentially stale routes will continue to exist until the
router can reach NATS again. Is that correct?

(Sorry to repeat you back at yourself, just want to make sure I've
understood you correctly.)

Will

On 23 November 2015 at 19:02, Mike Youngstrom <youngm(a)gmail.com> wrote:

Hi Will,

Though I see that the main reason for the story assumes a healthy running
environment, I've also experienced a deploy-related issue that more unique
port assignment could help defend against. During one of our deploys the
routers finished deploying before the DEAs. When the DEAs started rolling,
for some reason some of our routers stopped getting route updates from
NATS. This caused their route tables to go stale, and as apps started
rolling, new apps started getting assigned ports previously held by other
apps, which caused a number of our hosts to be misrouted.

Though the root cause was probably a bug in the NATS client in GoRouter,
the runtime team had apparently experienced a similar issue in the past
[0], which caused them to implement code that deletes stale routes even
when a router can't connect to NATS. The Router team is now planning to
make this failsafe optional [1]. I'm hoping that with the removal of this
failsafe (which I'm planning to take advantage of), this tracker story
will help protect us from the problem we experienced before.

If the ports simply reset on a stemcell upgrade, this story provides no
defense against the problem we had before.

Does that make sense Will?

Mike

[0]
https://groups.google.com/a/cloudfoundry.org/d/msg/vcap-dev/yuVYCZkMLG8/7t8FHnFzWEsJ
[1] https://www.pivotaltracker.com/story/show/108659764

On Mon, Nov 23, 2015 at 11:11 AM, Will Pragnell <wpragnell(a)pivotal.io>
wrote:

Hi Mike,

What's the motivation for wanting rolling port assignment to persist
across, e.g., a stemcell upgrade? The motivation for this story is to prevent
stale routes from sending traffic to the wrong containers. Our assumption
is that stale routes won't ever exist for anything close to the amount of
time it takes BOSH to destroy and recreate a VM. Have we missed something
in making that assumption?

On your second point, I see your concern. We've talked about the
possibility of implementing FIFO semantics on free ports (when a port that
was in use becomes free, it goes to the end of the queue of available
ports) to decrease the chances of traffic reaching the wrong container as
far as possible. It's possible that the rolling ports approach is "good
enough" though. We're still trying to understand whether that's actually
the case.
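
To make the FIFO idea a bit more concrete, here's a rough sketch of the
semantics we have in mind. The type and method names are made up and this
isn't actual Garden code:

    package ports

    // freePortQueue sketches FIFO semantics for free ports: ports are
    // handed out from the front of the queue, and a released port goes
    // to the back, so it is only reused once every other free port has
    // been cycled through.
    type freePortQueue struct {
        ports []int
    }

    func newFreePortQueue(start, size int) *freePortQueue {
        q := &freePortQueue{}
        for p := start; p < start+size; p++ {
            q.ports = append(q.ports, p)
        }
        return q
    }

    // Acquire returns the oldest free port, or false if none are left.
    func (q *freePortQueue) Acquire() (int, bool) {
        if len(q.ports) == 0 {
            return 0, false
        }
        p := q.ports[0]
        q.ports = q.ports[1:]
        return p, true
    }

    // Release puts a freed port at the back of the queue.
    func (q *freePortQueue) Release(p int) {
        q.ports = append(q.ports, p)
    }

The point is just that a freed port sits behind every other currently
free port before it can be handed out again, which maximises the time
before a stale route could point at a new container.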

The consistent hashing idea is interesting, but a few folks have
suggested that, with a relatively small range of available ports (5000 by
default), the chances of collision are actually higher than we'd want.
I'll see if someone wants to lay down some maths to give that idea some
credence.
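
In the meantime, here's the rough back-of-the-envelope I have in mind:
the standard birthday-problem approximation, assuming ports are chosen
roughly uniformly (which a hash of the app_guid approximately is) with n
containers per cell. Treat the numbers as indicative only:

    package ports

    import "math"

    // collisionProbability is the birthday-problem approximation
    // 1 - exp(-n*(n-1)/(2*p)) for the chance that at least two of n
    // uniformly chosen ports collide within a range of p ports.
    func collisionProbability(n, p float64) float64 {
        return 1 - math.Exp(-n*(n-1)/(2*p))
    }

With p = 5000 this works out to roughly 0.22 for 50 containers on a cell
and roughly 0.63 for 100, which is why the collision concern seems
plausible to us.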

Cheers,
Will

On 23 November 2015 at 08:47, Mike Youngstrom <youngm(a)gmail.com> wrote:

Since I cannot comment in Tracker, I'm starting this thread to discuss
this story:
https://www.pivotaltracker.com/n/projects/1158420/stories/92085170

Some comments I have:

* Although I can see how a rolling port assignment could be maintained
across garden/diego restarts, I'd also like the story to ensure that the
rolling port assignments are maintained across a stemcell upgrade without
the need for persistent disks on each cell. Perhaps etcd?

* Another thing to keep in mind: although a rolling port value may avoid
duplicating ports for a short-lived container, for a long-lived container
it seems to me that a rolling port assignment becomes no more successful
than a random port assignment once the container lives long enough for
the port assignment loop to wrap around a few times.

* Has there been any consideration of using an incremental consistent
hash of the app_guid to assign ports? A consistent hash would have the
benefit of being stateless. It would also increase the likelihood that if
a request is sent to a stale route, it reaches the correct app anyway
(see the sketch after this list).
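
To illustrate the hashing idea, here's a strawman. It uses a plain
deterministic hash rather than a full consistent-hash ring, and FNV plus
the helper name are arbitrary choices of mine, but it shows the stateless
property: the same app instance maps to the same port on any cell, across
stemcell upgrades, with no persisted state (collisions would still need
some fallback):

    package ports

    import (
        "fmt"
        "hash/fnv"
    )

    // portForInstance deterministically maps an app instance to a port in
    // [rangeStart, rangeStart+rangeSize). The instance index is folded in
    // so multiple instances of the same app land on different ports.
    func portForInstance(appGUID string, index, rangeStart, rangeSize int) int {
        h := fnv.New32a()
        fmt.Fprintf(h, "%s/%d", appGUID, index)
        return rangeStart + int(h.Sum32()%uint32(rangeSize))
    }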

Thoughts?

Mike
