Re: cf platform upgrade with 100% uptime for apps


Christopher Piraino <cpiraino@...>
 

Yes, we do not recommend using HAProxy in production environments because
of this downtime issue. I have heard of some HAProxy load balancer
solutions using keepalived
<https://www.howtoforge.com/setting-up-a-high-availability-load-balancer-with-haproxy-keepalived-on-debian-lenny>
but have not investigated or tried them.

Stephen, for health checking the GoRouter, we configure the load balancer
to check that it can establish a TCP connection on port 80 (or whatever
port the GoRouter is listening on), with additional configuration depending
on your load balancer's features. For example, on the Routing team we use
AWS for our test environments and configure the ELB with a connection
timeout, checking interval, healthy threshold, and unhealthy threshold. We
then rely on the AWS ELB's load balancing algorithm to retry any failed
connections.
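
To make those four settings concrete, here is a minimal Python sketch of how a
TCP health check with thresholds typically behaves. It is not the ELB's actual
implementation; the names (HealthTracker, tcp_check,
worst_case_detection_seconds) and the default values are assumptions chosen
for illustration.

```python
import socket
from dataclasses import dataclass, field


@dataclass
class HealthTracker:
    """Tracks one backend's state from a stream of check results."""
    healthy_threshold: int = 2     # consecutive passes needed to mark healthy (assumed value)
    unhealthy_threshold: int = 2   # consecutive failures needed to mark unhealthy (assumed value)
    healthy: bool = True
    # Number of consecutive results that disagree with the current state.
    _streak: int = field(default=0, init=False)

    def record(self, check_passed: bool) -> bool:
        """Feed in one check result and return the resulting state."""
        if check_passed == self.healthy:
            self._streak = 0       # result agrees with current state: reset
        else:
            self._streak += 1
            needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self._streak >= needed:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


def tcp_check(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection can be established within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def worst_case_detection_seconds(interval: float, timeout: float,
                                 unhealthy_threshold: int) -> float:
    """Rough upper bound on how long a dead router stays in rotation.

    Assumes checks start at fixed intervals and each failing check takes
    the full connection timeout.
    """
    return unhealthy_threshold * interval + timeout
```

Under these assumptions, a 10 second interval, 5 second timeout, and unhealthy
threshold of 2 leave a window of roughly 25 seconds in which a stopped router
can still be sent new connections, so shorter intervals and lower thresholds
shrink the window at the cost of more check traffic.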

One thing to note is that most load balancers, like the GoRouter and AWS
ELB, will only retry on connection errors. Once a connection is established,
any HTTP errors are sent back to the client.
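
The retry behavior described above can be sketched roughly as follows. This is
an illustration only, not the GoRouter's or the ELB's code: `forward` and the
`send` callback are hypothetical, and `send` is assumed to raise
ConnectionError only when it could not establish a connection to the backend.

```python
from typing import Callable, Sequence, Tuple

Response = Tuple[int, str]            # (HTTP status code, body)
SendFn = Callable[[str], Response]    # hypothetical: sends the request to one backend


def forward(backends: Sequence[str], send: SendFn) -> Response:
    """Try backends in order, retrying only when the connection itself fails."""
    last_error = None
    for backend in backends:
        try:
            # Any response that comes back, including a 5xx, is returned as-is.
            return send(backend)
        except ConnectionError as exc:
            # By assumption no connection was established, so the request
            # was not delivered and the next backend can be tried.
            last_error = exc
    raise ConnectionError("no backend accepted the connection") from last_error
```

With a `send` that refuses connections for the first backend and returns a 500
from the second, `forward` returns the 500 to the caller and does not try a
third backend.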

Chris and Iryna, CF Routing

On Mon, Mar 14, 2016 at 7:27 AM, Stephen Byers <smbyers(a)gmail.com> wrote:

Thanks, Gwenn, and I agree that a load balancer is needed here, which is
the approach I have personally taken. The original question about haproxy
was from Ben, and I think I would agree that there is no way to achieve 100%
uptime without having a separate CF installation where all traffic could be
directed while the other platform is upgraded.

Even with the load balancer solution, I am curious how the health checking
should (or if it could) be configured to achieve 100% uptime. There may be
some level of interruption until the load balancer figures out that one of
the gorouters is down and takes it out of service. Load balancers are
certainly not my specialty, though.

Thanks

On Mon, Mar 14, 2016 at 1:39 AM Gwenn Etourneau <getourneau(a)pivotal.io>
wrote:

Stephen,

Haproxy is clearly a SPOF here; that's why most people use a load balancer
with active health checking in production.




Thanks

On Mon, Mar 14, 2016 at 3:07 PM, Kayode Odeyemi <dreyemi(a)gmail.com>
wrote:

Stephen, I think this is only possible if you deploy CF onto multiple
DCs. Same configuration, multiple DNSs.


On Mon, Mar 14, 2016 at 3:55 AM, Stephen Byers <smbyers(a)gmail.com>
wrote:

Agree. But the haproxy fronts the router, so any client that is pinned
to the haproxy that is taken down will not make it to the router until its
DNS TTL is reached and it resolves to the other haproxy IP, and even that
may not happen if this is a DNS round robin configuration.

I could be missing something?

On Sun, Mar 13, 2016, 8:44 PM Ben R <vagcom.ben(a)gmail.com> wrote:

I think one (or two) of the routers help in this situation even if one
haproxy is out of service.

Ben


On Sun, Mar 13, 2016 at 6:32 PM, Stephen Byers <smbyers(a)gmail.com>
wrote:

Will that solve the problem? BOSH will only take one haproxy out of
service at a time, but those clients that resolved the DNS name to the IP of
the haproxy that is taken out of service for upgrade will be impacted,
correct?

Thanks

On Sun, Mar 13, 2016, 8:25 PM Amit Gupta <agupta(a)pivotal.io> wrote:

In your case, 2 HAProxys with DNS configured to point at both.
