[cf-dev] Re: proposed stemcell network performance tuning


Benjamin Black <bblack@...>
 

there are two problems:

1) certain load balancer versions and configurations have unexpected
behavior around tcp timestamps when there is a mix of windows and
non-windows clients (really a mix of timestamps and not-timestamps). the
result is the linux servers in cloud foundry sending resets long before
ports are exhausted.

2) to date, these parameters have been configured in an ad hoc fashion as
problems are encountered leading to a lot of variation in configuration
across the various cloud foundry components. the ad hoc solutions have in
some cases even been counter-productive: tcp_tw_recycle was previously
enabled on some components, exacerbating #1.

the changes amit proposes are not exhaustive, but rather conservative and
address exactly these two problems. other tuning might be beneficial to the
platform. such additional tuning is not required for these scenarios.
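
One way to see the variation described in problem 2 on a given VM is to read the relevant settings directly. The following is a minimal sketch, assuming a Linux host; the function name and the optional root-directory argument are illustrative and not part of any Cloud Foundry release.

```shell
#!/bin/sh
# Print the current value of each TCP setting discussed in this thread.
# The optional first argument is a hypothetical override of the /proc
# mount point, so the function can be pointed at a fixture directory.
report_tcp_settings() {
    root="${1:-/proc}"
    for name in tcp_tw_recycle tcp_tw_reuse tcp_fin_timeout; do
        path="$root/sys/net/ipv4/$name"
        if [ -r "$path" ]; then
            printf '%s=%s\n' "$name" "$(cat "$path")"
        else
            # Not every kernel exposes every parameter; report that
            # rather than failing.
            printf '%s=unavailable\n' "$name"
        fi
    done
}
```

Running `report_tcp_settings` with no arguments on each component and comparing the output would show where tcp_tw_recycle is still enabled.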


b

On Wed, Sep 30, 2015 at 6:05 PM, Joshua McKenty <jmckenty(a)pivotal.io> wrote:

Amit - I worry about changes to the former in the context of HTTP 1.0 and
1.1, especially without pipelining. What problem are you trying to solve?

If you’re having trouble initiating new sockets, there are other kernel
params we should adjust.


On Sep 29, 2015, at 5:17 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Hi all,

I'd like to propose tuning a couple of kernel parameters related to TCP
performance:

# TCP_FIN_TIMEOUT
# This setting determines the time that must elapse before TCP/IP can
# release a closed connection and reuse its resources. During this
# TIME_WAIT state, reopening the connection to the client costs less than
# establishing a new connection. By reducing the value of this entry,
# TCP/IP can release closed connections faster, making more resources
# available for new connections. Adjust this in the presence of many
# connections sitting in the TIME_WAIT state:

echo 5 > /proc/sys/net/ipv4/tcp_fin_timeout

# TCP_TW_REUSE
# This allows reusing sockets in TIME_WAIT state for new connections when
# it is safe from a protocol viewpoint. Default value is 0 (disabled). It
# is generally a safer alternative to tcp_tw_recycle.

echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
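
Whether a VM actually has many connections sitting in TIME_WAIT can be checked before and after applying the settings. The following is a minimal sketch, assuming a Linux host that exposes /proc/net/tcp (IPv4 sockets only; IPv6 sockets are listed in /proc/net/tcp6). The function name `count_tcp_state` is illustrative.

```shell
#!/bin/sh
# Count sockets in a given TCP state from a /proc/net/tcp-style table.
# The fourth column ("st") holds the state as a two-digit hex code; in
# the Linux kernel's numbering 06 is TIME_WAIT and 05 is FIN_WAIT2.
count_tcp_state() {
    state="$1"
    table="${2:-/proc/net/tcp}"
    # Skip the header row, compare the state column as a string, and
    # print 0 when nothing matches.
    awk -v st="$state" \
        'NR > 1 && ($4 "") == (st "") { n++ } END { print n + 0 }' "$table"
}
```

For example, `count_tcp_state 06` prints the number of IPv4 sockets currently in TIME_WAIT on the local host.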

Currently, these parameters are set by certain jobs in cf-release,
diego-release, and perhaps others. Any VM needing to establish a high
number of incoming/outgoing tcp connections in a short period of time will
be unable to establish new connections without changing these parameters.

We believe these parameters are safe to change across the board, and will
be generally beneficial. The existing defaults made sense for much older
networks, but can be greatly optimized for modern systems.

Please share with the mailing lists if you have any questions or feedback
about this proposal. If you maintain a bosh release and would like to see
how these changes would affect your release, you can create a job which
simply does the above in its startup scripts, and colocate that job with
all the other jobs in a deployment of your release.
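
As a rough illustration of such a startup script, here is a minimal sketch. The function name and the optional root-directory argument are assumptions made for this example, not an existing BOSH or cf-release convention; only the two `echo` writes come from the proposal itself.

```shell
#!/bin/sh
# Apply the two settings proposed above. The optional first argument is a
# hypothetical override of the /proc mount point, which allows a dry run
# against a scratch directory; writing to the real /proc requires root.
apply_tcp_tuning() {
    ipv4="${1:-/proc}/sys/net/ipv4"
    echo 5 > "$ipv4/tcp_fin_timeout" || return 1
    echo 1 > "$ipv4/tcp_tw_reuse" || return 1
}
```

A startup script would call `apply_tcp_tuning` with no arguments. Values written under /proc/sys do not survive a reboot, so the call has to run each time the VM starts.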

Thanks,

Amit Gupta
Cloud Foundry PM, OSS Release Integration team