Re: Strict CPU quotas proposal

Chip Childers <cchilders@...>

+Will and Julz

Thoughts gents?

On Tue, Oct 25, 2016 at 10:23 PM Carlo Alberto Ferraris <
carlo.ferraris(a)> wrote:

It is our understanding that currently application instances get CPU
quotas assigned via cgroup CPU shares on the container running them[1].
This effectively sets a "minimum quota" of CPU time each container is
guaranteed to have available, but leaves the maximum amount of CPU time
unbounded.

This may be fine in the average case, but can have some pretty annoying
effects in certain edge cases.

For the sake of discussion, let's consider 2 Diego cells having the same
number N of containers running, with the same amount of cpu shares, memory
and disk assigned. From the perspective of Diego, these two cells are
perfectly balanced. Now let's assume that (because we're very unlucky):
- we have one instance of the same app running on each one of the 2 cells
- the other N-1 instances on cell 1 use all available CPU
- the other N-1 instances on cell 2 are using no CPU
In this case the net effect is that the instance of the app in cell 2 will
have N times as much CPU time as the instance in cell 1.
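
The arithmetic behind this scenario can be sketched as follows. This is a
simplified model of proportional sharing among the containers that are
actually demanding CPU, not the kernel scheduler itself; the function name
and the values of N and the share count are assumptions made for
illustration.

```python
def cpu_fraction(my_shares, other_busy_shares):
    """Fraction of the cell's CPU time a fully busy container receives
    when it competes only with the containers that are also busy."""
    total = my_shares + sum(other_busy_shares)
    return my_shares / total

N = 10        # containers per cell (illustrative value)
shares = 100  # every container has the same number of shares

# Cell 1: the other N-1 containers use all the CPU they can get.
cell1 = cpu_fraction(shares, [shares] * (N - 1))
# Cell 2: the other N-1 containers are idle, so they do not compete.
cell2 = cpu_fraction(shares, [])

print(cell1, cell2, cell2 / cell1)
```

With these numbers the instance in cell 1 gets about a tenth of the cell,
while the instance in cell 2 can use all of it.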

It would be desirable, from our perspective, to be able to control such
"performance swings", because they may lead users to overestimate their
available processor resources, potentially leaving certain instances unable
to effectively serve their share of traffic.

What we propose is to add the (opt-in) ability for CF operators to control
the upper bound of how much CPU time a container can use. Specifically, we
suggest teaching garden (runc) to also set cpu.cfs_quota_us and
cpu.cfs_period_us (see CpuQuota and CpuPeriod in [3] and [2] for details).
To be absolutely explicit: this would be an operator-controlled feature, not
a user-controlled one.
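
For readers less familiar with these knobs: under cgroup v1 they are plain
files in the container's cpu cgroup directory. The sketch below only
illustrates what setting them amounts to; it is not how garden or runc
do it, and the function name and the cgroup_dir parameter are hypothetical.

```python
import os

def set_cfs_limit(cgroup_dir, quota_us, period_us):
    """Write a CFS bandwidth limit into a cgroup v1 cpu controller directory.

    cgroup_dir is a hypothetical path supplied by the caller, e.g. the
    directory of one container's cpu cgroup.
    """
    with open(os.path.join(cgroup_dir, "cpu.cfs_period_us"), "w") as f:
        f.write(str(period_us))
    with open(os.path.join(cgroup_dir, "cpu.cfs_quota_us"), "w") as f:
        f.write(str(quota_us))
```

A quota of 50000 with a period of 100000 would, for instance, cap the
container at half a core's worth of CPU time per period.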

What follows is a draft of a proposal for how to provide this
functionality. It is intended as a basis for discussion more than as a fully
fleshed-out proposal.

Concretely this would likely require exposing two additional tunables for
Diego reps (names are placeholders):
- cpu_quota_period_us
- cpu_quota_burst_ratio

cpu_quota_period_us would be used as-is as the value of cpu.cfs_period_us.
This is the time window used for allocating CPU time (see [2] for details).
Setting it to 0 (default) would disable strict CPU quotas. Values
1000<=cpu_quota_period_us<=1000000 are valid and will enable strict CPU
quotas. Any other value is illegal. It is illegal to set
cpu_quota_period_us to 0 if cpu_quota_burst_ratio is not 0.

cpu_quota_burst_ratio would instead be used to compute the value of
cpu.cfs_quota_us based on the number of cores in the host (n_cores), the
number of shares assigned to the container (cpu_shares) and the maximum
number of shares that can be assigned to running containers
(cpu_shares_max) as follows:

cpu.cfs_quota_us = cpu_quota_period_us * n_cores * cpu_quota_burst_ratio *
cpu_shares / cpu_shares_max
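
As a minimal sketch, the formula can be written as a small function. The
function name, the sample values and the truncation to an integer are our
assumptions; the proposal does not specify a rounding rule.

```python
def cfs_quota_us(period_us, n_cores, burst_ratio, cpu_shares, cpu_shares_max):
    """cpu.cfs_quota_us computed with the formula above.

    Truncating to an integer is an assumption made for this sketch.
    """
    return int(period_us * n_cores * burst_ratio * cpu_shares / cpu_shares_max)

# Hypothetical cell: 4 cores, 100 ms period, container holding 1/8 of the shares.
print(cfs_quota_us(100_000, 4, 1, 128, 1024))    # 50000
print(cfs_quota_us(100_000, 4, 1.5, 128, 1024))  # 75000
```

In the first call the container may use 50000 us of CPU time per 100000 us
period, i.e. half a core, which is 1/8 of the 4 cores.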

cpu_shares_max can be calculated, if we ignore
instance_min_cpu_share_limit, instance_max_cpu_share_limit and other
limits, as:

cpu_shares_max = memory_mb * memory_overcommit_factor /

Setting cpu_quota_burst_ratio to 0 (default) would disable strict CPU
quotas. Values of cpu_quota_burst_ratio>=1 are valid and enable strict CPU
quotas. Any other value is illegal. It is illegal to set
cpu_quota_burst_ratio to 0 if cpu_quota_period_us is not 0.
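
The legality rules for the two tunables could be checked together along
these lines. This is only a sketch of the rules as stated in this proposal;
the function name, return convention and error messages are made up for
illustration.

```python
def validate_quota_settings(period_us, burst_ratio):
    """Return True if strict CPU quotas are enabled, False if disabled.

    Raises ValueError for any combination the proposal declares illegal.
    """
    if period_us == 0 and burst_ratio == 0:
        return False  # strict CPU quotas disabled (the default)
    if period_us == 0 or burst_ratio == 0:
        raise ValueError("both tunables must be 0, or both must be non-zero")
    if not 1000 <= period_us <= 1_000_000:
        raise ValueError("cpu_quota_period_us must be 0 or within [1000, 1000000]")
    if burst_ratio < 1:
        raise ValueError("cpu_quota_burst_ratio must be 0 or >= 1")
    return True  # strict CPU quotas enabled
```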

For example:
- setting it to 1 will ensure that each container can use at least and at
most (cpu_shares / cpu_shares_max) of the total processor time every
cpu_quota_period_us, thereby virtually eliminating performance fluctuations
across instances
- setting it to 1.5 will ensure that each container can use at least
(cpu_shares / cpu_shares_max) and at most (1.5 * cpu_shares /
cpu_shares_max) of the total processor time every cpu_quota_period_us,
thereby allowing applications to use up to 50% over their "CPU shares quota"
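
The two examples can be checked with a short sketch. The function name and
the share values are illustrative assumptions, and the lower bound only
holds as long as the shares assigned on the cell do not exceed
cpu_shares_max.

```python
def cpu_time_bounds(cpu_shares, cpu_shares_max, burst_ratio):
    """(guaranteed, maximum) fraction of total processor time per period."""
    guaranteed = cpu_shares / cpu_shares_max
    return guaranteed, burst_ratio * guaranteed

print(cpu_time_bounds(256, 1024, 1))    # (0.25, 0.25)
print(cpu_time_bounds(256, 1024, 1.5))  # (0.25, 0.375)
```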

Notes:
- the above would be correct if no other processes were running outside of
the application containers: this is obviously not true, but limiting the
resource usage of system components is largely outside the scope of
this proposal
- cpu_quota_period_us should be exposed because it allows operators to
control the latency/throughput trade-off (see [2] for why)
- 0 < cpu_quota_burst_ratio < 1 is defined as illegal in the current
proposal because we can't come up with a good scenario where such values
may make sense
- illegal values for cpu_quota_burst_ratio and cpu_quota_period_us should
cause early errors (at bosh deploy and at rep startup)
- we haven't looked into the equivalent change for garden windows, but
we're aware that similar functionality exists in the windows APIs [4]

Chip Childers
VP Technology, Cloud Foundry Foundation
