Re: Quieting a Noisy Neighbor #cf


Julz Friedman
 

Hi Stanislav -- cpu usage is not currently unlimited: it is already constrained in proportion to a container's memory limit. If multiple containers on a cell each try to use all available cpu, each receives cpu time in proportion to its memory limit. So, for example, if two 64MB apps both attempt to use maximal resources, each will actually get 50%. If there's a 128MB app and a 64MB app, the 128MB app will get twice as much cpu as the 64MB app, so roughly 66% and 33% respectively. A user who wants more cpu can use a larger instance type (i.e. in Cloud Foundry the memory limit is really both a memory limit and a cpu share). This is enforced using the cpu cgroup's `cpu.shares` mechanism. Operators should ensure that cells have enough cpu to support the number of apps (i.e. the advertised memory limit) on their cells.
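
To make the proportional split described above concrete, here is a small sketch of the arithmetic only. It does not touch any cgroup; the function name `share_percent` and the `limits` array are made up for illustration, and the only assumption is that each container's share is proportional to its memory limit.

```shell
#!/bin/bash
# Sketch: expected CPU split when every container on a cell is busy and
# cpu.shares is proportional to each container's memory limit.

# share_percent MEM_MB TOTAL_MB -> integer percentage, rounded down
share_percent() {
    local mem_mb=$1 total_mb=$2
    echo $(( 100 * mem_mb / total_mb ))
}

# Memory limits (MB) of the containers competing on one cell (example values)
limits=(128 64)

total=0
for m in "${limits[@]}"; do
    total=$(( total + m ))
done

for m in "${limits[@]}"; do
    echo "${m}MB app: about $(share_percent "$m" "$total")% of cpu under contention"
done
```

Note that this only describes the contended case: with shares alone, a single busy container on an otherwise idle cell can still use the spare cpu.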

There's also the cpu maximums feature you linked to. This is an additional setting folks can opt in to - or not - to _also_ set an absolute maximum, again in proportion to memory limit, on top of the relative shares which are enabled on every container as above. The advantage of the maximum is that it enforces an absolute limit which cannot be exceeded even if spare resource is available; this can avoid confusion where apps sometimes get more cpu than they're entitled to because they land on a quiet cell. If I've read your script correctly, I believe that if you set the `cpu_quota_per_share_in_us` property from your link to a number based on the available memory in the host and the cpu_max_factor you desire, it will have exactly the effect of your script.
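
As a rough sketch of how such a number might be derived: the code below assumes (going by the property name, not by a reading of the Garden source) that a container's quota is its `cpu.shares` value multiplied by `cpu_quota_per_share_in_us`, and that shares scale linearly with memory limit. The function `quota_per_share_us` and the sample values are hypothetical; the sample container's shares and memory limit would need to be read from a real cell.

```shell
#!/bin/bash
# Sketch only. Assumption: quota_us = cpu.shares * cpu_quota_per_share_in_us,
# with cpu.shares proportional to the container's memory limit.

# quota_per_share_us PERIOD_US CELL_CPU FACTOR CELL_MEM_MB SAMPLE_MEM_MB SAMPLE_SHARES
# Prints the per-share quota (truncated to whole microseconds) that gives the
# sample container the quota PERIOD * CELL_CPU * FACTOR * SAMPLE_MEM / CELL_MEM.
quota_per_share_us() {
    local period_us=$1 cell_cpu=$2 factor=$3
    local cell_mem_mb=$4 sample_mem_mb=$5 sample_shares=$6
    awk -v p="$period_us" -v c="$cell_cpu" -v f="$factor" \
        -v cm="$cell_mem_mb" -v sm="$sample_mem_mb" -v ss="$sample_shares" \
        'BEGIN { printf "%d\n", (p * c * f * sm / cm) / ss }'
}

# Hypothetical inputs: 8 cpus, 16 GiB cell, factor 2, and a 4 GiB container
# that was observed to have 400 cpu.shares (an invented figure).
quota_per_share_us 100000 8 2 16384 4096 400
```

Because the setting applies to every container on the cell, this gives one cell-wide factor rather than a per-application one.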

Let me know if this answers your question or if there's a suggestion for another feature that I've missed here.

Thanks!

Julz
Garden PM


On Fri, 5 Jan 2018 at 17:03 Natalie Bennett <nbennett@...> wrote:
Are you looking for feedback on this particular implementation or on the general idea?

- Natalie

On Thu, Jan 4, 2018 at 11:50 PM Stanislav German-Evtushenko <s.germanevtushenko@...> wrote:
While the maximum memory usage of a container is effectively limited, the CPU time of a VM may be fully used up by a single application instance, as it is currently unbounded. This may lead to undesired behavior (e.g. unexpected latency) in neighboring applications, and Cloud Foundry does not seem to have a useful mechanism to deal with it.

What about CPU Maximums?

(https://bosh.io/releases/github.com/cloudfoundry/garden-runc-release?version=1.3.0, https://bosh.io/jobs/garden?source=github.com/cloudfoundry/garden-runc-release#p=garden.cpu_quota_per_share_in_us)

Setting CPU maximums could be a solution; however, it doesn't come for free:
- resource utilization will be far from optimal
- an application may experience significant latency jumps all the time even if no other applications are running on the same VM

What can we do?


The most common cases could be covered by a per-application and per-space CPU limiting mechanism. This would allow a User to limit their application when it is known to be CPU hungry (for example, some batch jobs), and it would also allow an Operator to automate CPU limiting based on CPU usage history.

The CPU limitation mechanism can be exposed to a user as a single parameter cpu_max_factor, e.g.
cpu_max_factor: 1.5  # maximum limit is 1.5 times higher than nominal limit
cpu_max_factor: 2    # maximum limit is 2 times higher than nominal limit
cpu_max_factor: 0    # or -1, which is default and means no limit
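
One way the values above could map onto a CFS quota is sketched here. The function name `quota_for_factor` is invented for illustration, and the nominal quota is assumed to be supplied by the caller in microseconds. The sketch uses awk so that fractional factors such as 1.5 work, maps 0 or negative factors to -1 (the value `cpu.cfs_quota_us` uses for "no limit"), and clamps small results to 1000 us.

```shell
#!/bin/bash
# Sketch: translate cpu_max_factor into a value for cpu.cfs_quota_us.

# quota_for_factor FACTOR NOMINAL_QUOTA_US
quota_for_factor() {
    local factor=$1 nominal_us=$2
    awk -v f="$factor" -v n="$nominal_us" 'BEGIN {
        if (f <= 0) { print "-1"; exit }   # 0 or -1: no limit
        q = int(f * n)                     # truncate to whole microseconds
        if (q < 1000) q = 1000             # keep the quota at 1 ms or more
        print q
    }'
}

# Example: a container whose nominal quota is 200000 us per period
for factor in 1.5 2 0 -1; do
    echo "cpu_max_factor=$factor -> $(quota_for_factor "$factor" 200000)"
done
```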

Calculation

A PoC script that could be used by Operators is below (written in bash).

Assumptions:
- CPU maximum limit is proportional to requested memory with ratio defined by CPU_MAX_FACTOR
- The script is run on a Diego Cell
- We already know PID of the hungry process

#################################################################################################################
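#!/bin/bash
# PoC: cap the CPU time of the container that owns a given process.

PID=XXXX  # Replace with a valid process ID
# 2 means that the maximum limit is 2 times higher than the nominal (minimal) limit.
# Must be an integer here, because bash arithmetic does not support fractions.
CPU_MAX_FACTOR=2
CELL_CPU=$(nproc)  # Number of CPUs on the cell
CELL_MEMORY=$(( $(jq -r .memory_mb /var/vcap/jobs/rep/config/rep.json) * 1024**2 ))  # Cell memory quota in bytes
# Path of the process's memory cgroup, without the leading slash
CGROUP_ID=$(awk -F: '$2 ~ /memory/ {print $3}' "/proc/$PID/cgroup" | sed 's#^/##')
# Non-empty only if the cgroup path exactly matches a container known to runc
CONTAINER_GUID=$(sudo runc list -q | grep -Fx -- "$CGROUP_ID")

if [[ -n "$CONTAINER_GUID" ]]; then
    CONTAINER_MEMORY=$(cat "/sys/fs/cgroup/memory/$CONTAINER_GUID/memory.limit_in_bytes")
    PERIOD=$(cat "/sys/fs/cgroup/cpu/$CONTAINER_GUID/cpu.cfs_period_us")
    # Quota in microseconds per period, proportional to the container's share of cell memory
    QUOTA=$(( PERIOD * CELL_CPU * CPU_MAX_FACTOR * CONTAINER_MEMORY / CELL_MEMORY ))
    # Clamp to 1000 us (1 ms), the smallest quota the kernel accepts
    QUOTA=$(( QUOTA < 1000 ? 1000 : QUOTA ))
    echo "$QUOTA" | sudo tee "/sys/fs/cgroup/cpu/$CONTAINER_GUID/cpu.cfs_quota_us"
fi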
#################################################################################################################

Calculation example:

CPU_MAX_FACTOR   =  2
CELL_CPU         =  8
CELL_MEMORY      = 16 * 1024^3  # 16 GiB
CONTAINER_MEMORY =  4 * 1024^3  #  4 GiB
PERIOD           =  100000      # 100 ms (default)

QUOTA_RATIO = ( CONTAINER_MEMORY / CELL_MEMORY ) * CELL_CPU * CPU_MAX_FACTOR = 1/4 * 8 * 2 = 4   # i.e. at most 4 CPUs' worth of time
QUOTA = QUOTA_RATIO * PERIOD = 400000   # in us (i.e. 400 ms of CPU time per 100 ms period)
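
The same example can be checked with the integer arithmetic the script uses. This is just the calculation with the values hard-coded, so it can be run anywhere without a Diego Cell:

```shell
#!/bin/bash
# Worked example: same formula as the PoC script, with fixed inputs.

CPU_MAX_FACTOR=2
CELL_CPU=8
CELL_MEMORY=$(( 16 * 1024**3 ))       # 16 GiB in bytes
CONTAINER_MEMORY=$(( 4 * 1024**3 ))   #  4 GiB in bytes
PERIOD=100000                         # 100 ms, expressed in microseconds

# Multiply first and divide last so integer division does not lose precision
QUOTA=$(( PERIOD * CELL_CPU * CPU_MAX_FACTOR * CONTAINER_MEMORY / CELL_MEMORY ))

echo "QUOTA=${QUOTA} us per ${PERIOD} us period ($(( QUOTA / PERIOD )) CPUs)"
# prints: QUOTA=400000 us per 100000 us period (4 CPUs)
```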

What do you think?
