Overcommit on Diego Cells


Mike Youngstrom <youngm@...>
 

Today my org manages our DEA resources using a heavy overcommit strategy.
Rather than being conservative and ensuring that none of our DEAs commit to
more than they can handle, we have instead decided to overcommit to the
point where we basically turn off DEA resource management.

All our DEAs have the same amount of RAM and Disk and we closely monitor
these resources. When load gets beyond a threshold we deploy more DEAs.
We use Org quotas as ceilings to help stop an app from accidentally killing
everything.

So far this strategy has worked out great for us. It's allowed us to
provide much more friendly defaults for RAM and Disk and allowed us to get
more value out of our DEA dollar.

As we move into Diego we're attempting to implement the same strategy. We
want to be sure to do it correctly since we're less comfortable with Diego
at this point.

Diego doesn't have the friendly "overcommit" property that DEAs do. Instead I
see "diego.executor.memory_capacity_mb" and
"diego.executor.disk_capacity_mb". Can I overcommit these values and get
the same behaviour I would get by overcommitting DEAs?

I'd also like some advice on what "diego.garden-linux.btrfs_store_size_mb"
is and how it might apply to my overcommit plans.

Thanks,
Mike


James Bayer
 

I know that Onsi and Eric have discussed this. I've heard that Eric is
working on a reply.

(In reply to Mike Youngstrom's message of Tue, Aug 11, 2015 at 12:50 PM.)
--
Thank you,

James Bayer


Eric Malm <emalm@...>
 

Hi, Mike,

Apologies, I emailed this to cf-dev a few days ago, but it seems not to have gone through. Anyway, thanks for asking about the different configuration values Diego exposes for disk and memory. Yes, you can use the 'diego.executor.memory_capacity_mb' and 'diego.executor.disk_capacity_mb' properties to specify overcommits in absolute terms rather than the relative factors configurable on the DEAs. The cell reps will advertise those values as their maximum memory and disk capacity, and subtract memory and disk for allocated containers when reporting their available capacity during auctions.
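
To make the "absolute rather than relative" point concrete, here is a minimal Python sketch. It assumes you start from a DEA-style overcommit factor and want the equivalent absolute values for the two executor properties. The helper names, the example VM sizes, and the factors are hypothetical; the only behaviour taken from the description above is that available capacity is the configured capacity minus what has been allocated to containers.

```python
from dataclasses import dataclass


def absolute_capacity_mb(physical_mb: int, overcommit_factor: float) -> int:
    """Convert a relative overcommit factor into an absolute capacity in MB."""
    return int(physical_mb * overcommit_factor)


@dataclass
class CellCapacity:
    """Toy model of what a cell advertises; not Diego's actual implementation."""
    memory_capacity_mb: int   # value you would set for diego.executor.memory_capacity_mb
    disk_capacity_mb: int     # value you would set for diego.executor.disk_capacity_mb
    allocated_memory_mb: int = 0
    allocated_disk_mb: int = 0

    def allocate(self, memory_mb: int, disk_mb: int) -> None:
        """Record a container allocation against the configured capacity."""
        self.allocated_memory_mb += memory_mb
        self.allocated_disk_mb += disk_mb

    @property
    def available_memory_mb(self) -> int:
        return self.memory_capacity_mb - self.allocated_memory_mb

    @property
    def available_disk_mb(self) -> int:
        return self.disk_capacity_mb - self.allocated_disk_mb


# Hypothetical cell: 32 GB of RAM overcommitted 2x, 100 GB of disk overcommitted 1.5x.
cell = CellCapacity(
    memory_capacity_mb=absolute_capacity_mb(32 * 1024, 2.0),   # 65536
    disk_capacity_mb=absolute_capacity_mb(100 * 1024, 1.5),    # 153600
)
cell.allocate(memory_mb=1024, disk_mb=2048)
print(cell.available_memory_mb, cell.available_disk_mb)  # 64512 151552
```

The point of the sketch is only that the overcommit lives in the configured number itself: the cell has no notion of a factor, so changing the VM size means recomputing both values.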

The 'btrfs_store_size_mb' property on garden-linux is more of a moving target as garden-linux settles in on that filesystem as a backing store. As of garden-linux-release 0.292.0, which diego-release 0.1412.0 and later consume, that property accepts a '-1' value that allows it to grow up to the full size of the available disk on the /var/vcap/data ephemeral disk volume. The btrfs volume itself is sparse, so it will start at effectively zero size and grow as needed to accommodate the container layers. Since you're already monitoring disk usage on your VMs carefully and scaling out when you hit certain limits, this might be a good option for you. This is also effectively how the DEAs operate today, without an explicit limit on the total amount of disk they allocate for containers.
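
For readers unfamiliar with the term "sparse", the following Python sketch illustrates the general idea with an ordinary file: its apparent size can be large while the blocks actually allocated on disk stay near zero until data is written. This is only an illustration of sparseness on a Unix-like system; it makes no claim about how garden-linux creates or manages its btrfs volume, and the function name is hypothetical.

```python
import os
import tempfile


def sparse_file_sizes(apparent_size_bytes: int) -> tuple[int, int]:
    """Create a file of the given apparent size without writing data.

    Returns (apparent size, bytes actually allocated on disk).
    st_blocks is reported in 512-byte units and is only available on Unix.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.img")
        with open(path, "wb") as f:
            f.truncate(apparent_size_bytes)  # extends the file without writing blocks
        st = os.stat(path)
        return st.st_size, st.st_blocks * 512


apparent, allocated = sparse_file_sizes(1024 ** 3)
print(f"apparent={apparent} bytes, allocated={allocated} bytes")
```

On filesystems that support sparse files, the allocated figure is typically far smaller than the apparent one, which is why a disk-usage check sees the real consumption rather than the configured ceiling.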

If you do want more certainty in the maximum size that the garden-linux btrfs volume will grow to, or if you're on a version of diego-release earlier than 0.1412.0, you should set btrfs_store_size_mb to a positive value, and garden-linux will create the volume to grow only up to that size. One strategy to determine that value would be to use the maximum size of the ephemeral disk, less the size of the BOSH-deployed packages (for the executor, currently around 1.3 GB, including the untarred cflinuxfs2 rootfs), less the size allocated to the executor cache in the 'diego.executor.max_cache_size_in_bytes' property (which currently defaults to 10GB).
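
The sizing strategy above amounts to a simple subtraction, sketched here in Python. The function name and the 100 GB example disk are hypothetical, and treating "GB" as 1024 MB (and the 10GB cache default as 10 * 1024^3 bytes) is an assumption; check the units your release actually uses before relying on the result.

```python
MB = 1024 * 1024  # bytes per MB, assuming binary units


def btrfs_store_size_mb(ephemeral_disk_mb: int,
                        packages_mb: int,
                        max_cache_size_in_bytes: int) -> int:
    """Ephemeral disk size, less BOSH packages, less the executor cache."""
    cache_mb = -(-max_cache_size_in_bytes // MB)  # round up to a whole MB
    size_mb = ephemeral_disk_mb - packages_mb - cache_mb
    if size_mb <= 0:
        raise ValueError("ephemeral disk too small for packages plus cache")
    return size_mb


# Hypothetical cell with a 100 GB ephemeral disk.
print(btrfs_store_size_mb(
    ephemeral_disk_mb=100 * 1024,              # 102400 MB
    packages_mb=1332,                          # roughly 1.3 GB, rounded up
    max_cache_size_in_bytes=10 * 1024 ** 3,    # 10 GB cache, assumed binary
))  # 90828
```

Note that this yields a hard ceiling for the volume, which is separate from the overcommitted value advertised through 'diego.executor.disk_capacity_mb'.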

Best,
Eric


Mike Youngstrom <youngm@...>
 

Thanks for the response Eric. It was very helpful.

One last question. Any thoughts on what would be the best way to monitor
free ephemeral disk space in my overcommitted situation? If using
btrfs_store_size_mb=-1, will btrfs free ephemeral disk space when less is
being used, or does it just grow when it needs more? Looking at firehose
stats in 1398, I don't see any btrfs usage metrics being sent from
garden-linux.
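
As a stopgap while no btrfs metrics are emitted, a check along these lines could be run on a cell. It is a minimal Python sketch using only the standard library; the function name and the 20% threshold are hypothetical, and the default path is the ephemeral disk mount mentioned earlier in the thread. It reports what the operating system sees for the filesystem containing that path, so whether the number drops again after containers are removed is exactly the open question above.

```python
import shutil


def ephemeral_disk_report(path: str = "/var/vcap/data",
                          warn_below_fraction: float = 0.2) -> dict:
    """Report free space on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return {
        "total_mb": usage.total // (1024 * 1024),
        "free_mb": usage.free // (1024 * 1024),
        "free_fraction": free_fraction,
        "low": free_fraction < warn_below_fraction,
    }


if __name__ == "__main__":
    # "." is used here so the example runs anywhere; on a cell, use the default path.
    print(ephemeral_disk_report("."))
```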

Thanks,
Mike

(In reply to Eric Malm's message of Mon, Aug 17, 2015 at 9:14 PM.)


Will Pragnell <wpragnell@...>
 

Apparently my last reply to this thread never made it through. Hope this
one does!

Mike, you're right that no btrfs metrics are currently emitted from
garden-linux. There are no immediate plans to implement them, but such
metrics are clearly useful, so I'll raise this with the team and see where
we land.

As for your question about btrfs freeing disk space, I'm afraid I don't
know offhand. I'll have to do some investigation and get back to you on
that next week.

(In reply to Mike Youngstrom's message of 19 August 2015 at 23:46.)


Mike Youngstrom <youngm@...>
 

Thanks Will.

If btrfs does free disk space, then I can just use the BOSH ephemeral disk
metric to monitor it. If it doesn't, then I'll need Garden to provide me
with something.

Thanks,
Mike

(In reply to Will Pragnell's message of Thu, Aug 20, 2015 at 10:58 AM.)