diego: disk filling up over time


Tom Sherrod <tom.sherrod@...>
 

diego release 0.1398.0

After a couple of weeks of dev, the cells end up filling their disks. Did I miss a clean up job somewhere?
Currently, once pushes start failing, I get bosh to recreate the machine.

Other options?

Thanks,
Tom


Eric Malm <emalm@...>
 

Hi, Tom,

Thanks for asking about this. Could you provide some more details about
your deployment?

- What are the exact errors you're seeing when CF users are trying to make
containers? The errors from CF CLI logs or rep/garden logs would be great
to see.
- What's the total amount of disk space available on the volume attached to
/var/vcap/data? You should be able to see this from `df` command output.
- How much space is the rep configured to allocate for its executor cache?
Is it the default 10GB provided by the rep's job spec in
https://github.com/cloudfoundry-incubator/diego-release/blob/v0.1398.0/jobs/rep/spec#L70-L72?
How much disk is actually used in /var/vcap/data/executor_cache (based on
reporting from `du`, say)?
- How much space have you directed garden-linux to allocate for its btrfs
store? This is provided via the diego.garden-linux.btrfs_store_size_mb BOSH
property, and with Diego 0.1398.0 I believe it has to be specified
explicitly. Also, how much space is actually used in the btrfs filesystem?
You should be able to inspect this with the btrfs tools available on the
cell VM in '/var/vcap/packages/btrfs-tools/bin'. I think running
`/var/vcap/packages/btrfs-tools/bin/btrfs filesystem usage
/var/vcap/data/garden-linux/btrfs_graph` should be a good starting point.
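
For anyone who would rather gather the `df`- and `du`-style numbers above in one pass, here is a minimal Python sketch. It is not part of Diego or garden-linux. The function names are hypothetical, the paths are the ones mentioned in this thread, and the directory total is the apparent file size, so it can differ from what `du` reports in disk blocks.

```python
import os
import shutil

def volume_report(path):
    """Summarize the filesystem that holds `path`, similar in spirit to `df`."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_gib": usage.total / 2**30,
        "used_gib": usage.used / 2**30,
        "free_gib": usage.free / 2**30,
        "used_fraction": used_fraction,
    }

def directory_size_bytes(root):
    """Sum apparent file sizes under `root` (a rough stand-in for `du`)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat so that symlinks are counted as links, not as their targets
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total

if __name__ == "__main__":
    # Paths taken from this thread; adjust for your own cell VM.
    for p in ("/var/vcap/data", "/var/vcap/data/garden-linux/btrfs_graph"):
        if os.path.exists(p):
            r = volume_report(p)
            print(f"{p}: {r['used_gib']:.2f} of {r['total_gib']:.2f} GiB used "
                  f"({r['used_fraction']:.0%})")
    cache = "/var/vcap/data/executor_cache"
    if os.path.isdir(cache):
        print(f"{cache}: {directory_size_bytes(cache) / 2**20:.1f} MiB")
```

The btrfs-level breakdown still needs the `btrfs filesystem usage` command, since Python's standard library only sees the mounted filesystem totals.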

You may also find some useful information in the cf-dev thread from August
about overcommitting disk on Diego cells:
https://lists.cloudfoundry.org/archives/list/cf-dev(a)lists.cloudfoundry.org/thread/VBDM2TMHQSOFILSHRCV4G2CCPRBP5WKA/#VBDM2TMHQSOFILSHRCV4G2CCPRBP5WKA

Thanks,
Eric



Tom Sherrod <tom.sherrod@...>
 

Hi Eric,

Thank you.

I am responding below with what I have available. Unfortunately, when the
problem presents, developers are down, so the current resolution is to
recreate the cells. One of the cells below is at 98% full, so an opportunity
to gather additional details may arise soon.
Answers are inline below.

- What are the exact errors you're seeing when CF users are trying to make
containers? The errors from CF CLI logs or rep/garden logs would be great
to see.
Did not capture detailed logs. "FAILED StagingError" was all that was
captured. I've asked for more information on the next failure, which may
be coming up soon: I'm looking at a cell that is 98% full. No issue has been
reported yet, but then there are 8 cells to choose from.


- What's the total amount of disk space available on the volume attached
to /var/vcap/data? You should be able to see this from `df` command output.
/dev/vda3   22025756  20278880   604964  98%  /var/vcap/data
tmpfs           1024        16     1008   2%  /var/vcap/data/sys/run
/dev/loop0    122835      1552   117352   2%  /tmp
/dev/loop1  20480000  17923904  1914816  91%  /var/vcap/data/garden-linux/btrfs_graph
cgroup       8216468         0  8216468   0%  /tmp/garden-/cgroup

- How much space is the rep configured to allocate for its executor cache?
Is it the default 10GB provided by the rep's job spec in
https://github.com/cloudfoundry-incubator/diego-release/blob/v0.1398.0/jobs/rep/spec#L70-L72?
How much disk is actually used in /var/vcap/data/executor_cache (based on
reporting from `du`, say)?

Default (not listed in the manifest)

root@a0acd863-07e5-4964-8758-fcdf295d119d:/var/vcap/data/executor_cache# du
42876 .

- How much space have you directed garden-linux to allocate for its btrfs
store? This is provided via the diego.garden-linux.btrfs_store_size_mb BOSH
property, and with Diego 0.1398.0 I believe it has to be specified
explicitly. Also, how much space is actually used in the btrfs filesystem?
You should be able to inspect this with the btrfs tools available on the
cell VM in '/var/vcap/packages/btrfs-tools/bin'. I think running
`/var/vcap/packages/btrfs-tools/bin/btrfs filesystem usage
/var/vcap/data/garden-linux/btrfs_graph` should be a good starting point.
btrfs_store_size_mb: 20000

root@a0acd863-07e5-4964-8758-fcdf295d119d:/var/vcap/packages/btrfs-progs/bin# ./btrfs filesystem usage /var/vcap/data/garden-linux/btrfs_graph
Overall:
    Device size:          19.53GiB
    Device allocated:     17.79GiB
    Device unallocated:    1.75GiB
    Device missing:          0.00B
    Used:                 16.78GiB
    Free (estimated):      1.83GiB (min: 976.89MiB)
    Data ratio:               1.00
    Metadata ratio:           2.00
    Global reserve:      320.00MiB (used: 0.00B)

Data,single: Size:12.01GiB, Used:11.93GiB
    /dev/loop1   12.01GiB

Metadata,single: Size:8.00MiB, Used:0.00B
    /dev/loop1    8.00MiB

Metadata,DUP: Size:2.88GiB, Used:2.43GiB
    /dev/loop1    5.75GiB

System,single: Size:4.00MiB, Used:0.00B
    /dev/loop1    4.00MiB

System,DUP: Size:8.00MiB, Used:16.00KiB
    /dev/loop1   16.00MiB

Unallocated:
    /dev/loop1    1.75GiB






Eric Malm <emalm@...>
 

Hi, Tom,

Thanks for all the diagnostic data! It looks like the actual size of the
btrfs volume (the 'Device allocated: 17.79GiB' line from the btrfs tool
output) is quite close to the total size of the ~21 GiB volume mounted on
/var/vcap/data. Since that volume also contains other files (such as the
BOSH-deployed jobs and packages and component log files) that on the cell
VM can add up to several GB, I think your cells eventually reach the point
where the sparse btrfs volume has expanded to fill all the remaining space
on the volume.
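
As a rough check on the reasoning above, the arithmetic can be worked through from the `df` and btrfs figures reported earlier in the thread. This is only a sketch: it assumes the `df` columns are in 1 KiB blocks and that the store size property is in MiB (which matches the 20480000 shown by `df` for /dev/loop1), and the variable names are illustrative.

```python
KIB_PER_GIB = 1024 * 1024

# Figures reported in this thread
volume_total_kib = 22025756      # /dev/vda3 mounted on /var/vcap/data
volume_available_kib = 604964    # space still available on that volume
btrfs_store_size_mb = 20000      # configured maximum size of the btrfs store
btrfs_unallocated_gib = 1.75     # "Device unallocated" from btrfs usage

# Assumption: the property is in MiB, so 20000 MiB == 20480000 KiB
btrfs_max_kib = btrfs_store_size_mb * 1024

# Space left for everything else (jobs, packages, logs) if the
# sparse btrfs backing file grows to its configured maximum
headroom_kib = volume_total_kib - btrfs_max_kib
headroom_gib = headroom_kib / KIB_PER_GIB

# Rough comparison: how much the btrfs store could still allocate
# versus how much the host volume can still supply
available_gib = volume_available_kib / KIB_PER_GIB
shortfall_gib = btrfs_unallocated_gib - available_gib

print(f"volume size:        {volume_total_kib / KIB_PER_GIB:.2f} GiB")
print(f"btrfs maximum:      {btrfs_max_kib / KIB_PER_GIB:.2f} GiB")
print(f"headroom at max:    {headroom_gib:.2f} GiB")
print(f"available now:      {available_gib:.2f} GiB")
print(f"possible shortfall: {shortfall_gib:.2f} GiB")
```

Under these assumptions, only about 1.47 GiB is left for non-btrfs data once the store is at its maximum, which is less than the several GB that jobs, packages, and logs can occupy.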

Also, from the small amount of data in the executor cache, it looks like
the cells are taking on only Docker-image workloads, rather than
buildpack-based apps. Unfortunately, with Diego 0.1398.0 and the
accompanying garden-linux version, there are a few deficiencies with disk
management for Docker-image apps. The main one that's likely exacerbating
this situation is that garden-linux doesn't clean up the docker image
layers after a Garden container based on them is destroyed. Consequently,
as you pull in more and more docker image layers over time, they use up
more and more space in the btrfs volume that's never recovered. As of
version 0.307.0, garden-linux-release does clean up those unused layers
correctly, and someone from the Garden team might recall whether that
happens in an earlier release too (maybe 0.306.0, but not earlier than
that?).

If you're currently tied to Diego 0.1398.0 to match compatibility with your
deployed CF version, the best way to manage these issues might be the
following. First, increase the size of the ephemeral disk attached to your
VMs (if you're able to). Second, set the garden.btrfs_store_size_mb property
so that the maximum size of the btrfs volume is 3-4 GB less than the size of
the ephemeral disk attached to /var/vcap/data. Third, monitor the disk usage
on that volume and recreate cells when they use more than, say, 90% of it.
Recreating a cell should cause it to evacuate its instances to other cells
in the deployment, so you wouldn't incur downtime for the apps.
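
The sizing and monitoring suggestions above can be sketched in a few lines of Python. The function names and the 4 GiB default reserve are hypothetical choices for illustration, not Diego or BOSH APIs, and the usage fraction is computed as used / (used + available), the same basis `df` uses for its Use% column.

```python
def suggested_btrfs_store_size_mb(disk_total_kib, reserve_gib=4):
    """Return a store size in MiB that leaves `reserve_gib` free on the disk."""
    reserve_kib = reserve_gib * 1024 * 1024
    return max(0, (disk_total_kib - reserve_kib) // 1024)

def needs_recreate(used_kib, available_kib, threshold=0.90):
    """True when the volume is fuller than `threshold` (default 90%)."""
    return used_kib / (used_kib + available_kib) > threshold

if __name__ == "__main__":
    # Figures for /var/vcap/data reported earlier in this thread
    total_kib, used_kib, available_kib = 22025756, 20278880, 604964

    # With the current disk, a 3-4 GiB reserve implies a smaller store
    print(suggested_btrfs_store_size_mb(total_kib, reserve_gib=3))
    print(suggested_btrfs_store_size_mb(total_kib, reserve_gib=4))

    # The cell at 98% is well past the 90% line
    print(needs_recreate(used_kib, available_kib))
```

On the disk size reported here, this gives a store size of roughly 17400 to 18400 MiB rather than the configured 20000, unless the ephemeral disk is enlarged first.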

Best,
Eric
