
Re: Recipe to install Diego?

Eric Malm <emalm@...>
 

Hi, Tom,

The Diego team does deploy Diego to AWS as part of our testing pipeline. We
haven't fully published our tooling for doing so, but you can see some of
our process in the deploy_diego CI script in diego-release
<https://github.com/cloudfoundry-incubator/diego-release/blob/develop/scripts/ci/deploy_diego>,
which uses diego-release's generate-deployment-manifest script. This script
is set up differently from the generate_deployment_manifest script in
cf-release, in that it takes a fixed sequence of stubs and a deployment
directory as arguments instead of an infrastructure type and an arbitrary
list of stubs to merge in. The full list of stubs is described in the usage
message for the script, but here are the parts that should be most relevant
for you to deploy Diego to AWS or OpenStack:

- IaaS settings (arg #5): This is a stub that should contain an
"iaas_settings" hash with several expected subfields
(compilation_cloud_properties, resource_pool_cloud_properties,
stemcell, subnet_configs). The manifest generation script takes these
values and uses them to populate certain fields in the diego manifest's
resource_pools, networks, and compilation sections. This will likely be the
stub you need to customize the most for an AWS or OpenStack deployment, as
this will contain all the information about the network and security group
configuration for that environment.
- Deployments directory (arg #7): This is a directory that should contain
your CF deployment manifest as the file 'cf.yml'. The manifest generation
script will extract certain values from the CF manifest so the Diego
deployment can integrate correctly with various services in CF (for
example, NATS and consul).
- Director UUID (arg #1): This is a stub containing "director_uuid:
<your-director-uuid>"; you may already have such a stub for generating your
CF manifest.
- Instance count overrides (arg #3): This is a stub containing any
instance-count changes for the diego jobs. Depending on the size of your
desired cluster, you'll want to change these values from the defaults that
the manifest-generation/diego.yml template provides in the jobs section.
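A minimal skeleton of the IaaS-settings stub (arg #5) might look like the following. Only the top-level `iaas_settings` key and its four subfield names come from the description above. The AWS values (`instance_type`, `availability_zone`, the stemcell name) are hypothetical placeholders. The exact shape of `resource_pool_cloud_properties` and `subnet_configs` is not shown here: copy it from the stubs under manifest-generation/bosh-lite-stubs and check it against manifest-generation/diego.yml.

```shell
# Writes a skeleton IaaS-settings stub into directory $1.
# Key names follow the email; every value is a hypothetical placeholder.
write_iaas_stub() {
  cat > "$1/iaas-settings.yml" <<'EOF'
iaas_settings:
  compilation_cloud_properties:
    instance_type: c3.large        # placeholder
    availability_zone: us-east-1a  # placeholder
  resource_pool_cloud_properties:  # placeholder: per-pool VM properties
  stemcell:
    name: bosh-aws-xen-hvm-ubuntu-trusty-go_agent  # placeholder
    version: latest
  subnet_configs:                  # placeholder: subnets, security groups, ...
EOF
}

stub_dir="$(mktemp -d)"
write_iaas_stub "$stub_dir"
echo "skeleton written to $stub_dir/iaas-settings.yml"
```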

Depending on how you wish to configure the Diego deployment, there may be
some additional properties you want to add to the property-overrides stub
(arg #2). I doubt you'll need to change anything in the persistent-disk
overrides or additional-jobs stubs (args #4 and #6), unless you're
customizing your deployment extensively. In any case, the stubs under
manifest-generation/bosh-lite-stubs should give you examples to customize
for your own deployment, and the manifest-generation/diego.yml template
will show you which values from those stubs are consumed in manifest
generation.
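Putting the seven arguments together, an invocation could be wrapped as follows. The argument order follows the description above. The stub file names, the `$HOME/workspace` paths and the `scripts/generate-deployment-manifest` location are assumptions, so check them against the script's usage message. The sketch also assumes the script writes the merged manifest to stdout.

```shell
# Hypothetical locations: point these at your own checkout and stub directory.
DIEGO_RELEASE="${DIEGO_RELEASE:-$HOME/workspace/diego-release}"
STUBS="${STUBS:-$HOME/workspace/diego-stubs}"

# Arguments in the order described above (file names are placeholders):
#  1 director UUID, 2 property overrides, 3 instance-count overrides,
#  4 persistent-disk overrides, 5 IaaS settings, 6 additional jobs,
#  7 deployments directory that must contain the CF manifest as cf.yml
generate_diego_manifest() {
  if [ ! -f "$STUBS/deployments/cf.yml" ]; then
    echo "missing $STUBS/deployments/cf.yml" >&2
    return 1
  fi
  # Assumed script location inside diego-release.
  "$DIEGO_RELEASE/scripts/generate-deployment-manifest" \
    "$STUBS/director-uuid.yml" \
    "$STUBS/property-overrides.yml" \
    "$STUBS/instance-count-overrides.yml" \
    "$STUBS/persistent-disk-overrides.yml" \
    "$STUBS/iaas-settings.yml" \
    "$STUBS/additional-jobs.yml" \
    "$STUBS/deployments"
}
```

With the stubs in place, `generate_diego_manifest > diego.yml`, followed by `bosh deployment diego.yml` and `bosh deploy` with the BOSH CLI of this era, would deploy the result.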

Also, as Diego matures and becomes the principal backend for running
application instances in CF, these manifest-generation patterns may change
substantially.

Thanks,
Eric Malm, CF Runtime Diego PM

On Tue, May 12, 2015 at 8:48 AM, Ken Ojiri <ozzozz(a)gmail.com> wrote:

Hi,

I use the spiff manifest templates included in cf-release and diego-release
and generate manifests with spiff, but I usually use those manifests as
reference material.
In the end, I adjust my own manifests by referring to the spiff-generated
manifests and the job definitions of cf-release and/or diego-release, and by
trial and error...

Right now, the configuration parameters of the Diego components change with
every version, so the job definitions in diego-release are an essential
reference.

Regards,
Ken Ojiri


---
Ken Ojiri <ozzozz(a)gmail.com>
Mitaka, Tokyo Japan


On Tue, May 12, 2015 at 5:56 PM, 王天青 <wang.tianqing.cn(a)gmail.com> wrote:
Hi Ken,

How do you generate the manifest file?

Thanks
Best Regards~!
Grissom

On Mon, May 11, 2015 at 9:17 PM OzzOzz <ozzozz(a)gmail.com> wrote:

Hi,

I have posted a sample BOSH deployment manifest to Gist.
https://gist.github.com/ozzozz/4c08c37863b703a75afc
I was able to deploy cf-release v207 and diego-release 0.1099.0 to the AWS
Tokyo region with MicroBOSH.

I was also able to deploy cf-release and diego-release to OpenStack (Juno).
The manifests differ only in 'networks', 'cloud_properties' and
'stemcell'.

Regards,
Ken

---
<ozzozz(a)gmail.com>
Mitaka, Tokyo Japan


On Sat, May 9, 2015 at 8:57 PM, Tom Sherrod <tom.sherrod(a)gmail.com>
wrote:
Hi,

Are there any examples or docs on installing Diego with
BOSH/MicroBOSH?
Using bosh-lite as a template, I'm tripping up on various parts.
Is this even a valid way to go about installing it?
Either AWS or OpenStack.

Thanks,
Tom

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


Re: Purge files on NFS or S3?

Jon Price
 

Make sure you only delete the resource files, not everything...

Jon Price
Intel Corp.

On May 11, 2015 10:05 PM, Dieu Cao <dcao(a)pivotal.io> wrote:
An option could be to just delete all the resource files on the blobstore. The effect would be that binaries that would otherwise have been matched would be uploaded again on the first new push that includes them.
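A minimal sketch of that approach for an NFS blobstore is shown below. It assumes the Cloud Controller's default directory keys (`cc-resources`, `cc-packages`, `cc-droplets`, `cc-buildpacks`) under a mount point such as /var/vcap/nfs/shared, so check both against your CC configuration. It only touches `cc-resources`, leaving droplets, packages and buildpacks alone. It also only lists files unless you pass the sketch's own `--delete` argument.

```shell
# Size in KB of each top-level blobstore directory under $1, largest first.
blobstore_usage() {
  du -sk "$1"/*/ | sort -rn
}

# Purge only the resource-match cache under blobstore root $1.
# Without "--delete" as $2 this is a dry run that lists the files.
purge_resource_files() {
  dir="$1/cc-resources"
  [ -d "$dir" ] || { echo "not found: $dir" >&2; return 1; }
  if [ "${2:-}" = "--delete" ]; then
    find "$dir" -type f -delete
  else
    find "$dir" -type f -print
  fi
}
```

For example, run `blobstore_usage /var/vcap/nfs/shared` and `purge_resource_files /var/vcap/nfs/shared` first, and add `--delete` only once the listing looks right.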

On Monday, May 11, 2015, John Wong <gokoproject(a)gmail.com> wrote:
Hi all

Thanks. No, I was just curious whether there was a way to identify what to remove in the blobstore, because I was surprised by the size of my blobstore at this point. I will check what's in there (maybe James is right that it is mostly resource files). I am currently using NFS. I can build a CF with S3 as my blobstore.

John


On Mon, May 11, 2015 at 11:36 AM, Chad Woolley <thewoolleyman(a)gmail.com> wrote:
Not sure if this is what you need, but you can manually sync + delete files from a local filesystem (including NFS mount) to/from S3:

http://s3tools.org/s3cmd-sync

... with the `--delete-removed` option
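A minimal sketch of that sync is below. It previews first, because `--delete-removed` removes objects from the bucket that no longer exist locally. The source directory and bucket name are hypothetical. As James notes in this thread, copying the blobs is probably not enough on its own to move the Cloud Controller from NFS to S3, since its configuration and CC_DB references would likely need updating too.

```shell
# Hypothetical source directory and bucket.
NFS_ROOT="${NFS_ROOT:-/var/vcap/nfs/shared}"
BUCKET="${BUCKET:-s3://my-cc-blobstore}"

# Mirror NFS_ROOT to BUCKET. Unless "apply" is passed, s3cmd's --dry-run
# only lists the uploads and deletions it would perform. If s3cmd is not
# installed, the command is printed instead of run.
s3_mirror() {
  if [ "${1:-}" = "apply" ]; then
    set -- sync --delete-removed "$NFS_ROOT/" "$BUCKET/"
  else
    set -- sync --dry-run --delete-removed "$NFS_ROOT/" "$BUCKET/"
  fi
  if command -v s3cmd >/dev/null 2>&1; then
    s3cmd "$@"
  else
    echo "s3cmd $*"
  fi
}
```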

-- Chad


On Sat, May 9, 2015 at 12:19 AM, James Bayer <jbayer(a)pivotal.io> wrote:

john, i think the resource files may grow forever right now without intervention.

i'm pretty confident that when apps are deleted that their droplets are deleted with them and that proper garbage collection occurs with that.

i'm unaware of any NFS file system to s3 blob migration. you would need to update the CC_DB references too i'm pretty sure. i'm interested if you find out more.

On Tue, May 5, 2015 at 1:14 PM, John Wong <gokoproject(a)gmail.com> wrote:

Hi

I just looked at our disk usage on NFS server. We have used like 200G so far, and I wonder if there's a systematic way to purge files we don't need (or how do I know I don't need them)?

Similarly, if I were to replace NFS server with S3 instead, does the existing process (if any) work with S3?

Thanks.



--
Thank you,

James Bayer



Re: Recipe to install Diego?

Ken Ojiri
 

Hi,

I use the spiff manifest templates included in cf-release and diego-release
and generate manifests with spiff, but I usually use those manifests as
reference material.
In the end, I adjust my own manifests by referring to the spiff-generated
manifests and the job definitions of cf-release and/or diego-release, and by
trial and error...

Right now, the configuration parameters of the Diego components change with
every version, so the job definitions in diego-release are an essential
reference.
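The workflow described above (render a reference manifest with spiff, then reconcile a hand-maintained manifest against it) can be sketched as follows. `spiff merge` takes a template followed by stubs and prints the merged result. All file names here are hypothetical placeholders.

```shell
# Show how a hand-maintained manifest ($1) differs from a spiff-generated
# reference ($2). diff exits 0 if they are identical and 1 if they differ.
compare_with_reference() {
  diff -u "$1" "$2"
}

if command -v spiff >/dev/null 2>&1 && [ -f diego-template.yml ]; then
  # spiff merge TEMPLATE STUB... prints the merged manifest to stdout
  spiff merge diego-template.yml my-stub.yml > reference.yml
  compare_with_reference my-diego.yml reference.yml
fi
```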

Regards,
Ken Ojiri


---
Ken Ojiri <ozzozz(a)gmail.com>
Mitaka, Tokyo Japan

On Tue, May 12, 2015 at 5:56 PM, 王天青 <wang.tianqing.cn(a)gmail.com> wrote:
Hi Ken,

How do you generate the manifest file?

Thanks
Best Regards~!
Grissom

On Mon, May 11, 2015 at 9:17 PM OzzOzz <ozzozz(a)gmail.com> wrote:

Hi,

I have posted a sample BOSH deployment manifest to Gist.
https://gist.github.com/ozzozz/4c08c37863b703a75afc
I was able to deploy cf-release v207 and diego-release 0.1099.0 to the AWS
Tokyo region with MicroBOSH.

I was also able to deploy cf-release and diego-release to OpenStack (Juno).
The manifests differ only in 'networks', 'cloud_properties' and
'stemcell'.

Regards,
Ken

---
<ozzozz(a)gmail.com>
Mitaka, Tokyo Japan


On Sat, May 9, 2015 at 8:57 PM, Tom Sherrod <tom.sherrod(a)gmail.com> wrote:
Hi,

Are there any examples or docs on installing Diego with BOSH/MicroBOSH?
Using bosh-lite as a template, I'm tripping up on various parts. Is this
even a valid way to go about installing it?
Either AWS or OpenStack.

Thanks,
Tom



Scaling Java Application

Christopher Frost
 

When a Java application is deployed to Cloud Foundry, its Java memory
settings are calculated from the configured memory weightings during
staging. This means that, unlike other apps, an application that is scaled
to give it more memory has to be *restaged* to get updated Java memory
settings. This has now been improved with a new memory calculator written
by Steve Powell[2]. The Memory Calculator[1] runs on every application
start, so the application always gets up-to-date memory settings; its
output is shown during staging.

-----> Downloading Open JDK Like Memory Calculator 1.1.1_RELEASE from
https://download.run.pivotal.io/memory-calculator/trusty/x86_64/memory-calculator-1.1.1_RELEASE
(found in cache)
Memory Settings: -XX:MaxMetaspaceSize=64M -XX:MetaspaceSize=64M
-Xss995K -Xmx382293K -Xms382293K

Then scaling the application to double the memory will result in new memory
settings without having to restage the application.

cf scale my-application -m 1G

-Xmx768M -Xms768M -XX:MaxMetaspaceSize=104857K -XX:MetaspaceSize=104857K
-Xss1M

This new feature is currently available on the master branch of the
buildpack [3] and will be released in due course.


Chris.

[1] https://github.com/cloudfoundry/java-buildpack-memory-calculator
[2] https://github.com/Zteve
[3] https://github.com/cloudfoundry/java-buildpack

--
Christopher Frost - GoPivotal UK



Follow up on multiple line log outputs in CF

George Li
 

Hi,

This is a follow-up to the archived posting
https://groups.google.com/a/cloudfoundry.org/forum/?utm_medium=email&utm_source=footer#!msg/vcap-dev/B1W6_vO0oyo/84X1eAtFsKoJ.
I cannot find any newer posts in that thread.
I am using Cloud Foundry version "6.11.2-2a26d55-2015-04-27T21:11:44+00:00"
and want to know what options I have for handling multi-line logs in a
multi-tenant environment. Since multiple instances of multiple applications
all send their logs to a single Logstash server, is it best to avoid
multi-line log entries altogether? I can live with sticking to single-line
logs except when outputting exception stack traces, not to mention that I
only have control over my own code.

Thanks.


Code license question

peteb@...
 

Hello,

I am a software developer and was wondering what the code license is for your Cloud Foundry Community code, such as the Go client go-cfclient: https://github.com/cloudfoundry-community/go-cfclient ?

Thanks,
kind regards,
Piotr


Re: Recipe to install Diego?

王天青 <wang.tianqing.cn at gmail.com...>
 

Hi Ken,

How do you generate the manifest file?

Thanks
Best Regards~!
Grissom

On Mon, May 11, 2015 at 9:17 PM OzzOzz <ozzozz(a)gmail.com> wrote:

Hi,

I have posted a sample BOSH deployment manifest to Gist.
https://gist.github.com/ozzozz/4c08c37863b703a75afc
I was able to deploy cf-release v207 and diego-release 0.1099.0 to the AWS
Tokyo region with MicroBOSH.

I was also able to deploy cf-release and diego-release to OpenStack (Juno).
The manifests differ only in 'networks', 'cloud_properties' and
'stemcell'.

Regards,
Ken

---
<ozzozz(a)gmail.com>
Mitaka, Tokyo Japan


On Sat, May 9, 2015 at 8:57 PM, Tom Sherrod <tom.sherrod(a)gmail.com> wrote:
Hi,

Are there any examples or docs on installing Diego with BOSH/MicroBOSH?
Using bosh-lite as a template, I'm tripping up on various parts. Is this
even a valid way to go about installing it?
Either AWS or OpenStack.

Thanks,
Tom



Re: Purge files on NFS or S3?

Dieu Cao <dcao@...>
 

An option could be to just delete all the resource files on the blobstore.
The effect would be that binaries that would otherwise have been matched
would be uploaded again on the first new push that includes them.

On Monday, May 11, 2015, John Wong <gokoproject(a)gmail.com> wrote:

Hi all

Thanks. No, I was just curious whether there was a way to identify what to
remove in the blobstore, because I was surprised by the size of my blobstore
at this point. I will check what's in there (maybe James is right that it is
mostly resource files). I am currently using NFS. I can build a CF with S3 as
my blobstore.

John


On Mon, May 11, 2015 at 11:36 AM, Chad Woolley <thewoolleyman(a)gmail.com> wrote:

Not sure if this is what you need, but you can manually sync + delete
files from a local filesystem (including NFS mount) to/from S3:

http://s3tools.org/s3cmd-sync

... with the `--delete-removed` option

-- Chad


On Sat, May 9, 2015 at 12:19 AM, James Bayer <jbayer(a)pivotal.io> wrote:

john, i think the resource files may grow forever right now without
intervention.

i'm pretty confident that when apps are deleted that their droplets are
deleted with them and that proper garbage collection occurs with that.

i'm unaware of any NFS file system to s3 blob migration. you would need
to update the CC_DB references too i'm pretty sure. i'm interested if you
find out more.

On Tue, May 5, 2015 at 1:14 PM, John Wong <gokoproject(a)gmail.com> wrote:

Hi

I just looked at our disk usage on NFS server. We have used like 200G
so far, and I wonder if there's a systematic way to purge files we don't
need (or how do I know I don't need them)?

Similarly, if I were to replace NFS server with S3 instead, does the
existing process (if any) work with S3?

Thanks.



--
Thank you,

James Bayer



Re: Purge files on NFS or S3?

John Wong
 

Hi all

Thanks. No, I was just curious whether there was a way to identify what to
remove in the blobstore, because I was surprised by the size of my blobstore
at this point. I will check what's in there (maybe James is right that it is
mostly resource files). I am currently using NFS. I can build a CF with S3 as
my blobstore.

John


On Mon, May 11, 2015 at 11:36 AM, Chad Woolley <thewoolleyman(a)gmail.com>
wrote:

Not sure if this is what you need, but you can manually sync + delete
files from a local filesystem (including NFS mount) to/from S3:

http://s3tools.org/s3cmd-sync

... with the `--delete-removed` option

-- Chad


On Sat, May 9, 2015 at 12:19 AM, James Bayer <jbayer(a)pivotal.io> wrote:

john, i think the resource files may grow forever right now without
intervention.

i'm pretty confident that when apps are deleted that their droplets are
deleted with them and that proper garbage collection occurs with that.

i'm unaware of any NFS file system to s3 blob migration. you would need
to update the CC_DB references too i'm pretty sure. i'm interested if you
find out more.

On Tue, May 5, 2015 at 1:14 PM, John Wong <gokoproject(a)gmail.com> wrote:

Hi

I just looked at our disk usage on NFS server. We have used like 200G
so far, and I wonder if there's a systematic way to purge files we don't
need (or how do I know I don't need them)?

Similarly, if I were to replace NFS server with S3 instead, does the
existing process (if any) work with S3?

Thanks.



--
Thank you,

James Bayer



Re: Purge files on NFS or S3?

Chad Woolley <thewoolleyman@...>
 

Not sure if this is what you need, but you can manually sync + delete files
from a local filesystem (including NFS mount) to/from S3:

http://s3tools.org/s3cmd-sync

... with the `--delete-removed` option

-- Chad

On Sat, May 9, 2015 at 12:19 AM, James Bayer <jbayer(a)pivotal.io> wrote:

john, i think the resource files may grow forever right now without
intervention.

i'm pretty confident that when apps are deleted that their droplets are
deleted with them and that proper garbage collection occurs with that.

i'm unaware of any NFS file system to s3 blob migration. you would need
to update the CC_DB references too i'm pretty sure. i'm interested if you
find out more.

On Tue, May 5, 2015 at 1:14 PM, John Wong <gokoproject(a)gmail.com> wrote:

Hi

I just looked at our disk usage on NFS server. We have used like 200G so
far, and I wonder if there's a systematic way to purge files we don't need
(or how do I know I don't need them)?

Similarly, if I were to replace NFS server with S3 instead, does the
existing process (if any) work with S3?

Thanks.



--
Thank you,

James Bayer



Re: [vcap-dev] Java OOM debugging

Lari Hotari <Lari@...>
 

fyi. Tomcat 8.0.20 might be consuming more memory than 8.0.18:
https://github.com/cloudfoundry/java-buildpack/issues/166#issuecomment-94517568

Other things we’ve tried:

- We enabled verbose garbage collection to verify there were no
memory size issues within the JVM. There weren’t.

- We tried setting a minimum memory size for native; it had no
effect. The container still gets killed.

- We tried adjusting the ‘memory heuristics’ so that they
added up to 80 rather than 100. This delayed the container being
killed, but it was still killed.
I think adjusting memory heuristics so that they add up to 80 doesn't
make a difference because the values aren't percentages.
The values are proportional weighting values used in the memory
calculation:
https://github.com/grails-samples/java-buildpack/blob/b4abf89/docs/jre-oracle_jre.md#memory-calculation
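A simplified sketch shows why the sum doesn't matter. If each memory type gets weight / sum(weights) of the total, scaling every weight by the same factor changes nothing, and a set of weights that adds up to 105 is just as valid. This models only the proportional split. The buildpack's actual memory calculator also sizes the stack per thread and applies the configured ranges, and the weights below are examples.

```shell
# share_kb TOTAL_KB WEIGHT SUM_OF_WEIGHTS -> KB given to one memory type
share_kb() {
  echo $(( $1 * $2 / $3 ))
}

total_kb=$((1024 * 1024))                  # a 1G application
heap_100=$(share_kb "$total_kb" 75 100)    # example weights summing to 100
heap_80=$(share_kb "$total_kb" 60 80)      # same ratios, summing to 80
heap_105=$(share_kb "$total_kb" 70 105)    # heap weight 70 out of a total of 105

echo "sum 100: heap=${heap_100}K"
echo "sum 80:  heap=${heap_80}K"
echo "sum 105: heap=${heap_105}K"
```

The first two lines both report heap=786432K (768M of a 1G app). The third reports heap=699050K, because only the ratio 70/105 matters.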

I found out that the only way to reserve "unused" memory is to set a
high value for the native memory lower bound in the memory_sizes.native
setting of config/open_jdk_jre.yml.
Example:
https://github.com/grails-samples/java-buildpack/blob/22e0f6a/config/open_jdk_jre.yml#L25
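In the same simplified proportional model, a lower bound on `native` can be approximated as follows. Pin native at the bound whenever its weighted share is smaller, then split the remainder among the other types by weight. This illustrates the idea and is not the buildpack's exact algorithm. The weights are hypothetical; the 2G total and the 330M minimum are the figures Lari reports in this thread.

```shell
# allocate TOTAL_KB NATIVE_MIN_KB -> prints heap and native sizes in KB
# Example weights (hypothetical): heap 75, metaspace 10, stack 5, native 10.
allocate() {
  total=$1 native_min=$2
  w_heap=75 w_native=10 w_sum=100   # w_sum = 75 + 10 + 5 + 10
  native=$(( total * w_native / w_sum ))
  heap=$(( total * w_heap / w_sum ))
  if [ "$native" -lt "$native_min" ]; then
    # pin native at its lower bound; split the rest among the other types
    native=$native_min
    heap=$(( (total - native) * w_heap / (w_sum - w_native) ))
  fi
  echo "heap=${heap}K native=${native}K"
}

allocate $((2 * 1024 * 1024)) 0                # 2G app, no native minimum
allocate $((2 * 1024 * 1024)) $((330 * 1024))  # 2G app, native minimum of 330M
```

This prints `heap=1572864K native=209715K` and then `heap=1466026K native=337920K`. In this model, the 330M floor takes roughly 104M away from the heap.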



This seems like classic memory leak behaviour to me.
In my case it wasn't a classical Java memory leak, since the Java
application wasn't leaking memory. I was able to confirm this by getting
some heap dumps with the HeapDumpServlet
(https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/HeapDumpServlet.groovy)
and analyzing them.

In my case the JVM's RSS memory size is slowly growing. It probably is
some kind of memory leak since one process I've been monitoring now is
very close to the memory limit. The uptime is now almost 3 weeks.

Here is the latest diff of the meminfo report.
https://gist.github.com/lhotari/ee77decc2585f56cf3ad#file-meminfo_diff_example2-txt

From a Java perspective this isn't a classic leak. The JVM heap isn't
filling up. The problem is that the RSS size is slowly growing and will
eventually cause the Java process to cross the memory boundary, so that the
process gets killed by the Linux kernel's cgroups OOM killer.

RSS size might be growing for many reasons. I have been able to
slow down the growth with various MALLOC_ and JVM parameter
tuning (-XX:MinMetaspaceExpansion=1M -XX:CodeCacheExpansionSize=1M). I'm
able to get a longer uptime, but the problem isn't solved.
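As a sketch, settings like these can be applied as application environment variables with the cf CLI. `MALLOC_ARENA_MAX=2` is an example value and does not come from this thread. Per the discussion below, it needs a cflinuxfs2 stack, because the glibc in lucid64 is old. The sketch assumes the Java buildpack picks up `JAVA_OPTS` from the environment at staging, hence the restage. Note that `cf set-env` replaces any existing `JAVA_OPTS` value. By default the commands are only printed.

```shell
APP="${APP:-my-application}"   # hypothetical app name

# Print commands unless DRY_RUN=0, so nothing changes by accident.
# (Dry-run output does not show the quoting of multi-word values.)
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

run cf set-env "$APP" MALLOC_ARENA_MAX 2
run cf set-env "$APP" JAVA_OPTS "-XX:MinMetaspaceExpansion=1M -XX:CodeCacheExpansionSize=1M"
run cf restage "$APP"
```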

Lari


On 15-05-11 06:41 AM, Head-Rapson, David wrote:

Thanks for the continued advice.



We’ve hit on a key discovery after yet another soak test this weekend.

- When we deploy using Tomcat 8.0.18 we don’t see the issue

- When we deploy using Tomcat 8.0.20 (same app version, same
CF space, same services bound, same JBP code version, same JRE
version, running at the same time), we see the crashes occurring after
just a couple of hours.



Ideally we’d go ahead with the memory calculations you mentioned;
however, we’re stuck on lucid64 because we’re using Pivotal CF 1.3.x and
we’re having issues upgrading to 1.4.x.

So we’re not able to adjust MALLOC_ARENA_MAX, nor are we able to view
RSS in pmap as you describe.



Other things we’ve tried:

- We enabled verbose garbage collection to verify there were no
memory size issues within the JVM. There weren’t.

- We tried setting a minimum memory size for native; it had no
effect. The container still gets killed.

- We tried adjusting the ‘memory heuristics’ so that they
added up to 80 rather than 100. This delayed the container being
killed, but it was still killed.



This seems like classic memory leak behaviour to me.



*From:*Lari Hotari [mailto:lari.hotari(a)sagire.fi] *On Behalf Of *Lari
Hotari
*Sent:* 08 May 2015 16:25
*To:* Daniel Jones; Head-Rapson, David
*Cc:* cf-dev(a)lists.cloudfoundry.org
*Subject:* Re: [Cf-dev] [vcap-dev] Java OOM debugging




In my case, it turned out to be essential to reserve enough memory
for "native" in the JBP. For the 2GB total memory, I set the minimum
to 330M. With that setting I have been able to get over 2 weeks of
uptime so far.

I mentioned this in my previous email:

The workaround for that in my case was to add a native key under
memory_sizes in open_jdk_jre.yml and set the minimum to 330M (that is
for a 2GB total memory).
see example
https://github.com/grails-samples/java-buildpack/blob/22e0f6a/config/open_jdk_jre.yml#L25
that was how I got the app I'm running on CF to stay within the memory
bounds. I'm sure there is now also a way to get the keys without
forking the buildpack. I could have also adjusted the percentage
portions, but I wanted to set a hard minimum for this case.


I've been trying to get some insight by diffing the reports gathered
from the meminfo servlet
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemoryInfoServlet.groovy


Here is such an example of a diff:
https://gist.github.com/lhotari/ee77decc2585f56cf3ad#file-meminfo_diff_example-txt

The meminfo report includes pmap output, which shows the memory map
of the process. I have just noticed that most of the memory has
already been mmapped from the OS and is just growing in RSS size.
For example:
< 00000000a7600000 1471488 1469556 1469556 rw--- [ anon ]
> 00000000a7600000 1471744 1470444 1470444 rw--- [ anon ]
The pmap output on lucid64 doesn't include the RSS size, so you have
to use cflinuxfs2 for this. It's also better for other reasons:
the glibc in lucid64 is old and has some bugs around MALLOC_ARENA_MAX.

I was able to manually estimate the maximum RSS size that the Java
process will consume by simply picking the large anon blocks from the
pmap report and adding up their allocated virtual sizes (VSS).
Based on this calculation, I picked the minimum of 330M for "native"
in open_jdk_jre.yml, as I mentioned before.

It looks like these rows are for the Heap size:
< 00000000a7600000 1471488 1469556 1469556 rw--- [ anon ]
> 00000000a7600000 1471744 1470444 1470444 rw--- [ anon ]
It looks like the JVM doesn't fully allocate that block in RSS
initially and most of the growth of RSS size comes from that in my
case. In your case, it might be something different.

I also added a servlet for getting glibc malloc_info statistics in XML
format (). I haven't really analysed that information because of time
constraints and because I don't have a pressing problem any more. BTW,
the malloc_info XML report is missing some key elements that have been
added in later glibc versions
(https://github.com/bminor/glibc/commit/4d653a59ffeae0f46f76a40230e2cfa9587b7e7e).

If killjava.sh never fires and the app crashes with Warden out of
memory errors, then I believe it's the kernel's cgroups OOM killer
that has killed the container processes. I have found this location
where Warden oom notifier gets the OOM notification event:
https://github.com/cloudfoundry/warden/blob/ad18bff/warden/lib/warden/container/features/mem_limit.rb#L70
This is the oom.c source code:
https://github.com/cloudfoundry/warden/blob/ad18bff7dc56acbc55ff10bcc6045ebdf0b20c97/warden/src/oom/oom.c
. It reads the cgroups control files and receives events from the
kernel that way.
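To see the numbers the kernel uses for that decision, you can read the cgroup (v1) memory-controller files directly. `memory.usage_in_bytes`, `memory.max_usage_in_bytes`, `memory.limit_in_bytes` and `memory.failcnt` (how often usage hit the limit) are standard cgroup v1 files. The directory of a particular Warden container's memory cgroup on the DEA is an assumption you need to fill in yourself.

```shell
# Summarise cgroup v1 memory accounting for the memory cgroup directory $1.
cgroup_mem_report() {
  cg="$1"
  usage=$(cat "$cg/memory.usage_in_bytes")
  maxu=$(cat "$cg/memory.max_usage_in_bytes")
  limit=$(cat "$cg/memory.limit_in_bytes")
  fail=$(cat "$cg/memory.failcnt")
  echo "usage=$(( usage / 1048576 ))M max=$(( maxu / 1048576 ))M limit=$(( limit / 1048576 ))M failcnt=${fail}"
}
```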

I'd suggest that you run pmap for the Java process after it has
started and estimate the maximum RSS size by summing the VSS size
of the large anon blocks, instead of their RSS, for the blocks that the Java
process has reserved for its different memory areas (I think you
shouldn't . You should leave out the VSS of the
CompressedClassSpaceSize block.
After this calculation, add enough memory to the "native" parameter in
JBP until the RSS size calculated this way stays under the limit.
That's the only "method" I have come up with so far.
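The estimate described above can be sketched as an awk filter over `pmap -x PID` output. It sums the Kbytes (virtual size) and RSS columns of anonymous mappings that are at least MIN_KB large. The 10 MB default threshold is an arbitrary example. The sketch does not identify the CompressedClassSpaceSize block, so you still have to subtract that one yourself.

```shell
# Read `pmap -x` output on stdin; total the large "[ anon ]" mappings.
# $1 = minimum mapping size in KB (default 10240, an arbitrary example)
large_anon_totals() {
  min_kb="${1:-10240}"
  awk -v min="$min_kb" '
    $NF == "]" && $(NF-1) == "anon" && $2 + 0 >= min {
      vss += $2; rss += $3; n++
    }
    END { printf "blocks=%d vss_kb=%d rss_kb=%d\n", n, vss, rss }'
}
# Usage on a live process: pmap -x "$(pgrep -f java | head -1)" | large_anon_totals
```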

It might be required to have some RSS space allocated for any zip/jar
files read by the Java process. I think that Java uses mmap files for
zip file reading by default and that might go on top of all other limits.
To test this theory, I'd suggest testing by adding
-Dsun.zip.disableMemoryMapping=true system property setting to
JAVA_OPTS. That disables the native mmap for zip/jar file reading. I
haven't had time to test this assumption.
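One way to check this theory is to sum the RSS of the jar/zip mappings in `pmap -x` output. Compare a run with `-Dsun.zip.disableMemoryMapping=true` against one without it. The sketch assumes pmap prints the mapped file's name as the last column, as procps pmap does.

```shell
# Read `pmap -x` output on stdin; sum the RSS (KB) of .jar/.zip mappings.
mapped_archive_rss() {
  awk '$NF ~ /\.(jar|zip)$/ { rss += $3 } END { printf "archive_rss_kb=%d\n", rss }'
}
# Usage on a live process: pmap -x <java-pid> | mapped_archive_rss
```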

I guess the only way to understand how Java allocates memory is to
look at the source code.
From http://openjdk.java.net/projects/jdk8u/, these are the instructions to
get the source code of JDK 8:
hg clone http://hg.openjdk.java.net/jdk8u/jdk8u && cd jdk8u && sh get_source.sh
This tool is really good for grepping and searching the source code:
http://geoff.greer.fm/ag/
On Ubuntu it's in the silversearcher-ag package ("apt-get install
silversearcher-ag"), and on Mac OS X with Homebrew it's "brew install
the_silver_searcher".
This alias is pretty useful:
alias codegrep='ag --color --group --pager less -C 5'
Then you just search for the correct location in code by starting with
the tokens you know about:
codegrep MaxMetaspaceSize
This gives pretty good starting points for looking at how the JDK
allocates memory.

So the JDK source code is only a few commands away.

It would be interesting to hear more about this if someone has the
time to dig into this. This is about how far I got, and I hope sharing
this information helps someone continue. :)


Lari
github/twitter: lhotari

On 15-05-08 10:02 AM, Daniel Jones wrote:

Hi Lari et al,



Thanks for your help Lari.



David and I are pairing on this issue, and we're yet to resolve
it. We're in the process of creating a repeatable test case (our
most crashy app makes calls to external services that need
mocking), but in the meantime, here's what we've seen.



Between Java Buildpack commit e89e546 and 17162df, we see apps
crashing with Warden out of memory errors. killjava.sh never
fires, and this has led us to believe that the kernel is shooting
a cgroup process in the head after the cgroup oversteps its memory
limit. We cannot find any evidence of the OOM killer firing in any
logs, but we may not be looking in the right place.



The JBP is setting heap to be 70%, metaspace to be 15% (with max
set to the same as initial), 5% for "stack", 5% for "normalised
stack" and 10% for "native". We do not understand why this adds up
to 105%, but haven't looked into the JBP algorithm yet. Any
pointers on what "normalised stack" is would be much appreciated,
as this doesn't appear in the list of heuristics supplied via app env.



Other team members tried applying the same settings that you
suggested - thanks for this. Apps still crash with these settings,
albeit less frequently.



After reading the blog you linked to
(http://java.dzone.com/articles/java-8-permgen-metaspace) we
wondered whether the increased /reserved/ metaspace claimed after
metaspace GC might be causing a problem; however we reused the
test code to create a metaspace leak in a CF app and saw metaspace
GCs occur correctly, and memory usage never grow over
MaxMetaspaceSize. This figures, as the committed metaspace is
still less than MaxMetaspaceSize, and the reserved appears to be
whatever RAM is free across the whole DEA.



We noted that an Oracle blog
(https://blogs.oracle.com/poonam/entry/about_g1_garbage_collector_permanent)
mentions that the metaspace size parameters are approximate. We're
currently wondering if native allocations by Tomcat (APR, NIO) are
taking up more container memory, and so when the metaspace fills,
it's creeping slightly over the limit and triggering the kernel's
OOM killer.



Any suggestions would be much appreciated. We've tried to resist
tweaking heuristics blindly, but are running out of options as
we're struggling to figure out how the Java process is using
/committed/ memory. pmap seems to show virtual memory, and so it's
hard to see if things like the metaspace or NIO ByteBuffers are
nabbing too much and triggering the kernel's OOM killer.
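Plain pmap output lists virtual sizes. One way to see how much of each mapping is actually resident is /proc/<pid>/smaps, which carries an Rss: line for every mapping. The Python sketch below sums Rss by mapping name. It assumes the standard smaps text layout and that the file is readable from wherever you run it; the function name is illustrative and not part of any existing tool.

```python
import re
from collections import defaultdict

# Mapping header lines in /proc/<pid>/smaps start with an address range, e.g.
# "7f1c2c000000-7f1c30000000 rw-p 00000000 00:00 0   [optional path]".
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


def rss_by_mapping(smaps_text):
    """Return {mapping name: total Rss in kB}; unnamed mappings become '[anon]'."""
    totals = defaultdict(int)
    current = None
    for line in smaps_text.splitlines():
        if HEADER.match(line):
            fields = line.split()
            # The sixth field (pathname) is absent for anonymous mappings.
            current = fields[5] if len(fields) > 5 else "[anon]"
        elif line.startswith("Rss:") and current is not None:
            totals[current] += int(line.split()[1])  # value is reported in kB
    return dict(totals)
```

Usage: `rss_by_mapping(open(f"/proc/{pid}/smaps").read())`. Large "[anon]" totals are where the heap, metaspace and malloc arenas of a JVM usually show up.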



Thanks for all your help,



Daniel Jones & David Head-Rapson



On Wed, Apr 29, 2015 at 8:07 PM, Lari Hotari <Lari(a)hotari.net
<mailto:Lari(a)hotari.net>> wrote:

Hi,

I created a few tools to debug OOM problems since the application
I was responsible for running on CF was failing constantly because
of OOM problems. The problems I had turned out not to be actual
memory leaks in the Java application.

In the "cf events appname" log I would get entries like this:
2015-xx-xxTxx:xx:xx.00-0400 app.crash appname
index: 1, reason: CRASHED, exit_description: out of memory,
exit_status: 255

These types of entries are produced when the container goes over
its memory resource limits. It doesn't mean that there is a
memory leak in the Java application. The container gets killed by
the Linux kernel oom killer
(https://github.com/cloudfoundry/warden/blob/master/warden/README.md#limit-handle-mem-value)
based on the resource limits set to the warden container.
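To separate these container-level kills from other crashes, the app.crash entries whose exit_description is "out of memory" can be picked out of the cf events output. A minimal Python sketch, assuming the one-line entry format in the example above (the field names come from that example; other CLI versions may format events differently, and the function names are illustrative):

```python
import re

# Key/value pairs that follow "app.crash <appname>" in the example entry.
CRASH_FIELD = re.compile(r"\b(index|reason|exit_description|exit_status):\s*([^,]+)")


def parse_crash_event(line):
    """Return the crash fields of a `cf events` line, or None for other events."""
    if "app.crash" not in line:
        return None
    return {key: value.strip() for key, value in CRASH_FIELD.findall(line)}


def is_container_oom(line):
    """True when the crash was reported with exit_description 'out of memory'."""
    fields = parse_crash_event(line)
    return bool(fields) and fields.get("exit_description") == "out of memory"
```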

The memory limit is specified in number of bytes. It is enforced
using the control group associated with the container. When a
container exceeds this limit, one or more of its processes will be
killed by the kernel. Additionally, the Warden will be notified
that an OOM happened and it subsequently tears down the container.
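The counters behind this limit can be inspected directly. Under cgroup v1 the memory controller exposes memory.limit_in_bytes, memory.usage_in_bytes, memory.max_usage_in_bytes and memory.failcnt; failcnt counts how often usage hit the limit, which is a hint rather than proof that the OOM killer fired. A Python sketch, assuming you can reach the container's cgroup directory (where that lives depends on the Warden setup, so the path is an assumption). Note that usage_in_bytes includes page cache, so it will not match the Java process RSS exactly.

```python
from pathlib import Path

# cgroup v1 memory-controller files read by this sketch.
COUNTERS = ("limit_in_bytes", "usage_in_bytes", "max_usage_in_bytes", "failcnt")


def read_cgroup_memory(cgroup_dir):
    """Read memory.<counter> files from a cgroup v1 memory directory as ints."""
    base = Path(cgroup_dir)
    return {name: int((base / f"memory.{name}").read_text().strip()) for name in COUNTERS}


def headroom_bytes(stats):
    """Bytes left before usage reaches the limit (negative if already over)."""
    return stats["limit_in_bytes"] - stats["usage_in_bytes"]
```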

In my case it never got killed by the killjava.sh script that gets
called in the java-buildpack when an OOM happens in Java.

This is the tool I built to debug the problems:
https://github.com/lhotari/java-buildpack-diagnostics-app
I deployed that app as part of the forked buildpack I'm using.
Please read the readme to see what its limitations are. It worked
for me, but it might not work for you. It's opensource and you can
fork it. :)

There is a solution in my toolcase for creating a heapdump and
uploading that to S3:
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/HeapDumpServlet.groovy
The readme explains how to setup Amazon S3 keys for this:
https://github.com/lhotari/java-buildpack-diagnostics-app#amazon-s3-setup
Once you get a dump, you can then analyse the dump in a java
profiler tool like YourKit.

I also have a solution that forks the java-buildpack, modifies
killjava.sh, and adds a script that uploads the heapdump to S3 in
the case of OOM:
https://github.com/lhotari/java-buildpack/commit/2d654b80f3bf1a0e0f1bae4f29cb85f56f5f8c46

In java-buildpack-diagnostics-app I also have other tools for
getting Linux operating-system-specific memory information, for
example:

https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemoryInfoServlet.groovy
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemorySmapServlet.groovy
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MallocInfoServlet.groovy

These tools are handy for looking at details of the Java process
RSS memory usage growth.
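For tracking growth over time, /proc/<pid>/status gives a quicker summary than a full memory map: VmRSS is the current resident size and VmHWM its peak. A small Python sketch for parsing those fields from two samples and comparing them; it assumes the Linux status format, where these values are reported in kB, and the function names are illustrative.

```python
def parse_status_kb(status_text, keys=("VmSize", "VmRSS", "VmHWM")):
    """Extract selected memory fields (in kB) from /proc/<pid>/status text."""
    values = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in keys:
            values[name] = int(rest.split()[0])  # e.g. "VmRSS:   123456 kB"
    return values


def rss_growth_kb(earlier, later):
    """RSS growth in kB between two samples returned by parse_status_kb."""
    return later["VmRSS"] - earlier["VmRSS"]
```

Usage: read `/proc/<pid>/status` periodically, keep the parsed samples, and watch `rss_growth_kb` while VmSize stays flat; that pattern matches memory that was reserved early and is only now becoming resident.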

There is also a solution for getting ssh shell access inside your
application with tmate.io <http://tmate.io>:
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/TmateSshServlet.groovy
(this version is only compatible with the new "cflinuxfs2" stack)

It looks like there are serious problems on Cloud Foundry with the
memory sizing calculation. An application that doesn't have an OOM
problem will get killed by the OOM killer because the Java process
will go over the memory limits.
I filed this issue:
https://github.com/cloudfoundry/java-buildpack/issues/157 , but
that might not cover everything.

The workaround for that in my case was to add a native key under
memory_sizes in open_jdk_jre.yml and set the minimum to 330M (that
is for a 2GB total memory).
see example
https://github.com/grails-samples/java-buildpack/blob/22e0f6a/config/open_jdk_jre.yml#L25
that was how I got the app I'm running on CF to stay within the
memory bounds. I'm sure there is now also a way to get the keys
without forking the buildpack. I could have also adjusted the
percentage portions, but I wanted to set a hard minimum for this case.
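The idea of a hard floor for one memory area, with the remainder shared out proportionally, can be illustrated with a short Python sketch. This is a simplified model written for illustration, not the java-buildpack's actual sizing algorithm, and the weights in the usage note are hypothetical.

```python
def split_memory(total_mb, weights, minimums=None):
    """Split total_mb across areas in proportion to weights, pinning any area
    whose proportional share falls below its minimum at that minimum.

    Simplified illustration only; not the java-buildpack's algorithm.
    """
    minimums = minimums or {}
    if sum(minimums.values()) > total_mb:
        raise ValueError("minimums exceed the total")
    pinned = {}
    while True:
        free = {k: w for k, w in weights.items() if k not in pinned}
        if not free:
            return pinned
        remaining = total_mb - sum(pinned.values())
        weight_sum = sum(free.values())
        shares = {k: remaining * w / weight_sum for k, w in free.items()}
        short = [k for k, v in shares.items() if v < minimums.get(k, 0)]
        if not short:
            return {**pinned, **shares}
        for k in short:  # pin, then re-split what is left among the others
            pinned[k] = minimums[k]
```

With hypothetical weights heap 75, metaspace 10, stack 5, native 10 and a 2048M total, native's proportional share would be 204.8M; a 330M floor pins it at 330M and the remaining 1718M is split among the other three areas by weight.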

It was also required to do some other tuning.

I added this to JAVA_OPTS:
-XX:CompressedClassSpaceSize=256M -XX:InitialCodeCacheSize=64M
-XX:CodeCacheExpansionSize=1M -XX:CodeCacheMinimumFreeSpace=1M
-XX:ReservedCodeCacheSize=200M -XX:MinMetaspaceExpansion=1M
-XX:MaxMetaspaceExpansion=8M -XX:MaxDirectMemorySize=96M
while trying to keep the Java process from growing in RSS memory size.
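Several of these flags size non-heap areas, so it can help to turn them into byte counts before comparing them with the container limit. A minimal Python sketch that parses -XX:Name=<size> flags with an optional k/m/g suffix; the helper name is illustrative, and flags in other forms (such as -XX:+Flag or -Xmx) are ignored.

```python
import re

# Binary multipliers for the k, m and g size suffixes.
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
SIZE_FLAG = re.compile(r"-XX:(\w+)=(\d+)([kKmMgG]?)(?=\s|$)")


def parse_size_flags(java_opts):
    """Map each -XX:Name=<size> flag in java_opts to its value in bytes."""
    return {
        name: int(number) * UNITS[suffix.lower()]
        for name, number, suffix in SIZE_FLAG.findall(java_opts)
    }
```

Not every parsed value is an upper bound on resident memory: InitialCodeCacheSize, for instance, is a starting size, while ReservedCodeCacheSize and MaxDirectMemorySize are caps.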

The memory overhead of a 64 bit Java process on Linux can be
reduced by specifying these environment variables:

stack: cflinuxfs2
.
.
.
env:
  MALLOC_ARENA_MAX: 2
  MALLOC_MMAP_THRESHOLD_: 131072
  MALLOC_TRIM_THRESHOLD_: 131072
  MALLOC_TOP_PAD_: 131072
  MALLOC_MMAP_MAX_: 65536

MALLOC_ARENA_MAX works only on cflinuxfs2 stack (the lucid64 stack
has a buggy version of glibc).

explanation about MALLOC_ARENA_MAX from Heroku:
https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
some measurement data on how it reduces memory consumption:
https://devcenter.heroku.com/articles/testing-cedar-14-memory-use

I have created a PR to add this to CF java-buildpack:
https://github.com/cloudfoundry/java-buildpack/pull/160

I also created issue
https://github.com/cloudfoundry/java-buildpack/issues/163 and pull request
https://github.com/cloudfoundry/java-buildpack/pull/159 .

I hope this information helps others struggling with OOM problems
in CF.
I'm not saying that this is a ready made solution just for you.
YMMV. It worked for me.

-Lari




On 15-04-29 10:53 AM, Head-Rapson, David wrote:

Hi,

I’m after some guidance on how to profile Java apps in CF,
in order to get to the bottom of memory issues.

We have an app that’s crashing every few hours with an OOM error;
most likely it’s a memory leak.

I’d like to profile the JVM and work out what’s eating memory,
however tools like yourkit require connectivity INTO the JVM
server (i.e. the warden container), either via host / port or
via SSH.

Since warden containers cannot be connected to on ports other
than for HTTP and cannot be SSHd to, neither of these works
for me.



I tried installing a standalone JDK onto the warden container,
however as soon as I ran ‘jmap’ to invoke the dump, warden
cleaned up the container – most likely for memory
over-consumption.



I had previously found a hack in the Weblogic buildpack
(https://github.com/pivotal-cf/weblogic-buildpack/blob/master/docs/container-wls-monitoring.md)
for modifying the start script which, when used with
-XX:+HeapDumpOnOutOfMemoryError, should copy any heapdump files
to a file share somewhere. I have my own custom buildpack so
I could use something similar.

Has anyone got a better solution than this?



We would love to use newrelic / app dynamics for this however
we’re not allowed. And I’m not 100% certain they could help
with this either.



Dave



The information transmitted is intended for the person or
entity to which it is addressed and may contain confidential,
privileged or copyrighted material. If you receive this in
error, please contact the sender and delete the material from
any computer. Fidelity only gives information on products and
services and does not give investment advice to retail clients
based on individual circumstances. Any comments or statements
made are not necessarily those of Fidelity. All e-mails may be
monitored. FIL Investments International (Reg. No.1448245),
FIL Investment Services (UK) Limited (Reg. No. 2016555), FIL
Pensions Management (Reg. No. 2015142) and Financial
Administration Services Limited (Reg. No. 1629709) are
authorised and regulated in the UK by the Financial Conduct
Authority. FIL Life Insurance Limited (Reg No. 3406905) is
authorised in the UK by the Prudential Regulation Authority
and regulated in the UK by the Financial Conduct Authority and
the Prudential Regulation Authority. Registered offices at
Oakhill House, 130 Tonbridge Road, Hildenborough, Tonbridge,
Kent TN11 9DZ.

--
You received this message because you are subscribed to the
Google Groups "Cloud Foundry Developers" group.
To view this discussion on the web visit
https://groups.google.com/a/cloudfoundry.org/d/msgid/vcap-dev/DFFA4ADB9F3BC34194429921AB329336408CAB04%40UKFIL7006WIN.intl.intlroot.fid-intl.com
<https://groups.google.com/a/cloudfoundry.org/d/msgid/vcap-dev/DFFA4ADB9F3BC34194429921AB329336408CAB04%40UKFIL7006WIN.intl.intlroot.fid-intl.com?utm_medium=email&utm_source=footer>.
To unsubscribe from this group and stop receiving emails from
it, send an email to vcap-dev+unsubscribe(a)cloudfoundry.org
<mailto:vcap-dev+unsubscribe(a)cloudfoundry.org>.




_______________________________________________
Cf-dev mailing list
Cf-dev(a)lists.cloudfoundry.org <mailto:Cf-dev(a)lists.cloudfoundry.org>
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev





--

Regards,



Daniel Jones

EngineerBetter.com



Re: Recipe to install Diego?

Ken Ojiri
 

Hi,

I have posted a sample BOSH deployment manifest to Gist.
https://gist.github.com/ozzozz/4c08c37863b703a75afc
I was able to deploy cf-release v207 and diego-release 0.1099.0 to the AWS
Tokyo region with MicroBOSH.

I was also able to deploy cf-release and diego-release to OpenStack (Juno).
The manifests differ only in 'networks', 'cloud_properties' and 'stemcell'.

Regards,
Ken

---
<ozzozz(a)gmail.com>
Mitaka, Tokyo Japan

On Sat, May 9, 2015 at 8:57 PM, Tom Sherrod <tom.sherrod(a)gmail.com> wrote:
Hi,

Are there any examples or docs on installing Diego with bosh/microbosh?
Using the bosh-lite as a template, I'm tripping up on various parts. Is this
even a valid direction in installing?
Either AWS or OpenStack.

Thanks,
Tom



Re: [vcap-dev] Java OOM debugging

Dave Head-Rapson
 

Thanks for the continued advice.

We’ve made a key discovery after yet another soak test this weekend.

- When we deploy using Tomcat 8.0.18 we don’t see the issue

- When we deploy using Tomcat 8.0.20 (same app version, same CF space, same services bound, same JBP code version, same JRE version, running at the same time), we see the crashes occurring after just a couple of hours.

Ideally we’d go ahead with the memory calculations you mentioned; however, we’re stuck on lucid64 because we’re using Pivotal CF 1.3.x and are having issues upgrading to 1.4.x.
So we’re not able to adjust MALLOC_ARENA_MAX, nor are we able to view RSS in pmap as you describe.

Other things we’ve tried:

- We enabled verbose garbage collection to verify there were no memory sizing issues within the JVM. There weren’t.

- We tried setting a minimum memory size for native; it had no effect. The container still gets killed.

- We tried adjusting the ‘memory heuristics’ so that they added up to 80 rather than 100. This delayed the container being killed, but it was still killed.

This seems like classic memory leak behaviour to me.

From: Lari Hotari [mailto:lari.hotari(a)sagire.fi] On Behalf Of Lari Hotari
Sent: 08 May 2015 16:25
To: Daniel Jones; Head-Rapson, David
Cc: cf-dev(a)lists.cloudfoundry.org
Subject: Re: [Cf-dev] [vcap-dev] Java OOM debugging


For my case, it turned out to be essential to reserve enough memory for "native" in the JBP. For the 2GB total memory, I set the minimum to 330M. With that setting I have been able to get over 2 weeks of uptime so far.

I mentioned this in my previous email:

The workaround for that in my case was to add a native key under memory_sizes in open_jdk_jre.yml and set the minimum to 330M (that is for a 2GB total memory).
see example https://github.com/grails-samples/java-buildpack/blob/22e0f6a/config/open_jdk_jre.yml#L25
that was how I got the app I'm running on CF to stay within the memory bounds. I'm sure there is now also a way to get the keys without forking the buildpack. I could have also adjusted the percentage portions, but I wanted to set a hard minimum for this case.

I've been trying to get some insight by diffing the reports gathered from the meminfo servlet https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemoryInfoServlet.groovy

Here is such an example of a diff:
https://gist.github.com/lhotari/ee77decc2585f56cf3ad#file-meminfo_diff_example-txt

meminfo includes pmap output, which reports the memory map of the process. I have just noticed that most of the memory has already been mmap'ed from the OS and it's just growing in RSS size. For example:
< 00000000a7600000 1471488 1469556 1469556 rw--- [ anon ]
> 00000000a7600000 1471744 1470444 1470444 rw--- [ anon ]
The pmap output from lucid64 didn't include the RSS size, so you have to use cflinuxfs2 for this. It's also better for other reasons: the glibc in lucid64 is old and has some bugs around MALLOC_ARENA_MAX.
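Comparing two pmap -x snapshots mapping by mapping makes this kind of growth easy to spot. A Python sketch, assuming the pmap -x column layout (Address, Kbytes, RSS, Dirty, Mode, Mapping); rows whose address or size columns are not plain numbers, such as the header and total lines, are skipped. Function names are illustrative.

```python
import re
from typing import Dict, NamedTuple, Tuple


class Mapping(NamedTuple):
    kbytes: int  # virtual size (VSS) in kB
    rss: int     # resident size in kB
    name: str


def parse_pmap_x(text: str) -> Dict[str, Mapping]:
    """Parse `pmap -x` output into {address: Mapping}."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if (len(parts) >= 6 and re.fullmatch(r"[0-9a-f]+", parts[0])
                and parts[1].isdigit() and parts[2].isdigit()):
            rows[parts[0]] = Mapping(int(parts[1]), int(parts[2]), " ".join(parts[5:]))
    return rows


def growth(before: str, after: str) -> Dict[str, Tuple[int, int]]:
    """Return {address: (VSS delta kB, RSS delta kB)} for mappings that changed."""
    old, new = parse_pmap_x(before), parse_pmap_x(after)
    deltas = {}
    for addr, m in new.items():
        if addr in old:
            delta = (m.kbytes - old[addr].kbytes, m.rss - old[addr].rss)
            if delta != (0, 0):
                deltas[addr] = delta
    return deltas
```

For the two rows above, this reports a VSS change of 256 kB and an RSS change of 888 kB for the block at 00000000a7600000.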

I was able to manually estimate the maximum RSS that the Java process will consume by picking the large anon blocks from the pmap report and counting those blocks at their allocated virtual size (VSS).
Based on this calculation, I picked the minimum of 330M for "native" in open_jdk_jre.yml, as I mentioned before.

It looks like these rows are for the heap:
< 00000000a7600000 1471488 1469556 1469556 rw--- [ anon ]
> 00000000a7600000 1471744 1470444 1470444 rw--- [ anon ]
It looks like the JVM doesn't fully allocate that block in RSS initially, and in my case most of the RSS growth comes from it. In your case, it might be something different.

I also added a servlet for getting glibc malloc_info statistics in XML format. I haven't really analysed that information because of time constraints and because I don't have a pressing problem any more. BTW, the malloc_info XML report is missing some key elements that have been added in later glibc versions (https://github.com/bminor/glibc/commit/4d653a59ffeae0f46f76a40230e2cfa9587b7e7e).

If killjava.sh never fires and the app crashes with Warden out of memory errors, then I believe it's the kernel's cgroups OOM killer that has killed the container processes. I have found the location where the Warden OOM notifier gets the OOM notification event:
https://github.com/cloudfoundry/warden/blob/ad18bff/warden/lib/warden/container/features/mem_limit.rb#L70
This is the oom.c source code: https://github.com/cloudfoundry/warden/blob/ad18bff7dc56acbc55ff10bcc6045ebdf0b20c97/warden/src/oom/oom.c . It reads the cgroups control files and receives events from the kernel that way.

I'd suggest that you run pmap on the Java process after it has started and estimate the maximum RSS size by counting the VSS size, instead of the RSS, for the large anon blocks that the Java process has reserved for its different memory areas. You should leave out the VSS of the CompressedClassSpaceSize block.
After this calculation, add enough memory to the "native" parameter in JBP until the RSS size calculated this way stays under the limit.
That's the only "method" I have come up with so far.
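One reading of the estimate described above, written as a short Python sketch: count large anonymous blocks at their virtual size, every other mapping at its current RSS, and compare the total with the container limit. The size threshold, the set of excluded addresses (for example the block reserved for the compressed class space) and the function names are assumptions for illustration; the input is assumed to be pmap -x output.

```python
import re


def parse_pmap_x(text):
    """Return (address, vss_kb, rss_kb, name) tuples from `pmap -x` output."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if (len(parts) >= 6 and re.fullmatch(r"[0-9a-f]+", parts[0])
                and parts[1].isdigit() and parts[2].isdigit()):
            rows.append((parts[0], int(parts[1]), int(parts[2]), " ".join(parts[5:])))
    return rows


def estimate_max_rss_kb(pmap_text, min_block_kb, exclude=frozenset()):
    """Worst-case RSS: large anon blocks at VSS, all other mappings at RSS."""
    total = 0
    for addr, vss, rss, name in parse_pmap_x(pmap_text):
        large_anon = name == "[ anon ]" and vss >= min_block_kb
        total += vss if large_anon and addr not in exclude else rss
    return total


def native_shortfall_kb(estimate_kb, limit_kb):
    """How far the estimate exceeds the container limit (0 if it fits)."""
    return max(0, estimate_kb - limit_kb)
```

A non-zero shortfall is roughly how much more memory would have to be set aside for "native" (or taken from other areas) to keep the estimate under the limit.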

It might be required to have some RSS space allocated for any zip/jar files read by the Java process. I think that Java uses mmap files for zip file reading by default and that might go on top of all other limits.
To test this theory, I'd suggest testing by adding -Dsun.zip.disableMemoryMapping=true system property setting to JAVA_OPTS. That disables the native mmap for zip/jar file reading. I haven't had time to test this assumption.

I guess the only way to understand how Java allocates memory is to look at the source code.
from http://openjdk.java.net/projects/jdk8u/ , the instructions to get the source code of JDK 8:
hg clone http://hg.openjdk.java.net/jdk8u/jdk8u; cd jdk8u; sh get_source.sh
This tool is really good for grepping and searching the source code: http://geoff.greer.fm/ag/
On Ubuntu it's in the silversearcher-ag package ("apt-get install silversearcher-ag"), and on Mac OS X with Homebrew it's "brew install the_silver_searcher".
This alias is pretty useful:
alias codegrep='ag --color --group --pager less -C 5'
Then you just search for the correct location in code by starting with the tokens you know about:
codegrep MaxMetaspaceSize
this gives pretty good starting points in looking how the JDK allocates memory.

So the JDK source code is only a few commands away.

It would be interesting to hear more about this if someone has the time to dig in to this. This is about how far I got and I hope sharing this information helps someone continue. :)


Lari
github/twitter: lhotari


Recipe to install Diego?

Lev Berman <lev.berman@...>
 

Hi,

I can share my experience installing Diego on AWS. I followed the
instructions for BOSH Lite deployment
<https://github.com/cloudfoundry-incubator/diego-release#deploying-diego-to-a-local-bosh-lite-instance>
except that I replaced 3 templates with the ones you can find in the
attachment. Note that in my case instance-count-overrides.yml leads to a
one-AZ deployment. Prerequisites include creating a separate AWS subnet for
Diego. You also need to configure routes and security groups in the same
manner as you did for Cloud Foundry.

On Sat, May 9, 2015 at 2:57 PM, Tom Sherrod <tom.sherrod(a)gmail.com> wrote:

Hi,

Are there any examples or docs on installing Diego with bosh/microbosh?
Using the bosh-lite as a template, I'm tripping up on various parts. Is
this even a valid direction in installing?
Either AWS or OpenStack.

Thanks,
Tom
--
Lev Berman

Altoros - Cloud Foundry deployment, training and integration

Github: https://github.com/ldmberman


Re: Is there an auto-completion script?

Daniel Kaplan
 

Great, thanks a lot for the links.

-Dan

On Thu, May 7, 2015 at 10:50 PM, Takeshi Morikawa <moog0814(a)gmail.com>
wrote:

Hi Daniel

I found this

cf(cli) completion
https://github.com/cf-buildpacks/cf_completion

bosh cli completion
https://github.com/anfernee/bosh-completion

Is my answer what you're hoping for?

2015-05-08 14:28 GMT+09:00 Daniel Kaplan <dkaplan(a)pivotal.io>:

Hi DevList,

I think it would be extra convenient if there was Cloud Foundry
auto-completion script that worked similar to the way git's
git-completion
<https://github.com/git/git/blob/master/contrib/completion/git-completion.bash>
works.

Does one already exist? If not, I might write it in my free time. Let
me know your thoughts.

Thanks,
Dan



Recipe to install Diego?

Tom Sherrod <tom.sherrod@...>
 

Hi,

Are there any examples or docs on installing Diego with bosh/microbosh?
Using the bosh-lite as a template, I'm tripping up on various parts. Is
this even a valid direction in installing?
Either AWS or OpenStack.

Thanks,
Tom


Re: [cf-lattice] [cf-bosh] Links to Nabble archives of the CF lists

James Bayer
 

aaron,

i added a page on the cf community wiki with your links:
https://github.com/cloudfoundry-community/cf-docs-contrib/wiki/Mailing-Lists

On Wed, May 6, 2015 at 1:55 PM, Christopher B Ferris <chrisfer(a)us.ibm.com>
wrote:

+1 nice job!

Cheers,

Christopher Ferris
IBM Distinguished Engineer, CTO Open Cloud
IBM Software Group, Open Technologies
email: chrisfer(a)us.ibm.com
twitter: @christo4ferris
blog: http://thoughtsoncloud.com/index.php/author/cferris/
phone: +1 508 667 0402


From: Chip Childers <cchilders(a)cloudfoundry.org>
To: "Huber, Aaron M" <aaron.m.huber(a)intel.com>
Cc: "cf-dev(a)lists.cloudfoundry.org" <cf-dev(a)lists.cloudfoundry.org>,
"cf-lattice(a)lists.cloudfoundry.org" <cf-lattice(a)lists.cloudfoundry.org>,
"cf-bosh(a)lists.cloudfoundry.org" <cf-bosh(a)lists.cloudfoundry.org>
Date: 05/06/2015 01:51 PM
Subject: Re: [cf-dev] [cf-bosh] Links to Nabble archives of the CF lists
Sent by: cf-dev-bounces(a)lists.cloudfoundry.org
------------------------------



Thanks Aaron!

Chip Childers | Technology Chief of Staff | Cloud Foundry Foundation

On Wed, May 6, 2015 at 4:28 PM, Huber, Aaron M <aaron.m.huber(a)intel.com>
wrote:

I’ve created Nabble archives of the CF lists here:

http://cf-bosh.70367.x6.nabble.com/
http://cf-dev.70369.x6.nabble.com/
http://cf-lattice.70370.x6.nabble.com/

The archives are searchable and allow web viewing of the mailing lists
without subscribing via email. There is also an RSS feed for each list.

Aaron

_______________________________________________
cf-bosh mailing list
cf-bosh(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-bosh





_______________________________________________
cf-lattice mailing list
cf-lattice(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-lattice


--
Thank you,

James Bayer


Re: Can't Create Service Instance in Cloud Foundry

James Bayer
 

looks like you already took what was going to be my advice and asked the
maintainers of the services-contrib repo by opening an issue:
https://github.com/cloudfoundry-community/cf-services-contrib-release/issues/154

good luck

On Wed, May 6, 2015 at 8:20 AM, Matthew Landry <mhlandry(a)gmail.com> wrote:


I'm trying to get the app from the introduction to Spring Cloud
<https://spring.io/blog/2014/06/03/introducing-spring-cloud> working
with an instance of Cloud Foundry that I'm running on my machine. When I
push the app, I get this message:

Could not find service postgres-service to bind to hello-spring-cloud

That makes sense to me, so I started tracking down the postgres service.
When I run `cf marketplace`, I get:

service       plans     description
mongodb       default   MongoDB NoSQL database
postgresql    default   PostgreSQL database
rabbitmq      default   RabbitMQ message queue
redis         default   Redis key-value store

When I try to create an instance of the postgresql service, I get:

$ cf create-service postgresql default postgresql-service
Creating service instance postgresql-service in org xyz / space development as admin...
FAILED
Server error, status code: 500, error code: 10001, message: Service broker error: Not authorized

Here are the permissions for the space:

Getting users in org xyz / space development as admin

SPACE MANAGER
admin

SPACE DEVELOPER
admin

The cf service-access command doesn't yield anything interesting:

$ cf service-access
Getting service access as admin

After that, nothing else is shown in the terminal. So I tried to enable service access:

$ cf enable-service-access postgresql
Enabling access to all plans of service postgresql for all orgs as admin...
All plans of the service are already accessible for all orgs
OK

For the life of me, I can't figure out what's going on. I posted a
question to Stack Overflow
<http://stackoverflow.com/questions/30034143/cant-create-service-instance-in-cloud-foundry>
and was referred here after some debugging attempts were fruitless.
Anybody got any ideas?


--
Thank you,

James Bayer


Re: stdout.log and stderr.log not show in CF197 with loggregator enabled

James Bayer
 

i believe those files were removed since loggregator gives you access to
that output (and you can get the content via syslog). you may be able to
adjust the start command to write them out again.
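Adjusting the start command could look roughly like the sketch below: a bash function that runs the original start command and uses `tee` to copy its stdout and stderr into `stdout.log` and `stderr.log`, while still writing both streams to the process's own stdout/stderr so loggregator keeps receiving them. This is not a CF feature. The function name `run_with_log_files`, the `LOG_DIR` variable, and the default `logs` directory are assumptions. Where the old `logs/` directory sits relative to the start command's working directory depends on the container layout and is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of CF): run an app's start command and copy
# its stdout/stderr into log files, while still writing both streams to the
# process's own stdout/stderr so loggregator keeps receiving them.
run_with_log_files() {
  # Assumed location; set LOG_DIR to wherever the files should end up.
  local log_dir="${LOG_DIR:-logs}"
  mkdir -p "$log_dir"

  # fd 3 is a copy of the outer pipe feeding the stdout.log tee.
  # Inside the group, the command's stderr goes into the inner pipe
  # (stderr.log tee, which writes back to the real stderr), and its
  # stdout is sent to fd 3 (stdout.log tee, which writes to the real stdout).
  # Note: the function's exit status is that of the last tee, not the app,
  # unless `set -o pipefail` is in effect.
  {
    "$@" 2>&1 1>&3 3>&- | tee -a "$log_dir/stderr.log" >&2 3>&-
  } 3>&1 | tee -a "$log_dir/stdout.log"
}
```

For example, with the function in a script shipped with the app (a hypothetical `log_wrapper.sh` ending in `run_with_log_files "$@"`), the start command could be set with `cf push <app> -c './log_wrapper.sh <original start command>'`. Files written this way grow without bound, so some form of rotation may be needed.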

On Tue, May 5, 2015 at 4:31 PM, Zhang, Yuan <Yuan.Zhang(a)emc.com> wrote:

Hi,

We upgraded from CF172 to CF197 and enabled loggregator on CF197. But for
applications deployed to CF197 (with loggregator enabled), we DO NOT see
stdout.log and stderr.log in the application's logs directory anymore. On
CF172 we can see logs/stdout.log and logs/stderr.log.

CF197:

cf file <app> logs

Getting file contents... OK

staging_task.log 1.3K

Can you tell us which setting in CF 197 controls whether stdout.log and
stderr.log show up? How can we get logs/stdout.log and logs/stderr.log to
show up again?

Thanks,

Tina Zhang


--
Thank you,

James Bayer
