Re: [vcap-dev] Java OOM debugging


Dave Head-Rapson
 

Thanks for the continued advice.

We’ve hit on a key discovery after yet another soak test this weekend.

- When we deploy using Tomcat 8.0.18 we don’t see the issue

- When we deploy using Tomcat 8.0.20 (same app version, same CF space, same services bound, same JBP code version, same JRE version, running at the same time), we see the crashes occurring after just a couple of hours.

Ideally we’d go ahead with the memory calculations you mentioned; however, we’re stuck on lucid64 because we’re using Pivotal CF 1.3.x and we’re having issues upgrading to 1.4.x.
So we’re not able to adjust MALLOC_ARENA_MAX, nor are we able to view RSS in pmap as you describe.

Other things we’ve tried:

- We enabled verbose garbage collection to verify that there were no memory sizing issues within the JVM. There weren’t.

- We tried setting a minimum memory for native; it had no effect. The container still gets killed.

- We tried adjusting the ‘memory heuristics’ so that they added up to 80 rather than 100. This delayed the container being killed, but it was still killed.

This seems like classic memory leak behaviour to me.

From: Lari Hotari [mailto:lari.hotari(a)sagire.fi] On Behalf Of Lari Hotari
Sent: 08 May 2015 16:25
To: Daniel Jones; Head-Rapson, David
Cc: cf-dev(a)lists.cloudfoundry.org
Subject: Re: [Cf-dev] [vcap-dev] Java OOM debugging


In my case, it turned out to be essential to reserve enough memory for "native" in the JBP. For 2GB of total memory, I set the minimum to 330M. With that setting I have now had over 2 weeks of uptime.

I mentioned this in my previous email:

The workaround for that in my case was to add a native key under memory_sizes in open_jdk_jre.yml and set the minimum to 330M (that is for a 2GB total memory).
See example https://github.com/grails-samples/java-buildpack/blob/22e0f6a/config/open_jdk_jre.yml#L25
That was how I got the app I'm running on CF to stay within the memory bounds. I'm sure there is now also a way to set these keys without forking the buildpack. I could have also adjusted the percentage portions, but I wanted to set a hard minimum for this case.

I've been trying to get some insight by diffing the reports gathered from the meminfo servlet https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemoryInfoServlet.groovy

Here is an example of such a diff:
https://gist.github.com/lhotari/ee77decc2585f56cf3ad#file-meminfo_diff_example-txt

The meminfo report includes pmap output, i.e. the memory map of the process. I have just noticed that most of the memory has already been mmapped from the OS and it's only growing in RSS size. For example:
< 00000000a7600000 1471488 1469556 1469556 rw--- [ anon ]
> 00000000a7600000 1471744 1470444 1470444 rw--- [ anon ]
The pmap output on lucid64 doesn't include the RSS size, so you have to use cflinuxfs2 for this. It's also better for other reasons: the glibc in lucid64 is old and has some bugs around MALLOC_ARENA_MAX.

I was able to manually estimate the maximum RSS the Java process will consume by picking the large anon blocks from the pmap report and adding up their allocated virtual size (VSS) instead of their current RSS.
Based on this calculation, I picked a minimum of 330M for "native" in open_jdk_jre.yml, as I mentioned before.
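The estimate above can be sketched in a few lines of Python. This is a minimal sketch, not Lari's exact procedure: it assumes `pmap -x <pid>` output with Address, Kbytes, RSS, Dirty, Mode and Mapping columns (the layout of the rows quoted in this thread), and it treats any anon mapping of 64 MB or more as "large". That cutoff is an arbitrary assumption.

```python
import re

# One data row of `pmap -x`: address, Kbytes (VSS), RSS, Dirty, mode, mapping.
ROW = re.compile(r"^([0-9a-f]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.*)$")

def large_anon_blocks(pmap_text, min_kb=64 * 1024):
    """Return (address, vss_kb, rss_kb) for anon mappings of at least min_kb.

    min_kb (64 MB by default) is an assumed cutoff for a "large" block.
    """
    blocks = []
    for line in pmap_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header, separator, total line or a row without numeric RSS
        addr, kbytes, rss, _dirty, _mode, mapping = m.groups()
        if "anon" in mapping and int(kbytes) >= min_kb:
            blocks.append((addr, int(kbytes), int(rss)))
    return blocks

def estimated_max_rss_kb(pmap_text, min_kb=64 * 1024):
    # Count every large block at its full virtual size, i.e. assume it will
    # eventually become fully resident.
    return sum(vss for _addr, vss, _rss in large_anon_blocks(pmap_text, min_kb))
```

Rows without a numeric RSS column do not match the pattern and are skipped, so this needs the cflinuxfs2-style pmap output.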

It looks like these rows are for the heap:
< 00000000a7600000 1471488 1469556 1469556 rw--- [ anon ]
> 00000000a7600000 1471744 1470444 1470444 rw--- [ anon ]
It looks like that block isn't fully resident (in RSS) at first, and in my case most of the RSS growth comes from it. In your case, it might be something different.
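To make the comparison concrete, here is a small sketch that takes the before (`<`) and after (`>`) versions of one pmap row from such a diff and reports how the virtual size and RSS changed, and how much RSS the block could still gain. The input rows are the heap rows quoted above; the function names are only illustrative, and all values are in KB.

```python
def row_fields(row):
    """Split a pmap -x row, optionally prefixed by diff's '<' or '>',
    into (address, vss_kb, rss_kb)."""
    parts = row.lstrip("<> ").split()
    return parts[0], int(parts[1]), int(parts[2])

def compare_rows(before, after):
    addr_b, vss_b, rss_b = row_fields(before)
    addr_a, vss_a, rss_a = row_fields(after)
    if addr_b != addr_a:
        raise ValueError("rows describe different mappings")
    return {
        "vss_growth_kb": vss_a - vss_b,
        "rss_growth_kb": rss_a - rss_b,
        "rss_headroom_kb": vss_a - rss_a,  # RSS the block can still gain
    }

before = "< 00000000a7600000 1471488 1469556 1469556 rw--- [ anon ]"
after = "> 00000000a7600000 1471744 1470444 1470444 rw--- [ anon ]"
print(compare_rows(before, after))
# {'vss_growth_kb': 256, 'rss_growth_kb': 888, 'rss_headroom_kb': 1300}
```

Here RSS grew by 888 KB while the mapping itself grew by only 256 KB, which fits the observation that the block was mapped up front and is gradually becoming resident.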

I also added a servlet for getting glibc malloc_info statistics in XML format (). I haven't really analysed that information because of time constraints and because I don't have a pressing problem any more. By the way, the malloc_info XML report is missing some key elements that were added in later glibc versions (https://github.com/bminor/glibc/commit/4d653a59ffeae0f46f76a40230e2cfa9587b7e7e).

If killjava.sh never fires and the app crashes with Warden out of memory errors, then I believe it's the kernel's cgroups OOM killer that has killed the container processes. I found the place where the Warden OOM notifier receives the OOM notification event:
https://github.com/cloudfoundry/warden/blob/ad18bff/warden/lib/warden/container/features/mem_limit.rb#L70
This is the oom.c source code: https://github.com/cloudfoundry/warden/blob/ad18bff7dc56acbc55ff10bcc6045ebdf0b20c97/warden/src/oom/oom.c . It reads the cgroups control files and receives events from the kernel that way.
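For anyone curious how that notification works, here is a rough Python sketch of the cgroup v1 mechanism as described in the kernel's memory controller documentation. It is not a port of oom.c. It assumes Linux, Python 3.10+ (for `os.eventfd`) and a cgroup v1 memory hierarchy; the `cgroup_dir` path is a hypothetical example.

```python
import os

def parse_oom_control(text):
    """Parse the key/value lines of a cgroup v1 memory.oom_control file."""
    values = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            values[key] = int(value)
    return values

def wait_for_oom(cgroup_dir):
    """Block until the kernel reports an OOM event for this memory cgroup.

    cgroup_dir is an assumed path such as /sys/fs/cgroup/memory/<container>.
    """
    efd = os.eventfd(0)
    ofd = os.open(os.path.join(cgroup_dir, "memory.oom_control"), os.O_RDONLY)
    try:
        # Register interest: write "<eventfd> <fd of memory.oom_control>".
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
            ctl.write(f"{efd} {ofd}")
        # Blocks until the counter is non-zero; returns the number of events.
        return os.eventfd_read(efd)
    finally:
        os.close(ofd)
        os.close(efd)
```

`parse_oom_control` can be used on the file's contents to check `under_oom` without registering for events.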

I'd suggest that you run pmap on the Java process after it has started and estimate the maximum RSS by adding up the VSS (instead of the RSS) of the large anon blocks that the Java process has reserved for its different memory areas. You should leave out the VSS of the CompressedClassSpaceSize block, though.
After this calculation, increase the "native" parameter in JBP until the RSS estimated this way stays under the limit.
That's the only "method" I have come up with so far.
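The check in this procedure could look roughly like the sketch below. It assumes you have already listed the large anon blocks as (address, VSS in KB) pairs and identified by hand which one is the CompressedClassSpaceSize reservation; the addresses, the 100000 KB for "everything else" and the 2048 MB limit are hypothetical example values, not measurements.

```python
def max_rss_estimate_kb(blocks, exclude_addresses=()):
    """Sum the VSS of the large anon blocks, skipping the excluded ones
    (e.g. the block you identified as the compressed class space)."""
    excluded = set(exclude_addresses)
    return sum(vss_kb for addr, vss_kb in blocks if addr not in excluded)

def shortfall_kb(estimate_kb, other_rss_kb, limit_mb):
    """Positive result: how far the estimate overshoots the container limit.

    other_rss_kb is your own allowance for RSS outside the large blocks
    (thread stacks, malloc arenas, mapped jars, ...).
    """
    return estimate_kb + other_rss_kb - limit_mb * 1024

blocks = [
    ("00000000a7600000", 1471744),  # heap
    ("0000000100000000", 262144),   # hypothetical 256M compressed class space
    ("00007f1000000000", 131072),   # hypothetical other large block
]
est = max_rss_estimate_kb(blocks, exclude_addresses=["0000000100000000"])
print(est, shortfall_kb(est, other_rss_kb=100000, limit_mb=2048))
# 1602816 -394336
```

A positive shortfall would be a rough hint that "native" needs more room; a negative one means the estimate fits under the limit.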

Some RSS may also be needed for any zip/jar files read by the Java process. I think Java memory-maps zip files for reading by default, and that might come on top of all the other limits.
To test this theory, I'd suggest adding the -Dsun.zip.disableMemoryMapping=true system property to JAVA_OPTS. That disables the native mmap for zip/jar file reading. I haven't had time to test this assumption.

I guess the only way to understand how Java allocates memory is to look at the source code.
From http://openjdk.java.net/projects/jdk8u/ , here are the instructions for getting the JDK 8 source code:
hg clone http://hg.openjdk.java.net/jdk8u/jdk8u;cd jdk8u;sh get_source.sh
This tool is really good for grepping and searching the source code: http://geoff.greer.fm/ag/
On Ubuntu it's in the silversearcher-ag package ("apt-get install silversearcher-ag"), and on Mac OS X with Homebrew it's "brew install the_silver_searcher".
This alias is pretty useful:
alias codegrep='ag --color --group --pager less -C 5'
Then you just search for the correct location in code by starting with the tokens you know about:
codegrep MaxMetaspaceSize
This gives pretty good starting points for looking at how the JDK allocates memory.

So the JDK source code is only a few commands away.

It would be interesting to hear more about this if someone has the time to dig into it. This is about how far I got, and I hope sharing this information helps someone continue. :)


Lari
github/twitter: lhotari
On 15-05-08 10:02 AM, Daniel Jones wrote:
Hi Lari et al,

Thanks for your help Lari.

David and I are pairing on this issue, and we're yet to resolve it. We're in the process of creating a repeatable test case (our most crashy app makes calls to external services that need mocking), but in the meantime, here's what we've seen.

Between Java Buildpack commit e89e546 and 17162df, we see apps crashing with Warden out of memory errors. killjava.sh never fires, and this has led us to believe that the kernel is shooting a cgroup process in the head after the cgroup oversteps its memory limit. We cannot find any evidence of the OOM killer firing in any logs, but we may not be looking in the right place.

The JBP is setting heap to be 70%, metaspace to be 15% (with max set to the same as initial), 5% for "stack", 5% for "normalised stack" and 10% for "native". We do not understand why this adds up to 105%, but haven't looked into the JBP algorithm yet. Any pointers on what "normalised stack" is would be much appreciated, as this doesn't appear in the list of heuristics supplied via app env.

Other team members tried applying the same settings that you suggested - thanks for this. Apps still crash with these settings, albeit less frequently.

After reading the blog you linked to (http://java.dzone.com/articles/java-8-permgen-metaspace) we wondered whether the increased reserved metaspace claimed after metaspace GC might be causing a problem; however we reused the test code to create a metaspace leak in a CF app and saw metaspace GCs occur correctly, and memory usage never grow over MaxMetaspaceSize. This figures, as the committed metaspace is still less than MaxMetaspaceSize, and the reserved appears to be whatever RAM is free across the whole DEA.

We noted that an Oracle blog (https://blogs.oracle.com/poonam/entry/about_g1_garbage_collector_permanent) mentions that the metaspace size parameters are approximate. We're currently wondering if native allocations by Tomcat (APR, NIO) are taking up more container memory, and so when the metaspace fills, it's creeping slightly over the limit and triggering the kernel's OOM killer.

Any suggestions would be much appreciated. We've tried to resist tweaking heuristics blindly, but are running out of options as we're struggling to figure out how the Java process is using committed memory. pmap seems to show virtual memory, and so it's hard to see whether things like the metaspace or NIO ByteBuffers are nabbing too much and triggering the kernel's OOM killer.

Thanks for all your help,

Daniel Jones & David Head-Rapson

On Wed, Apr 29, 2015 at 8:07 PM, Lari Hotari <Lari(a)hotari.net> wrote:
Hi,

I created a few tools to debug OOM problems, since the application I was responsible for running on CF was failing constantly because of OOM problems. The problems I had turned out not to be actual memory leaks in the Java application.

In the "cf events appname" log I would get entries like this:
2015-xx-xxTxx:xx:xx.00-0400 app.crash appname index: 1, reason: CRASHED, exit_description: out of memory, exit_status: 255

This type of entry is produced when the container goes over its memory limit. It doesn't mean that there is a memory leak in the Java application. The container gets killed by the Linux kernel OOM killer (https://github.com/cloudfoundry/warden/blob/master/warden/README.md#limit-handle-mem-value) based on the resource limits set on the Warden container.
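To spot these crashes in bulk, a small filter over the events output can help. This sketch assumes the text layout of the single log entry quoted above; real `cf events` output is a table whose layout may differ between CLI versions, and the second field is captured as "actor" on that assumption.

```python
import re

CRASH = re.compile(
    r"app\.crash\s+(?P<actor>\S+)\s+index:\s*(?P<index>\d+),.*?"
    r"exit_description:\s*(?P<desc>[^,]+),\s*exit_status:\s*(?P<status>-?\d+)"
)

def oom_crashes(events_text):
    """Return (actor, index, exit_status) for each app.crash caused by OOM."""
    hits = []
    for line in events_text.splitlines():
        m = CRASH.search(line)
        if m and m.group("desc").strip() == "out of memory":
            hits.append((m.group("actor"), int(m.group("index")), int(m.group("status"))))
    return hits

line = ("2015-xx-xxTxx:xx:xx.00-0400 app.crash appname index: 1, reason: CRASHED, "
        "exit_description: out of memory, exit_status: 255")
print(oom_crashes(line))
# [('appname', 1, 255)]
```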

The memory limit is specified in number of bytes. It is enforced using the control group associated with the container. When a container exceeds this limit, one or more of its processes will be killed by the kernel. Additionally, the Warden will be notified that an OOM happened and it subsequently tears down the container.
In my case it never got killed by the killjava.sh script that gets called in the java-buildpack when an OOM happens in Java.
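Since the limit is enforced by the memory cgroup, the cgroup's own counters show how close a container is to being killed. A sketch assuming cgroup v1 file names and that the memory cgroup is visible at the given (hypothetical) mount point; note that cgroup v1 usage also counts page cache, so it can be higher than the Java process's RSS.

```python
import os

def summarize(limit_bytes, usage_bytes, peak_bytes):
    """Headroom left under a memory cgroup limit, in bytes."""
    return {
        "limit": limit_bytes,
        "usage": usage_bytes,
        "peak": peak_bytes,
        "headroom": limit_bytes - usage_bytes,
        "peak_headroom": limit_bytes - peak_bytes,
    }

def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

def memory_headroom(cgroup_dir="/sys/fs/cgroup/memory"):
    # cgroup v1 files; cgroup_dir is an assumed mount point.
    return summarize(
        read_int(os.path.join(cgroup_dir, "memory.limit_in_bytes")),
        read_int(os.path.join(cgroup_dir, "memory.usage_in_bytes")),
        read_int(os.path.join(cgroup_dir, "memory.max_usage_in_bytes")),
    )
```

A peak_headroom close to zero means the container has already come within a hair of the limit at some point.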

This is the tool I built to debug the problems:
https://github.com/lhotari/java-buildpack-diagnostics-app
I deployed that app as part of the forked buildpack I'm using.
Please read the readme about its limitations. It worked for me, but it might not work for you. It's open source and you can fork it. :)

There is a tool in my toolkit for creating a heap dump and uploading it to S3:
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/HeapDumpServlet.groovy
The readme explains how to set up Amazon S3 keys for this: https://github.com/lhotari/java-buildpack-diagnostics-app#amazon-s3-setup
Once you get a dump, you can then analyse the dump in a java profiler tool like YourKit.

I also have a solution that forks the java-buildpack, modifies killjava.sh and adds a script that uploads the heap dump to S3 in the case of an OOM:
https://github.com/lhotari/java-buildpack/commit/2d654b80f3bf1a0e0f1bae4f29cb85f56f5f8c46

In java-buildpack-diagnostics-app I also have other tools for getting Linux-specific memory information, for example:

https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemoryInfoServlet.groovy
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemorySmapServlet.groovy
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MallocInfoServlet.groovy

These tools are handy for looking at details of the Java process RSS memory usage growth.

There is also a solution for getting SSH shell access inside your application with tmate.io:
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/TmateSshServlet.groovy (this version is only compatible with the new "cflinuxfs2" stack)

It looks like there are serious problems on Cloud Foundry with the memory sizing calculation. An application that doesn't have an OOM problem will get killed by the OOM killer because the Java process will go over the memory limits.
I filed this issue: https://github.com/cloudfoundry/java-buildpack/issues/157 , but that might not cover everything.

The workaround for that in my case was to add a native key under memory_sizes in open_jdk_jre.yml and set the minimum to 330M (that is for a 2GB total memory).
See example https://github.com/grails-samples/java-buildpack/blob/22e0f6a/config/open_jdk_jre.yml#L25
That was how I got the app I'm running on CF to stay within the memory bounds. I'm sure there is now also a way to set these keys without forking the buildpack. I could have also adjusted the percentage portions, but I wanted to set a hard minimum for this case.

Some other tuning was also required.

I added this to JAVA_OPTS:
-XX:CompressedClassSpaceSize=256M -XX:InitialCodeCacheSize=64M -XX:CodeCacheExpansionSize=1M -XX:CodeCacheMinimumFreeSpace=1M -XX:ReservedCodeCacheSize=200M -XX:MinMetaspaceExpansion=1M -XX:MaxMetaspaceExpansion=8M -XX:MaxDirectMemorySize=96M
while trying to keep the Java process from growing in RSS memory size.
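Several of these flags put explicit caps on non-heap areas, so it can help to add them up when deciding how much room to leave outside the heap. A small sketch that parses the size-valued -XX flags from such a JAVA_OPTS string into bytes; which caps to add up is an assumption (here only the code cache and direct memory caps, leaving out CompressedClassSpaceSize as suggested elsewhere in this thread).

```python
import re

UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
SIZE_FLAG = re.compile(r"-XX:(\w+)=(\d+)([kKmMgG]?)\b")

def size_flags(java_opts):
    """Map each -XX:Name=<size> flag in a JAVA_OPTS string to bytes."""
    return {name: int(num) * UNITS[unit.lower()]
            for name, num, unit in SIZE_FLAG.findall(java_opts)}

opts = ("-XX:CompressedClassSpaceSize=256M -XX:InitialCodeCacheSize=64M "
        "-XX:CodeCacheExpansionSize=1M -XX:CodeCacheMinimumFreeSpace=1M "
        "-XX:ReservedCodeCacheSize=200M -XX:MinMetaspaceExpansion=1M "
        "-XX:MaxMetaspaceExpansion=8M -XX:MaxDirectMemorySize=96M")
caps = size_flags(opts)
capped = caps["ReservedCodeCacheSize"] + caps["MaxDirectMemorySize"]
print(capped // 1024 ** 2, "MiB")
# 296 MiB
```

The expansion flags are growth increments rather than caps, so they are parsed but not summed.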

The memory overhead of a 64-bit Java process on Linux can be reduced by specifying these environment variables in the application manifest:

stack: cflinuxfs2
.
.
.
env:
  MALLOC_ARENA_MAX: 2
  MALLOC_MMAP_THRESHOLD_: 131072
  MALLOC_TRIM_THRESHOLD_: 131072
  MALLOC_TOP_PAD_: 131072
  MALLOC_MMAP_MAX_: 65536

MALLOC_ARENA_MAX works only on the cflinuxfs2 stack (the lucid64 stack has a buggy version of glibc).

An explanation of MALLOC_ARENA_MAX from Heroku:
https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
Some measurement data on how it reduces memory consumption: https://devcenter.heroku.com/articles/testing-cedar-14-memory-use
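Some back-of-the-envelope arithmetic shows why MALLOC_ARENA_MAX matters. This sketch relies on glibc defaults as I understand them: without MALLOC_ARENA_MAX, a 64-bit process may create up to 8 arenas per CPU core, and each secondary arena reserves address space in 64 MiB heaps. The 4-core figure is a hypothetical example, and the result is virtual size; only pages malloc actually touches count towards RSS.

```python
MIB = 1024 * 1024
HEAP_MAX_SIZE = 64 * MIB  # per-heap reservation of a secondary arena on 64-bit glibc

def default_arena_max(cores, word_bytes=8):
    """glibc's default arena cap when MALLOC_ARENA_MAX is unset."""
    return cores * (8 if word_bytes == 8 else 2)

def secondary_arena_reservation(arena_max):
    """Minimum address space reserved by secondary arenas (one heap each).

    The main arena usually grows with brk and is not counted here.
    """
    return max(arena_max - 1, 0) * HEAP_MAX_SIZE

for label, arenas in [("default (4 cores)", default_arena_max(4)),
                      ("MALLOC_ARENA_MAX=2", 2)]:
    print(label, arenas, secondary_arena_reservation(arenas) // MIB, "MiB")
# default (4 cores) 32 1984 MiB
# MALLOC_ARENA_MAX=2 2 64 MiB
```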

I have created a PR to add this to CF java-buildpack:
https://github.com/cloudfoundry/java-buildpack/pull/160

I also created issue https://github.com/cloudfoundry/java-buildpack/issues/163 and pull request https://github.com/cloudfoundry/java-buildpack/pull/159 .

I hope this information helps others struggling with OOM problems in CF.
I'm not saying that this is a ready-made solution for you. YMMV. It worked for me.

-Lari



On 15-04-29 10:53 AM, Head-Rapson, David wrote:
Hi,
I’m after some guidance on how to profile Java apps in CF, in order to get to the bottom of memory issues.
We have an app that’s crashing every few hours with an OOM error; most likely it’s a memory leak.
I’d like to profile the JVM and work out what’s eating memory; however, tools like YourKit require connectivity INTO the JVM server (i.e. the Warden container), either via host/port or via SSH.
Since Warden containers can’t be reached on any port other than HTTP and can’t be SSHed into, neither of these works for me.

I tried installing a standalone JDK into the Warden container; however, as soon as I ran ‘jmap’ to invoke the dump, Warden cleaned up the container, most likely for memory over-consumption.

I had previously found a hack in the WebLogic buildpack (https://github.com/pivotal-cf/weblogic-buildpack/blob/master/docs/container-wls-monitoring.md) for modifying the start script which, when used with -XX:+HeapDumpOnOutOfMemoryError, should copy any heap dump files to a file share somewhere. I have my own custom buildpack, so I could use something similar.
Has anyone got a better solution than this?

We would love to use New Relic / AppDynamics for this, however we’re not allowed to. And I’m not 100% certain they could help with this either.

Dave




--
Regards,

Daniel Jones
EngineerBetter.com
