Re: [vcap-dev] Java OOM debugging


Lari Hotari <Lari@...>
 

Hi,

I created a few tools to debug OOM problems since the application I was
responsible for running on CF was failing constantly because of OOM
problems. The problems I had, turned out not to be actual memory leaks
in the Java application.

In the "cf events appname" log I would get entries like this:
2015-xx-xxTxx:xx:xx.00-0400 app.crash appname index:
1, reason: CRASHED, exit_description: out of memory, exit_status: 255

These type of entries are produced when the container goes over it's
memory resource limits. It doesn't mean that there is a memory leak in
the Java application. The container gets killed by the Linux kernel oom
killer
(https://github.com/cloudfoundry/warden/blob/master/warden/README.md#limit-handle-mem-value)
based on the resource limits set to the warden container.
The memory limit is specified in number of bytes. It is enforced using
the control group associated with the container. When a container
exceeds this limit, one or more of its processes will be killed by the
kernel. Additionally, the Warden will be notified that an OOM happened
and it subsequently tears down the container.
In my case it never got killed by the killjava.sh script that gets
called in the java-buildpack when an OOM happens in Java.

This is the tool I built to debug the problems:
https://github.com/lhotari/java-buildpack-diagnostics-app
I deployed that app as part of the forked buildpack I'm using.
Please read the readme about what it's limitations are. It worked for
me, but it might not work for you. It's opensource and you can fork it. :)

There is a solution in my toolcase for creating a heapdump and uploading
that to S3:
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/HeapDumpServlet.groovy
The readme explains how to setup Amazon S3 keys for this:
https://github.com/lhotari/java-buildpack-diagnostics-app#amazon-s3-setup
Once you get a dump, you can then analyse the dump in a java profiler
tool like YourKit.

I also have a solution that forks the java-buildpack modifies
killjava.sh and adds a script that uploads the heapdump to S3 in the
case of OOM:
https://github.com/lhotari/java-buildpack/commit/2d654b80f3bf1a0e0f1bae4f29cb85f56f5f8c46

In java-buildpack-diagnostics-app I have also other tools for getting
Linux operation system specific memory information, for example:

https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemoryInfoServlet.groovy
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemorySmapServlet.groovy
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MallocInfoServlet.groovy

These tools are handy for looking at details of the Java process RSS
memory usage growth.

There is also a solution for getting ssh shell access inside your
application with tmate.io:
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/TmateSshServlet.groovy
(this version is only compatible with the new "cflinuxfs2" stack)

It looks like there are serious problems on CloudFoundry with the memory
sizing calculation. An application that doesn't have a OOM problem will
get killed by the oom killer because the Java process will go over the
memory limits.
I filed this issue:
https://github.com/cloudfoundry/java-buildpack/issues/157 , but that
might not cover everything.

The workaround for that in my case was to add a native key under
memory_sizes in open_jdk_jre.yml and set the minimum to 330M (that is
for a 2GB total memory).
see example
https://github.com/grails-samples/java-buildpack/blob/22e0f6a/config/open_jdk_jre.yml#L25
that was how I got the app I'm running on CF to stay within the memory
bounds. I'm sure there is now also a way to get the keys without forking
the buildpack. I could have also adjusted the percentage portions, but I
wanted to set a hard minimum for this case.

It was also required to do some other tuning.

I added this to JAVA_OPTS:
-XX:CompressedClassSpaceSize=256M -XX:InitialCodeCacheSize=64M
-XX:CodeCacheExpansionSize=1M -XX:CodeCacheMinimumFreeSpace=1M
-XX:ReservedCodeCacheSize=200M -XX:MinMetaspaceExpansion=1M
-XX:MaxMetaspaceExpansion=8M -XX:MaxDirectMemorySize=96M
while trying to keep the Java process from growing in RSS memory size.

The memory overhead of a 64 bit Java process on Linux can be reduced by
specifying these environment variables:

stack: cflinuxfs2
.
.
.
env:
MALLOC_ARENA_MAX: 2
MALLOC_MMAP_THRESHOLD_: 131072
MALLOC_TRIM_THRESHOLD_: 131072
MALLOC_TOP_PAD_: 131072
MALLOC_MMAP_MAX_: 65536

MALLOC_ARENA_MAX works only on cflinuxfs2 stack (the lucid64 stack has a
buggy version of glibc).

explanation about MALLOC_ARENA_MAX from Heroku:
https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
some measurement data how it reduces memory consumption:
https://devcenter.heroku.com/articles/testing-cedar-14-memory-use

I have created a PR to add this to CF java-buildpack:
https://github.com/cloudfoundry/java-buildpack/pull/160

I also created an issues
https://github.com/cloudfoundry/java-buildpack/issues/163 and
https://github.com/cloudfoundry/java-buildpack/pull/159 .

I hope this information helps others struggling with OOM problems in CF.
I'm not saying that this is a ready made solution just for you. YMMV. It
worked for me.

-Lari



On 15-04-29 10:53 AM, Head-Rapson, David wrote:

Hi,

I’m after some guidance on how to get profile Java apps in CF, in
order to get to the bottom of memory issues.

We have an app that’s crashing every few hours with OOM error, most
likely it’s a memory leak.

I’d like to profile the JVM and work out what’s eating memory, however
tools like yourkit require connectivity INTO the JVM server (i.e. the
warden container), either via host / port or via SSH.

Since warden containers cannot be connected to on ports other than for
HTTP and cannot be SSHd to, neither of these works for me.



I tried installed a standalone JDK onto the warden container, however
as soon as I ran ‘jmap’ to invoke the dump, warden cleaned up the
container – most likely for memory over-consumption.



I had previously found a hack in the Weblogic buildpack
(https://github.com/pivotal-cf/weblogic-buildpack/blob/master/docs/container-wls-monitoring.md)
for modifying the start script which, when used with
–XX:HeapDumpOnOutOfMemoryError, should copy any heapdump files to a
file share somewhere. I have my own custom buildpack so I could use
something similar.

Has anyone got a better solution than this?



We would love to use newrelic / app dynamics for this however we’re
not allowed. And I’m not 100% certain they could help with this either.



Dave



The information transmitted is intended for the person or entity to
which it is addressed and may contain confidential, privileged or
copyrighted material. If you receive this in error, please contact the
sender and delete the material from any computer. Fidelity only gives
information on products and services and does not give investment
advice to retail clients based on individual circumstances. Any
comments or statements made are not necessarily those of Fidelity. All
e-mails may be monitored. FIL Investments International (Reg.
No.1448245), FIL Investment Services (UK) Limited (Reg. No. 2016555),
FIL Pensions Management (Reg. No. 2015142) and Financial
Administration Services Limited (Reg. No. 1629709) are authorised and
regulated in the UK by the Financial Conduct Authority. FIL Life
Insurance Limited (Reg No. 3406905) is authorised in the UK by the
Prudential Regulation Authority and regulated in the UK by the
Financial Conduct Authority and the Prudential Regulation Authority.
Registered offices at Oakhill House, 130 Tonbridge Road,
Hildenborough, Tonbridge, Kent TN11 9DZ.

--
You received this message because you are subscribed to the Google
Groups "Cloud Foundry Developers" group.
To view this discussion on the web visit
https://groups.google.com/a/cloudfoundry.org/d/msgid/vcap-dev/DFFA4ADB9F3BC34194429921AB329336408CAB04%40UKFIL7006WIN.intl.intlroot.fid-intl.com
<https://groups.google.com/a/cloudfoundry.org/d/msgid/vcap-dev/DFFA4ADB9F3BC34194429921AB329336408CAB04%40UKFIL7006WIN.intl.intlroot.fid-intl.com?utm_medium=email&utm_source=footer>.
To unsubscribe from this group and stop receiving emails from it, send
an email to vcap-dev+unsubscribe(a)cloudfoundry.org
<mailto:vcap-dev+unsubscribe(a)cloudfoundry.org>.

Join cf-dev@lists.cloudfoundry.org to automatically receive all group messages.