Re: [vcap-dev] Java OOM debugging
Lari Hotari <Lari@...>
Hi,
I created a few tools to debug OOM problems, since the application I was responsible for running on CF was failing constantly because of them. The problems I had turned out not to be actual memory leaks in the Java application.

In the "cf events appname" log I would get entries like this:

    2015-xx-xxTxx:xx:xx.00-0400 app.crash appname index: 1, reason: CRASHED, exit_description: out of memory, exit_status: 255

Entries of this type are produced when the container goes over its memory resource limits. That doesn't mean that there is a memory leak in the Java application. The container gets killed by the Linux kernel OOM killer (https://github.com/cloudfoundry/warden/blob/master/warden/README.md#limit-handle-mem-value) based on the resource limits set on the warden container: the memory limit is specified in number of bytes and is enforced using the control group associated with the container. In my case the process never got killed by the killjava.sh script that gets called in the java-buildpack when an OOM happens in Java.

This is the tool I built to debug the problems:
https://github.com/lhotari/java-buildpack-diagnostics-app
I deployed that app as part of the forked buildpack I'm using. Please read the README about its limitations. It worked for me, but it might not work for you. It's open source and you can fork it. :)

There is a solution in my toolbox for creating a heap dump and uploading it to S3:
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/HeapDumpServlet.groovy
The README explains how to set up the Amazon S3 keys for this:
https://github.com/lhotari/java-buildpack-diagnostics-app#amazon-s3-setup
Once you get a dump, you can analyse it in a Java profiler tool like YourKit.
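For reference, the core of what a heap dump endpoint has to do is small. Below is a minimal sketch (my illustration, not the actual HeapDumpServlet code; the class name and default path are made up, and the S3 upload is left out) using the HotSpot-specific HotSpotDiagnosticMXBean that ships with OpenJDK:

    import com.sun.management.HotSpotDiagnosticMXBean;

    import java.io.IOException;
    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;

    public class HeapDumper {
        public static void main(String[] args) throws IOException {
            String path = args.length > 0 ? args[0] : "/tmp/heapdump.hprof";
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            HotSpotDiagnosticMXBean diagnostic = ManagementFactory.newPlatformMXBeanProxy(
                    server, "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            // "true" = dump only live objects; this forces a full GC first
            // and keeps the resulting .hprof file smaller.
            diagnostic.dumpHeap(path, true);
            System.out.println("Heap dump written to " + path);
        }
    }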
I also have a solution that forks the java-buildpack, modifies killjava.sh and adds a script that uploads the heap dump to S3 when an OOM happens:
https://github.com/lhotari/java-buildpack/commit/2d654b80f3bf1a0e0f1bae4f29cb85f56f5f8c46

In java-buildpack-diagnostics-app I also have other tools for getting Linux operating system specific memory information, for example:
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemoryInfoServlet.groovy
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemorySmapServlet.groovy
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MallocInfoServlet.groovy
These tools are handy for looking at the details of the Java process's RSS memory usage growth.

There is also a solution for getting SSH shell access inside your application with tmate.io:
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/TmateSshServlet.groovy
(this version is only compatible with the new "cflinuxfs2" stack)

It looks like there are serious problems on Cloud Foundry with the memory sizing calculation. An application that doesn't have an OOM problem can still get killed by the OOM killer, because the Java process goes over the memory limits. I filed this issue: https://github.com/cloudfoundry/java-buildpack/issues/157 , but that might not cover everything.

The workaround in my case was to add a native key under memory_sizes in open_jdk_jre.yml and set the minimum to 330M (that is for 2GB total memory). See the example at https://github.com/grails-samples/java-buildpack/blob/22e0f6a/config/open_jdk_jre.yml#L25 ; that was how I got the app I'm running on CF to stay within the memory bounds. I'm sure there is now also a way to set the keys without forking the buildpack. I could have also adjusted the percentage portions, but I wanted to set a hard minimum for this case.

Some other tuning was also required. While trying to keep the Java process from growing in RSS memory size, I added this to JAVA_OPTS:

    -XX:CompressedClassSpaceSize=256M -XX:InitialCodeCacheSize=64M
    -XX:CodeCacheExpansionSize=1M -XX:CodeCacheMinimumFreeSpace=1M
    -XX:ReservedCodeCacheSize=200M -XX:MinMetaspaceExpansion=1M
    -XX:MaxMetaspaceExpansion=8M -XX:MaxDirectMemorySize=96M

The memory overhead of a 64-bit Java process on Linux can be reduced by specifying these environment variables in the manifest:

    stack: cflinuxfs2
    ...
    env:
      MALLOC_ARENA_MAX: 2
      MALLOC_MMAP_THRESHOLD_: 131072
      MALLOC_TRIM_THRESHOLD_: 131072
      MALLOC_TOP_PAD_: 131072
      MALLOC_MMAP_MAX_: 65536

MALLOC_ARENA_MAX works only on the cflinuxfs2 stack (the lucid64 stack has a buggy version of glibc). There is an explanation of MALLOC_ARENA_MAX from Heroku:
https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
and some measurement data on how it reduces memory consumption:
https://devcenter.heroku.com/articles/testing-cedar-14-memory-use

I have created a PR to add this to the CF java-buildpack:
https://github.com/cloudfoundry/java-buildpack/pull/160
I also created issue https://github.com/cloudfoundry/java-buildpack/issues/163 and PR https://github.com/cloudfoundry/java-buildpack/pull/159 .
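To check whether tunings like these actually keep RSS in check, it helps to watch the resident set size of the Java process over time. Here is a small illustrative sketch (my own example, not code from the diagnostics app; the class name is made up) that polls VmRSS from /proc/self/status, which is roughly the kind of information the memory info servlets above expose over HTTP:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class RssWatcher {
        public static void main(String[] args) throws IOException, InterruptedException {
            // /proc/self/status is the Linux kernel's per-process status file;
            // the VmRSS line reports the current resident set size in kB.
            while (true) {
                for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
                    if (line.startsWith("VmRSS")) {
                        System.out.println(System.currentTimeMillis() + " " + line.trim());
                    }
                }
                Thread.sleep(10_000);
            }
        }
    }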
I hope this information helps others struggling with OOM problems in CF. I'm not saying that this is a ready-made solution just for you. YMMV. It worked for me.

-Lari

On 15-04-29 10:53 AM, Head-Rapson, David wrote: