Re: Bizarre DEA + Spring Behaviour

Daniel Mikusa

On Tue, Aug 11, 2015 at 5:15 AM, Daniel Jones wrote:

Hi all,

I've witnessed behaviour caused by the combination of a DEA and a Spring
application that I can't explain. If you like a good mystery or you happen
to know a lot about Java proxies and DEA transient state, please read on!

A particular Spring app

Version of Spring? What parts of Spring are you pulling into the app?

was crashing only on specific DEAs in a Cloud Foundry installation.

Ever try bumping up the log level for Spring when you were getting the
problem? If so, did the problem still occur? Were you able to capture the

All DEAs were from the same CF release (PCF ERT 1.5.2)
All DEAs were up-to-date according to BOSH (i.e. no outstanding changes
waiting to be applied)
All DEAs were deployed with identical BOSH job config
All Warden containers were using the same root FS

lucid64 or cflinuxfs2? Or did it not matter?

The droplet was the same across all DEAs
The droplet version was the same
The droplet tarballs all had the same MD5 checksum

What was the output of the Java buildpack when the droplet was created?
Or better yet, run `cf files <app> app/.java-buildpack.log` and include
the output.

Warden was providing the exact same env and start command to all containers
I saw the same behaviour repeat itself across 5 completely separate Cloud
Foundry installations

The crash was Spring not being able to autowire a bean, where it was
referenced by implementation rather than interface (yes, I know, but it was
not my code!).

Any chance you could include logs from the crash? Was there an exception /
stacktrace generated? Alternatively, have you been able to create a simple
test app that replicates the behavior?

There was some Javassist/CGLIB action going on, creating proxies for the
sake of transaction management.

Rebooting the troublesome DEAs did not fix the problem.

Doing a `bosh recreate` did reliably fix the problem.

Alternatively, changing the Spring code to wire by interface also reliably
fixed the problem.
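
One mechanism that would be consistent with the wire-by-interface fix (a sketch under stated assumptions, not a diagnosis): if the transactional proxy ends up being an interface-based JDK dynamic proxy rather than a CGLIB subclass, the proxy is assignable to the interface but not to the implementation class, so injection by implementation type finds no candidate. The plain-Java example below shows only that type relationship. `AccountService` and `AccountServiceImpl` are hypothetical names, and the example neither uses Spring nor explains why the proxy type would differ between DEAs.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyWiringDemo {

    // Hypothetical service interface, for illustration only.
    public interface AccountService {
        String describe();
    }

    // Hypothetical implementation that some other bean references by class.
    public static class AccountServiceImpl implements AccountService {
        @Override
        public String describe() {
            return "real implementation";
        }
    }

    // Wraps the target in a JDK dynamic proxy that implements only the
    // interface and delegates every call to the target.
    public static Object wrapInJdkProxy(AccountService target) {
        InvocationHandler handler = (proxy, method, args) -> method.invoke(target, args);
        return Proxy.newProxyInstance(
                AccountService.class.getClassLoader(),
                new Class<?>[] { AccountService.class },
                handler);
    }

    public static void main(String[] args) {
        Object bean = wrapInJdkProxy(new AccountServiceImpl());

        // The proxy implements the interface, so wiring by interface can match.
        System.out.println("assignable to interface: "
                + (bean instanceof AccountService));

        // The proxy is not a subclass of the implementation, so wiring by
        // implementation type cannot match.
        System.out.println("assignable to implementation: "
                + (bean instanceof AccountServiceImpl));
    }
}
```

Running `main` prints `true` for the interface check and `false` for the implementation check. A CGLIB-style proxy, by contrast, subclasses the implementation and would satisfy both checks.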

I can't understand why different DEA instances, from the same BOSH
release, with the same config, on the same stemcell, running the same
version of Warden, with the same droplet, and the same root FS, and the
same env, and the same start command, yielded different behaviour. I'm even
further confused as to why a `bosh recreate` changed that behaviour. What
could possibly have changed? Something on ephemeral disk? But what else is
there on ephemeral disk that could have mattered and was likely to have

How much was on the disk? Was it getting full? How many other apps were
running on that DEA (before vs after)?

Do CGLIB/Javassist have some native dependencies that weren't in sync
between DEAs?

Anyone with a convincing explanation (that does not involve voodoo) will
receive one free beer and a high-five at the next CF Summit!

Wild guess: a race condition in the code somewhere?

