Re: How random is Metron's Doppler selection?

Actually, this might explain why some of our customers are so frustrated
trying to read their stack traces in Splunk. :\

So each line of a stack trace could go to a different Doppler. That means
each line of the stack trace could go out through a different syslog drain,
making it impossible to consolidate that stack trace into a single logging
event once it is passed off to a third-party logging system like Splunk.
This sucks. To be fair, Splunk has never been any good at dealing with
stack traces.

What are the possibilities of getting some kind of optionally enabled
application instance affinity put into Metron? (I know. I know. I can
submit a PR.)


Oops, wrong link. Should be

Sorry about that!

Metron chooses a Doppler at random from the available servers for each
message. Availability prefers same-zone Doppler servers:

- If a Metron instance knows about any same-zone Dopplers, it chooses
  one at random for each message.
- If no same-zone Dopplers are present, the random choice is made
  from the list of all known servers.
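
To make the two rules above concrete, here is a minimal Python sketch of
the selection logic. It is not Metron's actual code: the function name,
the address-to-zone mapping, and the zone labels are all hypothetical
stand-ins chosen for illustration.

```python
import random

def choose_doppler(known, local_zone, rng=random):
    """Pick a Doppler for one message, preferring the sender's zone.

    `known` maps a Doppler address to its zone. Both the mapping and the
    zone names are hypothetical, not Metron's real data model.
    """
    same_zone = [addr for addr, zone in known.items() if zone == local_zone]
    # Fall back to every known server only when the local zone has none.
    candidates = same_zone or list(known)
    if not candidates:
        raise LookupError("no Doppler servers known")
    return rng.choice(candidates)

# Hypothetical pool: two Dopplers in z1, one in z2.
dopplers = {"10.0.1.5": "z1", "10.0.1.6": "z1", "10.0.2.5": "z2"}
picks = {choose_doppler(dopplers, "z1") for _ in range(1000)}
```

Because the choice is repeated for every message, consecutive lines from
the same app instance can land on different servers whenever more than
one candidate exists.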

In fact, the behavior you describe is the behavior of the DEA Logging Agent
before Metron existed. What we discovered with that approach is that it
balances load very unfairly, because a single high-volume app is stuck on
one server. While the "new" mechanism does not guarantee consistency, it
does enable the Doppler pool to share load more evenly.
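
A toy simulation shows the trade-off. This is only a sketch under made-up
assumptions: the app names, message volumes, and the round-robin way the
sticky policy assigns apps to servers are all hypothetical, and it does
not model the DEA Logging Agent or Metron in any detail.

```python
import random
from collections import Counter

def simulate(messages_per_app, dopplers, sticky, seed=0):
    """Count messages landing on each Doppler under one routing policy."""
    rng = random.Random(seed)
    load = Counter({d: 0 for d in dopplers})
    for index, (app, count) in enumerate(sorted(messages_per_app.items())):
        if sticky:
            # Affinity: every message from this app goes to one server
            # (assigned round-robin here, purely for illustration).
            load[dopplers[index % len(dopplers)]] += count
        else:
            # Per-message random choice across the whole pool.
            for _ in range(count):
                load[rng.choice(dopplers)] += 1
    return load

# Hypothetical workload: one noisy app and two quiet ones.
apps = {"chatty": 9000, "quiet-a": 500, "quiet-b": 500}
pool = ["doppler-0", "doppler-1", "doppler-2"]
sticky_load = simulate(apps, pool, sticky=True)
random_load = simulate(apps, pool, sticky=False)
```

With affinity, the server that draws the noisy app carries its entire
volume, while per-message random choice leaves each server with roughly a
third of the total.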

If you're seeing that a single app instance is routed to the same Doppler
server every time, then (without further information) I would guess that
you're either running a single Doppler instance in each availability zone,
or your deck is stacked. :-) If neither of those is true and you're still
observing that Metron routes messages from an app instance to a single
Doppler, I'd love to investigate how that is happening.

Metron's documentation [1] says "All Metron traffic is randomly
distributed across available Dopplers." How random is this? Based on
observation, it appears that logs for an individual application instance
are consistently sent to the same Doppler instance. The consistency aspect
is very important for us so that our Syslog forwarder can consolidate stack
traces into a single logging event.
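
For context, this is roughly the kind of consolidation meant here, as a
minimal Python sketch. It is not our actual forwarder: the function name
and the continuation rule (indented lines or "Caused by:", which fits
Java-style traces) are assumptions made for illustration.

```python
import re

# Assumption: a continuation line is indented or starts with "Caused by:".
# This fits Java-style traces but is not a general rule.
CONTINUATION = re.compile(r"^(\s+|Caused by:)")

def consolidate(lines):
    """Fold stack-trace continuation lines into the preceding event."""
    events = []
    for line in lines:
        if events and CONTINUATION.match(line):
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

# Hypothetical log stream from a single application instance.
stream = [
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Handler.run(Handler.java:42)",
    "\tat com.example.Main.main(Main.java:7)",
    "GET /health 200",
]
events = consolidate(stream)
```

This only works when the forwarder sees every line of an instance's output
in order on one stream. If the lines are split across streams, for example
`stream[::2]` and `stream[1::2]`, neither half contains the whole trace.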

How random is this distribution really for an application instance's


1 -

