Re: How random is Metron's Doppler selection?


John Tuley <jtuley@...>
 

I can't speculate as to the future of CF and whether or not a particular
feature will someday be included.

But I can suggest a workaround: aside from very paranoid application
security group settings, there should be nothing preventing your
application from sending syslog to your external drain already, since apps
can make outbound connections. Obviously, that traffic wouldn't go through
Loggregator, and so wouldn't be available via `cf logs`. But perhaps your
logging utility can be configured to send to both syslog and stdout/stderr
simultaneously?
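
Here's a minimal sketch of that dual-write idea in Go, assuming a hypothetical
external drain at logs.example.com:514 (the address, tag, and app code are all
made up; any logging library that supports multiple appenders would do the same
job):

```go
package main

import (
	"io"
	"log"
	"log/syslog"
	"os"
)

func main() {
	// Hypothetical external drain address; replace with your real syslog endpoint.
	drain, err := syslog.Dial("tcp", "logs.example.com:514", syslog.LOG_INFO, "myapp")
	if err != nil {
		log.Fatalf("could not reach syslog drain: %v", err)
	}
	defer drain.Close()

	// Every line goes to stdout (picked up by Loggregator, visible in `cf logs`)
	// and to the external drain (bypassing Loggregator entirely).
	logger := log.New(io.MultiWriter(os.Stdout, drain), "", log.LstdFlags)
	logger.Println("application started")
}
```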

– John Tuley

On Fri, Jun 12, 2015 at 2:40 PM, Mike Heath <elcapo(a)gmail.com> wrote:

That's fair.

I think Mike Youngstrom is right. All of our logging problems would go
away if our applications could talk syslog to Loggregator. Capturing stdout
and stderr is certainly convenient, but it's not great for dealing with
stack traces.

-Mike

On Fri, Jun 12, 2015 at 8:38 AM, John Tuley <jtuley(a)pivotal.io> wrote:

Mike,

I don't want to speak to the possibility, but I can explain why we
decided against app affinity. Basically, it comes down to sharding over a
dynamic pool. As Doppler instances come and go, Metron would need to
re-balance its affinity calculations. This becomes troublesome if you
assume that a single Doppler is responsible for each app (or app-instance),
including the recent history: does the old home of an app need to transfer
history to the new home? Or maybe a new server just picks up new apps, and
all the old mappings stay the same? We did some research into algorithms
for this sort of consistent hashing/sharding and determined that it would
be difficult to implement in the presence of distributed servers *and* distributed
clients.
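
To make that re-balancing concern concrete, here is a rough, hypothetical
sketch (not Loggregator code) of consistent-hashing app instances onto a
Doppler pool. When the pool changes, some app-to-Doppler mappings necessarily
move, which is exactly where the history-transfer question comes from:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hash32 places a string on a 32-bit hash ring.
func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// pick returns the Doppler whose ring position is the first at or after the
// app instance's position, wrapping around: textbook consistent hashing,
// with no virtual nodes and no handling of historical data.
func pick(dopplers []string, appInstance string) string {
	ring := append([]string(nil), dopplers...)
	sort.Slice(ring, func(i, j int) bool { return hash32(ring[i]) < hash32(ring[j]) })
	p := hash32(appInstance)
	for _, d := range ring {
		if hash32(d) >= p {
			return d
		}
	}
	return ring[0] // wrap around to the start of the ring
}

func main() {
	app := "app-guid/instance-3" // hypothetical app-instance key
	before := []string{"doppler-a", "doppler-b", "doppler-c"}
	after := []string{"doppler-a", "doppler-b", "doppler-c", "doppler-d"} // a Doppler joins

	fmt.Println("before:", pick(before, app))
	fmt.Println("after: ", pick(after, app))
	// If the two differ, this instance's "home" Doppler moved, and the open
	// question above is whether its recent history should move with it.
}
```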

Given that your goals don't include history, the problem becomes easier
for sure. But I'd (personally – not speaking for product leadership) be
wary of accepting a PR that only solved forward-rebalancing without
addressing the problem of historical data.

– John Tuley

On Thu, Jun 11, 2015 at 4:55 PM, Mike Heath <elcapo(a)gmail.com> wrote:

Actually, this might explain why some of our customers are so frustrated
trying to read their stack traces in Splunk. :\

So each line of a stack trace could go to a different Doppler. That
means each line of the stack trace goes out to a different syslog drain,
making it impossible to consolidate that stack trace into a single logging
event when passed off to a third-party logging system like Splunk. This
sucks. To be fair, Splunk has never been any good at dealing with stack
traces.

What are the possibilities of getting some kind of optionally enabled
application instance affinity put into Metron? (I know. I know. I can
submit a PR.)

-Mike

On Thu, Jun 11, 2015 at 3:54 PM, John Tuley <jtuley(a)pivotal.io> wrote:

Oops, wrong link. Should be
https://github.com/cloudfoundry/loggregator/blob/develop/src/metron/main.go#L188-L197
.

Sorry about that!

– John Tuley

On Thu, Jun 11, 2015 at 3:36 PM, John Tuley <jtuley(a)pivotal.io> wrote:

Mike,

Metron chooses an available Doppler at random for each message
<https://www.pivotaltracker.com/story/show/96801752>. Availability
prefers same-zone Doppler servers (see the sketch after this list):

- If a Metron instance knows about any same-zone Dopplers, it
chooses one at random for each message.
- If no same-zone Dopplers are present, the random choice is made
from the list of all known servers.
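
In rough Go pseudocode, the per-message choice amounts to something like this
(an illustrative paraphrase with made-up names, not the actual Metron source):

```go
package main

import "math/rand"

// chooseDoppler prefers a random same-zone Doppler for a single message and
// falls back to a random pick from all known Dopplers. Paraphrase only; see
// the metron/main.go link in my follow-up for the real selection code.
func chooseDoppler(sameZone, all []string) string {
	if len(sameZone) > 0 {
		return sameZone[rand.Intn(len(sameZone))]
	}
	return all[rand.Intn(len(all))]
}
```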


In fact, the behavior you describe is the behavior of the DEA Logging
Agent before Metron existed. What we discovered with that approach is that
it balances load very unfairly, as a single high-volume app is stuck on one
server. While the "new" mechanism does not guarantee consistency, it does
enable the Doppler pool to share load more evenly.

If you're seeing that a single app instance is routed to the same
Doppler server every time, then (without further information) I would guess
that you're either running a single Doppler instance in each availability
zone, or your deck is stacked. :-) If neither of those is true and you're
still observing that Metron routes messages from an app instance to a
single Doppler, I'd love to investigate how that is happening.

– John Tuley

On Thu, Jun 11, 2015 at 2:45 PM, Mike Heath <elcapo(a)gmail.com> wrote:

Metron's documentation [1] says "All Metron traffic is randomly
distributed across available Dopplers." How random is this? Based on
observation, it appears that logs for an individual application instance
are consistently sent to the same Doppler instance. The consistency aspect
is very important for us so that our Syslog forwarder can consolidate stack
traces into a single logging event.

How random is this distribution really for an application instance's
logs?

-Mike

1 -
https://github.com/cloudfoundry/loggregator/tree/develop/src/metron


_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
