Re: Logstash and Multiline Log Entry

Erik Jasiak

Hi Steve and Simon; hello again Mike,

First, apologies for the delay in reply on this one- I've also
been trying to come up with a simple, short answer to this problem. I

Here are the high-level, non-technical answers:
1) Yes, we'd love to enable multi-line logging. Regardless of any
other challenges, we know that there's interest.
2) The problem is multi-layered, and extends beyond loggregator.
2a) Most of the problems with multi-line logging that overlap
loggregator also overlap "general scalability" - problems we've been
handling as part of moving toward collector retirement.
3) We have a hack day project looking at anything "quick and dirty"
to help fix this.
4) Redirecting app logs have known workarounds (eg in Java: via
log4j or similar) while we tackled this - not preferred at all, but do-able.


Technical answers: Loggregator's goals are "Fast, thorough, dumb."
Multi-line logging - as handled by loggregator - has no clean way of
working at the moment w/o violating "fast" or "dumb" principles today.
We're getting there though.

Here's how we've been working towards a fix:
* Syslog drains were not performant enough, or could not handle
large java traces - something we recently fixed[1][2][3] and are going
to email about separately.
* Horizontal scalability allows for overall better performance
and reliability, but pushes the cost on data consistency to the edges of
loggregator (hence nozzles, injectors.)
* Loggregator's dropsonde protocol didn't allow for a clean way
to enforce/tag multi-line data consistency - something we are about to
put forward a proposal to remedy.
** Timestamps are not a clean mechanism for reliably
re-assembling a multi-line log - some combination of app-instance and
order-of-output would need to be tacked on, or a decent vector-time
implementation. We'd need a way to add this metadata that would allow
for re-assembly (see protocol item above). We'd also have to add extra
info at DEA or garden without sacrificing performance - and we know that
the DEA logging agent today already has questions around "acceptable"

So a multiline fix intersects our goals today. I will do my best
to highlight stories that help us with multi-line logging, and we need
to do a better job at communicating that we're working toward it, even
if it's not the obvious target goal.



Steve Wall wrote:

Now I see what that means. Each line of a multiline log message could
be sent to a different logstash server. Definitely problematic.
Especially with the ephemeral nature of the CF logs there needs to be
a viable solution to persist the logs and syslog seems to be a natural
solution. I'm located in Denver and attend the local CF meetups held
in the Pivotal offices. I believe some LAMB devs attend. I'll be sure
to bring it up with them.

On Wed, Jul 29, 2015 at 9:47 AM, Mike Youngstrom <youngm(a)
<mailto:youngm(a)>> wrote:

Thanks Steve. Though I'm no logstash expert I assume this won't
work if you have multiple logstash machine's doing filtering like
Simon mentioned right? Same is true for us with splunk if you are
forwarding logs to more than one indexer via the REST api. I'd
still like to have a discussion with Erik about this problem see
if he thinks there is anything that can be done in loggregator to


On Wed, Jul 29, 2015 at 9:00 AM, Steve Wall
<mailto:steve.wall(a)>> wrote:

Here's a suggested pattern to handle stack traces.

On Mon, Jul 27, 2015 at 11:02 AM, Mike Youngstrom
<youngm(a) <mailto:youngm(a)>> wrote:

Yet another request for improved multi line log message
handling. Is there any update from the LAMB team on plans
to improve this problem? There have been several proposed
solutions but I'm not aware of anything actually making it
into the LAMB tracker. It would be great if we could hear
from Erik on this issue. Does the LAMB team believe it is
not an issue? Are there plans to improve this situation?
Whatever the perspective lets discuss it as a community
and see if there are any options better than the current.
I'd really like to see something turned into a tracker
issue if there are better options.



On Mon, Jul 27, 2015 at 9:47 AM, Simon Johansson
<mailto:simon(a)>> wrote:

This is a tricky one. Especially if you have more than
one logstash machine doing filtering as they will do
filtering independently of each other as the events
come in.

The reason why CF adds a timestamp to each line is
because how syslog works, where each line is its own even.

What we tend to do in my company is to log this kind
of stuff via GELF or with Sentry.

On Mon, Jul 27, 2015 at 5:41 PM, Steve Wall
<mailto:stevewallone(a)>> wrote:

We are sending CF logs message to an ELK stack.
Multiline logs message are broken out into several
log messages in Logstash. One end per line of the
multiline log message. This is problematic when
stack traces dumped to the log. Each line of the
stack trace is translated into a log message.
Trying to view this through Kibana is nearly
impossible. Logstash provides a Grok feature
allowing for the manipulation of the log messages.
One common solution is to create a Grok filter
that using a timestamp to indicate when a log
entry starts and to combine all lines until the
next timestamp into one log message. The problem
is that CF adds a timestamp to every line. Has
anyone come up with a good Grok expression to
handle multiline log message coming out of CF?

cf-dev mailing list

cf-dev mailing list

cf-dev mailing list

cf-dev mailing list

cf-dev mailing list
cf-dev(a) <mailto:cf-dev(a)>

cf-dev mailing list

Join { to automatically receive all group messages.