Logstash and Multiline Log Entry


Steve Wall
 

Hello,
We are sending CF logs message to an ELK stack. Multiline logs message are
broken out into several log messages in Logstash. One end per line of the
multiline log message. This is problematic when stack traces dumped to the
log. Each line of the stack trace is translated into a log message. Trying
to view this through Kibana is nearly impossible. Logstash provides a Grok
feature allowing for the manipulation of the log messages. One common
solution is to create a Grok filter that using a timestamp to indicate when
a log entry starts and to combine all lines until the next timestamp into
one log message. The problem is that CF adds a timestamp to every line. Has
anyone come up with a good Grok expression to handle multiline log message
coming out of CF?
Thanks!
Steve


Simon Johansson <simon@...>
 

This is a tricky one. Especially if you have more than one logstash machine
doing filtering as they will do filtering independently of each other as
the events come in.

The reason why CF adds a timestamp to each line is because how syslog
works, where each line is its own even.

What we tend to do in my company is to log this kind of stuff via GELF or
with Sentry.

On Mon, Jul 27, 2015 at 5:41 PM, Steve Wall <stevewallone(a)gmail.com> wrote:

Hello,
We are sending CF logs message to an ELK stack. Multiline logs message are
broken out into several log messages in Logstash. One end per line of the
multiline log message. This is problematic when stack traces dumped to the
log. Each line of the stack trace is translated into a log message. Trying
to view this through Kibana is nearly impossible. Logstash provides a Grok
feature allowing for the manipulation of the log messages. One common
solution is to create a Grok filter that using a timestamp to indicate when
a log entry starts and to combine all lines until the next timestamp into
one log message. The problem is that CF adds a timestamp to every line. Has
anyone come up with a good Grok expression to handle multiline log message
coming out of CF?
Thanks!
Steve



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


Mike Youngstrom <youngm@...>
 

Yet another request for improved multi line log message handling. Is there
any update from the LAMB team on plans to improve this problem? There have
been several proposed solutions but I'm not aware of anything actually
making it into the LAMB tracker. It would be great if we could hear from
Erik on this issue. Does the LAMB team believe it is not an issue? Are
there plans to improve this situation? Whatever the perspective lets
discuss it as a community and see if there are any options better than the
current. I'd really like to see something turned into a tracker issue if
there are better options.

Mike

[0] http://lists.cloudfoundry.org/pipermail/cf-dev/2015-June/000423.html
[1] http://lists.cloudfoundry.org/pipermail/cf-dev/2015-May/000083.html
[2]
https://groups.google.com/a/cloudfoundry.org/forum/?utm_medium=email&utm_source=footer#!msg/vcap-dev/B1W6_vO0oyo/84X1eAtFsKoJ

On Mon, Jul 27, 2015 at 9:47 AM, Simon Johansson <simon(a)simonjohansson.com>
wrote:

This is a tricky one. Especially if you have more than one logstash
machine doing filtering as they will do filtering independently of each
other as the events come in.

The reason why CF adds a timestamp to each line is because how syslog
works, where each line is its own even.

What we tend to do in my company is to log this kind of stuff via GELF or
with Sentry.

On Mon, Jul 27, 2015 at 5:41 PM, Steve Wall <stevewallone(a)gmail.com>
wrote:

Hello,
We are sending CF logs message to an ELK stack. Multiline logs message
are broken out into several log messages in Logstash. One end per line of
the multiline log message. This is problematic when stack traces dumped to
the log. Each line of the stack trace is translated into a log message.
Trying to view this through Kibana is nearly impossible. Logstash provides
a Grok feature allowing for the manipulation of the log messages. One
common solution is to create a Grok filter that using a timestamp to
indicate when a log entry starts and to combine all lines until the next
timestamp into one log message. The problem is that CF adds a timestamp to
every line. Has anyone come up with a good Grok expression to handle
multiline log message coming out of CF?
Thanks!
Steve



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


Steve Wall <steve.wall@...>
 

On Mon, Jul 27, 2015 at 11:02 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Yet another request for improved multi line log message handling. Is
there any update from the LAMB team on plans to improve this problem?
There have been several proposed solutions but I'm not aware of anything
actually making it into the LAMB tracker. It would be great if we could
hear from Erik on this issue. Does the LAMB team believe it is not an
issue? Are there plans to improve this situation? Whatever the
perspective lets discuss it as a community and see if there are any options
better than the current. I'd really like to see something turned into a
tracker issue if there are better options.

Mike

[0] http://lists.cloudfoundry.org/pipermail/cf-dev/2015-June/000423.html
[1] http://lists.cloudfoundry.org/pipermail/cf-dev/2015-May/000083.html
[2]
https://groups.google.com/a/cloudfoundry.org/forum/?utm_medium=email&utm_source=footer#!msg/vcap-dev/B1W6_vO0oyo/84X1eAtFsKoJ

On Mon, Jul 27, 2015 at 9:47 AM, Simon Johansson <simon(a)simonjohansson.com
wrote:
This is a tricky one. Especially if you have more than one logstash
machine doing filtering as they will do filtering independently of each
other as the events come in.

The reason why CF adds a timestamp to each line is because how syslog
works, where each line is its own even.

What we tend to do in my company is to log this kind of stuff via GELF or
with Sentry.

On Mon, Jul 27, 2015 at 5:41 PM, Steve Wall <stevewallone(a)gmail.com>
wrote:

Hello,
We are sending CF logs message to an ELK stack. Multiline logs message
are broken out into several log messages in Logstash. One end per line of
the multiline log message. This is problematic when stack traces dumped to
the log. Each line of the stack trace is translated into a log message.
Trying to view this through Kibana is nearly impossible. Logstash provides
a Grok feature allowing for the manipulation of the log messages. One
common solution is to create a Grok filter that using a timestamp to
indicate when a log entry starts and to combine all lines until the next
timestamp into one log message. The problem is that CF adds a timestamp to
every line. Has anyone come up with a good Grok expression to handle
multiline log message coming out of CF?
Thanks!
Steve



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


Mike Youngstrom <youngm@...>
 

Thanks Steve. Though I'm no logstash expert I assume this won't work if
you have multiple logstash machine's doing filtering like Simon mentioned
right? Same is true for us with splunk if you are forwarding logs to more
than one indexer via the REST api. I'd still like to have a discussion
with Erik about this problem see if he thinks there is anything that can be
done in loggregator to help.

Mike

On Wed, Jul 29, 2015 at 9:00 AM, Steve Wall <
steve.wall(a)primetimesoftware.com> wrote:

Here's a suggested pattern to handle stack traces.


http://stackoverflow.com/questions/31657863/logstash-and-multiline-log-entry-from-cloud-foundry?noredirect=1#comment51279061_31657863


On Mon, Jul 27, 2015 at 11:02 AM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Yet another request for improved multi line log message handling. Is
there any update from the LAMB team on plans to improve this problem?
There have been several proposed solutions but I'm not aware of anything
actually making it into the LAMB tracker. It would be great if we could
hear from Erik on this issue. Does the LAMB team believe it is not an
issue? Are there plans to improve this situation? Whatever the
perspective lets discuss it as a community and see if there are any options
better than the current. I'd really like to see something turned into a
tracker issue if there are better options.

Mike

[0] http://lists.cloudfoundry.org/pipermail/cf-dev/2015-June/000423.html
[1] http://lists.cloudfoundry.org/pipermail/cf-dev/2015-May/000083.html
[2]
https://groups.google.com/a/cloudfoundry.org/forum/?utm_medium=email&utm_source=footer#!msg/vcap-dev/B1W6_vO0oyo/84X1eAtFsKoJ

On Mon, Jul 27, 2015 at 9:47 AM, Simon Johansson <
simon(a)simonjohansson.com> wrote:

This is a tricky one. Especially if you have more than one logstash
machine doing filtering as they will do filtering independently of each
other as the events come in.

The reason why CF adds a timestamp to each line is because how syslog
works, where each line is its own even.

What we tend to do in my company is to log this kind of stuff via GELF
or with Sentry.

On Mon, Jul 27, 2015 at 5:41 PM, Steve Wall <stevewallone(a)gmail.com>
wrote:

Hello,
We are sending CF logs message to an ELK stack. Multiline logs message
are broken out into several log messages in Logstash. One end per line of
the multiline log message. This is problematic when stack traces dumped to
the log. Each line of the stack trace is translated into a log message.
Trying to view this through Kibana is nearly impossible. Logstash provides
a Grok feature allowing for the manipulation of the log messages. One
common solution is to create a Grok filter that using a timestamp to
indicate when a log entry starts and to combine all lines until the next
timestamp into one log message. The problem is that CF adds a timestamp to
every line. Has anyone come up with a good Grok expression to handle
multiline log message coming out of CF?
Thanks!
Steve



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


Steve Wall <steve.wall@...>
 

Now I see what that means. Each line of a multiline log message could be
sent to a different logstash server. Definitely problematic. Especially
with the ephemeral nature of the CF logs there needs to be a viable
solution to persist the logs and syslog seems to be a natural solution. I'm
located in Denver and attend the local CF meetups held in the Pivotal
offices. I believe some LAMB devs attend. I'll be sure to bring it up with
them.
-Steve

On Wed, Jul 29, 2015 at 9:47 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Thanks Steve. Though I'm no logstash expert I assume this won't work if
you have multiple logstash machine's doing filtering like Simon mentioned
right? Same is true for us with splunk if you are forwarding logs to more
than one indexer via the REST api. I'd still like to have a discussion
with Erik about this problem see if he thinks there is anything that can be
done in loggregator to help.

Mike

On Wed, Jul 29, 2015 at 9:00 AM, Steve Wall <
steve.wall(a)primetimesoftware.com> wrote:

Here's a suggested pattern to handle stack traces.


http://stackoverflow.com/questions/31657863/logstash-and-multiline-log-entry-from-cloud-foundry?noredirect=1#comment51279061_31657863


On Mon, Jul 27, 2015 at 11:02 AM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Yet another request for improved multi line log message handling. Is
there any update from the LAMB team on plans to improve this problem?
There have been several proposed solutions but I'm not aware of anything
actually making it into the LAMB tracker. It would be great if we could
hear from Erik on this issue. Does the LAMB team believe it is not an
issue? Are there plans to improve this situation? Whatever the
perspective lets discuss it as a community and see if there are any options
better than the current. I'd really like to see something turned into a
tracker issue if there are better options.

Mike

[0] http://lists.cloudfoundry.org/pipermail/cf-dev/2015-June/000423.html
[1] http://lists.cloudfoundry.org/pipermail/cf-dev/2015-May/000083.html
[2]
https://groups.google.com/a/cloudfoundry.org/forum/?utm_medium=email&utm_source=footer#!msg/vcap-dev/B1W6_vO0oyo/84X1eAtFsKoJ

On Mon, Jul 27, 2015 at 9:47 AM, Simon Johansson <
simon(a)simonjohansson.com> wrote:

This is a tricky one. Especially if you have more than one logstash
machine doing filtering as they will do filtering independently of each
other as the events come in.

The reason why CF adds a timestamp to each line is because how syslog
works, where each line is its own even.

What we tend to do in my company is to log this kind of stuff via GELF
or with Sentry.

On Mon, Jul 27, 2015 at 5:41 PM, Steve Wall <stevewallone(a)gmail.com>
wrote:

Hello,
We are sending CF logs message to an ELK stack. Multiline logs message
are broken out into several log messages in Logstash. One end per line of
the multiline log message. This is problematic when stack traces dumped to
the log. Each line of the stack trace is translated into a log message.
Trying to view this through Kibana is nearly impossible. Logstash provides
a Grok feature allowing for the manipulation of the log messages. One
common solution is to create a Grok filter that using a timestamp to
indicate when a log entry starts and to combine all lines until the next
timestamp into one log message. The problem is that CF adds a timestamp to
every line. Has anyone come up with a good Grok expression to handle
multiline log message coming out of CF?
Thanks!
Steve



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


Erik Jasiak
 

Hi Steve and Simon; hello again Mike,

First, apologies for the delay in reply on this one- I've also
been trying to come up with a simple, short answer to this problem. I
failed.

Here are the high-level, non-technical answers:
1) Yes, we'd love to enable multi-line logging. Regardless of any
other challenges, we know that there's interest.
2) The problem is multi-layered, and extends beyond loggregator.
2a) Most of the problems with multi-line logging that overlap
loggregator also overlap "general scalability" - problems we've been
handling as part of moving toward collector retirement.
3) We have a hack day project looking at anything "quick and dirty"
to help fix this.
4) Redirecting app logs have known workarounds (eg in Java: via
log4j or similar) while we tackled this - not preferred at all, but do-able.

#########

Technical answers: Loggregator's goals are "Fast, thorough, dumb."
Multi-line logging - as handled by loggregator - has no clean way of
working at the moment w/o violating "fast" or "dumb" principles today.
We're getting there though.

Here's how we've been working towards a fix:
* Syslog drains were not performant enough, or could not handle
large java traces - something we recently fixed[1][2][3] and are going
to email about separately.
* Horizontal scalability allows for overall better performance
and reliability, but pushes the cost on data consistency to the edges of
loggregator (hence nozzles, injectors.)
* Loggregator's dropsonde protocol didn't allow for a clean way
to enforce/tag multi-line data consistency - something we are about to
put forward a proposal to remedy.
** Timestamps are not a clean mechanism for reliably
re-assembling a multi-line log - some combination of app-instance and
order-of-output would need to be tacked on, or a decent vector-time
implementation. We'd need a way to add this metadata that would allow
for re-assembly (see protocol item above). We'd also have to add extra
info at DEA or garden without sacrificing performance - and we know that
the DEA logging agent today already has questions around "acceptable"
performance.

So a multiline fix intersects our goals today. I will do my best
to highlight stories that help us with multi-line logging, and we need
to do a better job at communicating that we're working toward it, even
if it's not the obvious target goal.

,
Erik

[1] https://www.pivotaltracker.com/story/show/99494586
[2] https://www.pivotaltracker.com/story/show/97928938
[3] https://www.pivotaltracker.com/story/show/100163298

Steve Wall wrote:

Now I see what that means. Each line of a multiline log message could
be sent to a different logstash server. Definitely problematic.
Especially with the ephemeral nature of the CF logs there needs to be
a viable solution to persist the logs and syslog seems to be a natural
solution. I'm located in Denver and attend the local CF meetups held
in the Pivotal offices. I believe some LAMB devs attend. I'll be sure
to bring it up with them.
-Steve

On Wed, Jul 29, 2015 at 9:47 AM, Mike Youngstrom <youngm(a)gmail.com
<mailto:youngm(a)gmail.com>> wrote:

Thanks Steve. Though I'm no logstash expert I assume this won't
work if you have multiple logstash machine's doing filtering like
Simon mentioned right? Same is true for us with splunk if you are
forwarding logs to more than one indexer via the REST api. I'd
still like to have a discussion with Erik about this problem see
if he thinks there is anything that can be done in loggregator to
help.

Mike

On Wed, Jul 29, 2015 at 9:00 AM, Steve Wall
<steve.wall(a)primetimesoftware.com
<mailto:steve.wall(a)primetimesoftware.com>> wrote:

Here's a suggested pattern to handle stack traces.

http://stackoverflow.com/questions/31657863/logstash-and-multiline-log-entry-from-cloud-foundry?noredirect=1#comment51279061_31657863


On Mon, Jul 27, 2015 at 11:02 AM, Mike Youngstrom
<youngm(a)gmail.com <mailto:youngm(a)gmail.com>> wrote:

Yet another request for improved multi line log message
handling. Is there any update from the LAMB team on plans
to improve this problem? There have been several proposed
solutions but I'm not aware of anything actually making it
into the LAMB tracker. It would be great if we could hear
from Erik on this issue. Does the LAMB team believe it is
not an issue? Are there plans to improve this situation?
Whatever the perspective lets discuss it as a community
and see if there are any options better than the current.
I'd really like to see something turned into a tracker
issue if there are better options.

Mike

[0]
http://lists.cloudfoundry.org/pipermail/cf-dev/2015-June/000423.html
[1]
http://lists.cloudfoundry.org/pipermail/cf-dev/2015-May/000083.html
[2]
https://groups.google.com/a/cloudfoundry.org/forum/?utm_medium=email&utm_source=footer#!msg/vcap-dev/B1W6_vO0oyo/84X1eAtFsKoJ
<https://groups.google.com/a/cloudfoundry.org/forum/?utm_medium=email&utm_source=footer#%21msg/vcap-dev/B1W6_vO0oyo/84X1eAtFsKoJ>

On Mon, Jul 27, 2015 at 9:47 AM, Simon Johansson
<simon(a)simonjohansson.com
<mailto:simon(a)simonjohansson.com>> wrote:

This is a tricky one. Especially if you have more than
one logstash machine doing filtering as they will do
filtering independently of each other as the events
come in.

The reason why CF adds a timestamp to each line is
because how syslog works, where each line is its own even.

What we tend to do in my company is to log this kind
of stuff via GELF or with Sentry.

On Mon, Jul 27, 2015 at 5:41 PM, Steve Wall
<stevewallone(a)gmail.com
<mailto:stevewallone(a)gmail.com>> wrote:

Hello,
We are sending CF logs message to an ELK stack.
Multiline logs message are broken out into several
log messages in Logstash. One end per line of the
multiline log message. This is problematic when
stack traces dumped to the log. Each line of the
stack trace is translated into a log message.
Trying to view this through Kibana is nearly
impossible. Logstash provides a Grok feature
allowing for the manipulation of the log messages.
One common solution is to create a Grok filter
that using a timestamp to indicate when a log
entry starts and to combine all lines until the
next timestamp into one log message. The problem
is that CF adds a timestamp to every line. Has
anyone come up with a good Grok expression to
handle multiline log message coming out of CF?
Thanks!
Steve



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
<mailto:cf-dev(a)lists.cloudfoundry.org>
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
<mailto:cf-dev(a)lists.cloudfoundry.org>
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
<mailto:cf-dev(a)lists.cloudfoundry.org>
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
<mailto:cf-dev(a)lists.cloudfoundry.org>
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org <mailto:cf-dev(a)lists.cloudfoundry.org>
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


Mike Youngstrom <youngm@...>
 

I replied a while ago but it appears the mailing list dropped it. Here is
another attempt.

Thanks for the response Erik.

{quote}
* Loggregator's dropsonde protocol didn't allow for a clean way to
enforce/tag multi-line data consistency - something we are about to put
forward a proposal to remedy.
** Timestamps are not a clean mechanism for reliably re-assembling a
multi-line log - some combination of app-instance and order-of-output would
need to be tacked on, or a decent vector-time implementation.
{quote}

It seems these features above are focused on breaking apart a multi-line
event into multiple events then re-assembling them somewhere else. I think
I've been looking at the issue from a different perspective. I've been
looking for ways to allow an application to get multiple lines into a
single Loggregator event. That would be good enough to solve my problems
without all the complexity of breaking up and reassembling a series of
events. Even if Loggregator had ways to add metadata to join multiple
events into 1 we are still faced with the problem of how the
dea_logging_agent knows when an app logs multiple stdout/err lines when
should those be considered a multi-line message.

Do you have any thoughts on how an app might hint to the logging agent that
a message is a multi-line message? It seems to me that is the real problem
I need solved. Or am I missing something?

Mike

On Tue, Aug 11, 2015 at 1:26 PM, Erik Jasiak <mjasiak(a)pivotal.io> wrote:

Hi Steve and Simon; hello again Mike,

First, apologies for the delay in reply on this one- I've also been
trying to come up with a simple, short answer to this problem. I failed.

Here are the high-level, non-technical answers:
1) Yes, we'd love to enable multi-line logging. Regardless of any
other challenges, we know that there's interest.
2) The problem is multi-layered, and extends beyond loggregator.
2a) Most of the problems with multi-line logging that overlap
loggregator also overlap "general scalability" - problems we've been
handling as part of moving toward collector retirement.
3) We have a hack day project looking at anything "quick and dirty" to
help fix this.
4) Redirecting app logs have known workarounds (eg in Java: via log4j
or similar) while we tackled this - not preferred at all, but do-able.

#########

Technical answers: Loggregator's goals are "Fast, thorough, dumb."
Multi-line logging - as handled by loggregator - has no clean way of
working at the moment w/o violating "fast" or "dumb" principles today.
We're getting there though.

Here's how we've been working towards a fix:
* Syslog drains were not performant enough, or could not handle
large java traces - something we recently fixed[1][2][3] and are going to
email about separately.
* Horizontal scalability allows for overall better performance and
reliability, but pushes the cost on data consistency to the edges of
loggregator (hence nozzles, injectors.)
* Loggregator's dropsonde protocol didn't allow for a clean way to
enforce/tag multi-line data consistency - something we are about to put
forward a proposal to remedy.
** Timestamps are not a clean mechanism for reliably re-assembling a
multi-line log - some combination of app-instance and order-of-output would
need to be tacked on, or a decent vector-time implementation. We'd need a
way to add this metadata that would allow for re-assembly (see protocol
item above). We'd also have to add extra info at DEA or garden without
sacrificing performance - and we know that the DEA logging agent today
already has questions around "acceptable" performance.

So a multiline fix intersects our goals today. I will do my best to
highlight stories that help us with multi-line logging, and we need to do a
better job at communicating that we're working toward it, even if it's not
the obvious target goal.

,
Erik

[1] https://www.pivotaltracker.com/story/show/99494586
[2] https://www.pivotaltracker.com/story/show/97928938
[3] https://www.pivotaltracker.com/story/show/100163298


Steve Wall wrote:

Now I see what that means. Each line of a multiline log message could be
sent to a different logstash server. Definitely problematic. Especially
with the ephemeral nature of the CF logs there needs to be a viable
solution to persist the logs and syslog seems to be a natural solution. I'm
located in Denver and attend the local CF meetups held in the Pivotal
offices. I believe some LAMB devs attend. I'll be sure to bring it up with
them.
-Steve

On Wed, Jul 29, 2015 at 9:47 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Thanks Steve. Though I'm no logstash expert I assume this won't work if
you have multiple logstash machine's doing filtering like Simon mentioned
right? Same is true for us with splunk if you are forwarding logs to more
than one indexer via the REST api. I'd still like to have a discussion
with Erik about this problem see if he thinks there is anything that can be
done in loggregator to help.

Mike

On Wed, Jul 29, 2015 at 9:00 AM, Steve Wall <
steve.wall(a)primetimesoftware.com> wrote:

Here's a suggested pattern to handle stack traces.


http://stackoverflow.com/questions/31657863/logstash-and-multiline-log-entry-from-cloud-foundry?noredirect=1#comment51279061_31657863


On Mon, Jul 27, 2015 at 11:02 AM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Yet another request for improved multi line log message handling. Is
there any update from the LAMB team on plans to improve this problem?
There have been several proposed solutions but I'm not aware of anything
actually making it into the LAMB tracker. It would be great if we could
hear from Erik on this issue. Does the LAMB team believe it is not an
issue? Are there plans to improve this situation? Whatever the
perspective lets discuss it as a community and see if there are any options
better than the current. I'd really like to see something turned into a
tracker issue if there are better options.

Mike

[0]
http://lists.cloudfoundry.org/pipermail/cf-dev/2015-June/000423.html
[1] http://lists.cloudfoundry.org/pipermail/cf-dev/2015-May/000083.html
[2]
https://groups.google.com/a/cloudfoundry.org/forum/?utm_medium=email&utm_source=footer#!msg/vcap-dev/B1W6_vO0oyo/84X1eAtFsKoJ

On Mon, Jul 27, 2015 at 9:47 AM, Simon Johansson <
simon(a)simonjohansson.com> wrote:

This is a tricky one. Especially if you have more than one logstash
machine doing filtering as they will do filtering independently of each
other as the events come in.

The reason why CF adds a timestamp to each line is because how syslog
works, where each line is its own even.

What we tend to do in my company is to log this kind of stuff via GELF
or with Sentry.

On Mon, Jul 27, 2015 at 5:41 PM, Steve Wall <stevewallone(a)gmail.com>
wrote:

Hello,
We are sending CF logs message to an ELK stack. Multiline logs
message are broken out into several log messages in Logstash. One end per
line of the multiline log message. This is problematic when stack traces
dumped to the log. Each line of the stack trace is translated into a log
message. Trying to view this through Kibana is nearly impossible. Logstash
provides a Grok feature allowing for the manipulation of the log messages.
One common solution is to create a Grok filter that using a timestamp to
indicate when a log entry starts and to combine all lines until the next
timestamp into one log message. The problem is that CF adds a timestamp to
every line. Has anyone come up with a good Grok expression to handle
multiline log message coming out of CF?
Thanks!
Steve



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing listcf-dev(a)lists.cloudfoundry.orghttps://lists.cloudfoundry.org/mailman/listinfo/cf-dev



Mike Youngstrom <youngm@...>
 

Bump, do you have any thoughts on my follow on questions below Erik?

Mike

On Fri, Aug 28, 2015 at 11:51 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

I replied a while ago but it appears the mailing list dropped it. Here is
another attempt.

Thanks for the response Erik.

{quote}
* Loggregator's dropsonde protocol didn't allow for a clean way to
enforce/tag multi-line data consistency - something we are about to put
forward a proposal to remedy.
** Timestamps are not a clean mechanism for reliably re-assembling a
multi-line log - some combination of app-instance and order-of-output would
need to be tacked on, or a decent vector-time implementation.
{quote}

It seems these features above are focused on breaking apart a multi-line
event into multiple events then re-assembling them somewhere else. I think
I've been looking at the issue from a different perspective. I've been
looking for ways to allow an application to get multiple lines into a
single Loggregator event. That would be good enough to solve my problems
without all the complexity of breaking up and reassembling a series of
events. Even if Loggregator had ways to add metadata to join multiple
events into 1 we are still faced with the problem of how the
dea_logging_agent knows when an app logs multiple stdout/err lines when
should those be considered a multi-line message.

Do you have any thoughts on how an app might hint to the logging agent
that a message is a multi-line message? It seems to me that is the real
problem I need solved. Or am I missing something?

Mike

On Tue, Aug 11, 2015 at 1:26 PM, Erik Jasiak <mjasiak(a)pivotal.io> wrote:

Hi Steve and Simon; hello again Mike,

First, apologies for the delay in reply on this one- I've also been
trying to come up with a simple, short answer to this problem. I failed.

Here are the high-level, non-technical answers:
1) Yes, we'd love to enable multi-line logging. Regardless of any
other challenges, we know that there's interest.
2) The problem is multi-layered, and extends beyond loggregator.
2a) Most of the problems with multi-line logging that overlap
loggregator also overlap "general scalability" - problems we've been
handling as part of moving toward collector retirement.
3) We have a hack day project looking at anything "quick and dirty"
to help fix this.
4) Redirecting app logs have known workarounds (eg in Java: via log4j
or similar) while we tackled this - not preferred at all, but do-able.

#########

Technical answers: Loggregator's goals are "Fast, thorough, dumb."
Multi-line logging - as handled by loggregator - has no clean way of
working at the moment w/o violating "fast" or "dumb" principles today.
We're getting there though.

Here's how we've been working towards a fix:
* Syslog drains were not performant enough, or could not handle
large java traces - something we recently fixed[1][2][3] and are going to
email about separately.
* Horizontal scalability allows for overall better performance and
reliability, but pushes the cost on data consistency to the edges of
loggregator (hence nozzles, injectors.)
* Loggregator's dropsonde protocol didn't allow for a clean way to
enforce/tag multi-line data consistency - something we are about to put
forward a proposal to remedy.
** Timestamps are not a clean mechanism for reliably re-assembling
a multi-line log - some combination of app-instance and order-of-output
would need to be tacked on, or a decent vector-time implementation. We'd
need a way to add this metadata that would allow for re-assembly (see
protocol item above). We'd also have to add extra info at DEA or garden
without sacrificing performance - and we know that the DEA logging agent
today already has questions around "acceptable" performance.

So a multiline fix intersects our goals today. I will do my best to
highlight stories that help us with multi-line logging, and we need to do a
better job at communicating that we're working toward it, even if it's not
the obvious target goal.

,
Erik

[1] https://www.pivotaltracker.com/story/show/99494586
[2] https://www.pivotaltracker.com/story/show/97928938
[3] https://www.pivotaltracker.com/story/show/100163298


Steve Wall wrote:

Now I see what that means. Each line of a multiline log message could be
sent to a different logstash server. Definitely problematic. Especially
with the ephemeral nature of the CF logs there needs to be a viable
solution to persist the logs and syslog seems to be a natural solution. I'm
located in Denver and attend the local CF meetups held in the Pivotal
offices. I believe some LAMB devs attend. I'll be sure to bring it up with
them.
-Steve

On Wed, Jul 29, 2015 at 9:47 AM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Thanks Steve. Though I'm no logstash expert I assume this won't work if
you have multiple logstash machine's doing filtering like Simon mentioned
right? Same is true for us with splunk if you are forwarding logs to more
than one indexer via the REST api. I'd still like to have a discussion
with Erik about this problem see if he thinks there is anything that can be
done in loggregator to help.

Mike

On Wed, Jul 29, 2015 at 9:00 AM, Steve Wall <
steve.wall(a)primetimesoftware.com> wrote:

Here's a suggested pattern to handle stack traces.


http://stackoverflow.com/questions/31657863/logstash-and-multiline-log-entry-from-cloud-foundry?noredirect=1#comment51279061_31657863


On Mon, Jul 27, 2015 at 11:02 AM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Yet another request for improved multi line log message handling. Is
there any update from the LAMB team on plans to improve this problem?
There have been several proposed solutions but I'm not aware of anything
actually making it into the LAMB tracker. It would be great if we could
hear from Erik on this issue. Does the LAMB team believe it is not an
issue? Are there plans to improve this situation? Whatever the
perspective lets discuss it as a community and see if there are any options
better than the current. I'd really like to see something turned into a
tracker issue if there are better options.

Mike

[0]
http://lists.cloudfoundry.org/pipermail/cf-dev/2015-June/000423.html
[1]
http://lists.cloudfoundry.org/pipermail/cf-dev/2015-May/000083.html
[2]
https://groups.google.com/a/cloudfoundry.org/forum/?utm_medium=email&utm_source=footer#!msg/vcap-dev/B1W6_vO0oyo/84X1eAtFsKoJ

On Mon, Jul 27, 2015 at 9:47 AM, Simon Johansson <
simon(a)simonjohansson.com> wrote:

This is a tricky one. Especially if you have more than one logstash
machine doing filtering as they will do filtering independently of each
other as the events come in.

The reason why CF adds a timestamp to each line is because how syslog
works, where each line is its own even.

What we tend to do in my company is to log this kind of stuff via
GELF or with Sentry.

On Mon, Jul 27, 2015 at 5:41 PM, Steve Wall <stevewallone(a)gmail.com>
wrote:

Hello,
We are sending CF logs message to an ELK stack. Multiline logs
message are broken out into several log messages in Logstash. One end per
line of the multiline log message. This is problematic when stack traces
dumped to the log. Each line of the stack trace is translated into a log
message. Trying to view this through Kibana is nearly impossible. Logstash
provides a Grok feature allowing for the manipulation of the log messages.
One common solution is to create a Grok filter that using a timestamp to
indicate when a log entry starts and to combine all lines until the next
timestamp into one log message. The problem is that CF adds a timestamp to
every line. Has anyone come up with a good Grok expression to handle
multiline log message coming out of CF?
Thanks!
Steve



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing listcf-dev(a)lists.cloudfoundry.orghttps://lists.cloudfoundry.org/mailman/listinfo/cf-dev