Re: Doppler/Firehose - Multiline Log Entry


Eric Malm <emalm@...>
 

Thanks, Mike, don't be too hard on yourself. :) I think that's a valid
point that if there is processing to translate a byte-stream from process
stdout/stderr to loggregator log events, it would be beneficial to make
that as efficient as possible. Right now, the executor's stream-destination
type has all of that logic: it breaks the byte-stream into messages on
newlines and carriage returns, it breaks too-long messages on UTF-8
boundaries, and then it emits those messages to dropsonde with the
app-specific tagging. So from the standpoint of efficiency and coherence,
it probably would make sense to do this proposed \u2028-to-\n substitution
in the same single-pass as the other processing.

That said, the Diego team has gotten some of that processing wrong in the
executor: we've been breaking log messages at 4KiB when loggregator can
handle messages close to 64KiB (whatever ends up being the limit imposed by
the size of a UDP datagram and the additional fields in the event
envelope), and we just fixed a bug where we could break 3- and 4-byte UTF-8
sequences incorrectly at those message-length boundaries. So I also think
it could make sense for the dropsonde library to provide a type that does
the app-byte-stream-to-log-message processing that the executor's
stream-destination currently does. The executor could then instantiate a
pair of those and hook them up as io.Writer interfaces to the stdout and
stderr of the Garden process it invokes, as it does with its own
stream-destinations today. The stream-destinations could then also be
individually configured to do the substitution.

In any case, I think it makes sense to proceed with the Loggregator spike,
and then if we want to implement the solution for real the Diego and
Loggregator teams can figure out the best way to make it efficient and
maintainable.

Thanks,
Eric

On Wed, Apr 13, 2016 at 5:53 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

I'm an idiot. I see what you and Eric are saying now. Put the code in
Dropsonde then let the Executor simply initialize Dropsonde that way.
Works for me.

Thanks,
Mike

On Wed, Apr 13, 2016 at 5:26 PM, Jim CF Campbell <jcampbell(a)pivotal.io>
wrote:

My last 2 cents. It'll be configurable so will only be active in users of
dropsonde that want the functionality such as the Executor.

On Wed, Apr 13, 2016 at 5:21 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

You may want to reference the issue I created on executor.

In that issue I note that I don't think dropsonde is the right place to
do this token replacement because dropsonde doesn't know that the event
originally came through the limited stdout/stderr interface that needs this
functionality. However, executor does. If I'm using the dropsonde API
directly where I can safely put new line characters I don't want dropsonde
looking to replace a character I don't want replaced especially since that
character replacement isn't even needed when using a more rich interface
like dropsonde directly.

That's my 2 cents.

Thanks,
Mike

On Wed, Apr 13, 2016 at 4:34 PM, Jim CF Campbell <jcampbell(a)pivotal.io>
wrote:

We're going to look into it
<https://www.pivotaltracker.com/story/show/117583365>.

On Wed, Apr 13, 2016 at 12:33 PM, Eric Malm <emalm(a)pivotal.io> wrote:

Thanks, Mike. If source-side processing is the right place to do
that \u2028-to-newline substitution, I think that there could also be a
config option on the dropsonde library to have its LogSender perform that
within each message before forwarding it on. The local metron-agent could
also do that processing. I think it's appropriate to push as much of that
log processing as possible to the Loggregator components and libraries:
it's already a bit much that the executor knows anything at all about the
content of the byte-streams that it receives from the stdout and stderr of
a process in the container, so that it can break those streams into the
log-lines that the dropsonde library expects.

Best,
Eric

On Wed, Apr 13, 2016 at 11:00 AM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Thanks for the insight Jim. I still think that the Executor is the
place to fix this since multi line logging isn't a Loggregator limitation
it is a log inject limitation which is owned by the Executor. I'll open an
issue with Diego and see how it goes.

Thanks,
Mike

On Tue, Apr 12, 2016 at 2:51 PM, Jim CF Campbell <
jcampbell(a)pivotal.io> wrote:

That strategy is going to be hard to sell. Diego's Executor takes
the log lines out of Garden and drops them into dropsonde messages. I doubt
they'll think it's a good idea to implement substitution in that
processing. You can certainly ask Eric - he's very aware of the underlying
problem.

After that point, the Loggregator system does it's best to touch
messages as little as possible, and to improve performance and reliability,
we have thinking about the future that will lower the amount of touching
ever further. The next place that log message processing can be done is
either in a nozzle, or the injester of a log aggregator.

I'd vote for those downstream places - a single configuration and
algorithm instead of distributed across runner VMs.

On Tue, Apr 12, 2016 at 2:15 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

I was thinking whoever demarcates and submits the original event to
loggregator. dea_logging_agent and the equivalent in Deigo. Doing it at
that point could provide a bit more flexibility. I know this isn't
necessarily the loggregator team's code but I think loggregator team buy
off would be important for those projects to accept such a PR.

Unless you can think of a better place to make that transformation
within the loggregator processing chain?

Mike

On Tue, Apr 12, 2016 at 2:02 PM, Jim CF Campbell <
jcampbell(a)pivotal.io> wrote:

what exactly do you mean by "event creation time"?

On Tue, Apr 12, 2016 at 1:57 PM, Mike Youngstrom <youngm(a)gmail.com
wrote:
Before I submit the CLI issue let me ask one more question.

Would it be better to replace the newline token with /n at event
creation time instead of asking the cli, splunk, anyone listening on the
firehose, etc. to do so?

The obvious downside is this would probably need to be a global
configuration. However, I know my organization wouldn't have a problem
swapping /u2028 with /n for a deployment. The feature would obviously be
off by default.

Thoughs?

Mike

On Tue, Apr 12, 2016 at 11:24 AM, Mike Youngstrom <
youngm(a)gmail.com> wrote:

Sounds good. I'll submit an issue to start the discussion. I
imagine the first question Dies will ask though is if you would support
something like that. :)

Mike

On Tue, Apr 12, 2016 at 11:12 AM, Jim CF Campbell <
jcampbell(a)pivotal.io> wrote:

cf logs
<https://github.com/cloudfoundry/cli/blob/40eb5be48eaac697c3700d5f1e6f654bae471cec/cf/commands/application/logs.go>
is actually maintained by the CLI team under Dies
<https://www.pivotaltracker.com/n/projects/892938>. You can
talk to them. I'll certainly support you by helping explain the need. I'd
think we want a general solution (token in ENV for instance).



On Tue, Apr 12, 2016 at 11:02 AM, Mike Youngstrom <
youngm(a)gmail.com> wrote:

Jim,

If I submitted a CLI PR to change the cf logs command to
substitute /u2028 with /n could the loggregator team get behind that?

Mike

On Tue, Apr 12, 2016 at 10:20 AM, Jim CF Campbell <
jcampbell(a)pivotal.io> wrote:

Mike,

When you get a bit more desperate ;-) here is a nozzle plug
in <https://github.com/jtuchscherer/nozzle-plugin> for the
CLI. It's attaches to the firehose to display everything, but would be easy
to modify to just look at a single app, and sub out the magic token for
newlines.

Jim

On Tue, Apr 12, 2016 at 9:56 AM, Mike Youngstrom <
youngm(a)gmail.com> wrote:

Hi David,

The problem for me is that I'm searching for a solution that
can works for development (though less of a priority cause you can switch
config between dev and cf) and for viewing logs via "cf logs" in addition
to a log aggregator. I had hoped that /u2028 would work for viewing logs
via "cf logs" but it doesn't in bash. I'd need to write a plugin or
something for cf logs and train all my users to use it. Certainly possible
but I'm not that desperate yet. :)

Mike

On Tue, Apr 12, 2016 at 5:58 AM, David Laing <
david(a)davidlaing.com> wrote:

FWIW, the technique is to have your logging solution (eg,
logback, log4j) log a token (eg, \u2028) other than \n to
denote line breaks in your stack traces; and then have your log aggregation
software replace that token with a \n again when processing the log
messages.

If \u2028 doesn't work in your environment; use something
else; eg NEWLINE

On Mon, 11 Apr 2016 at 21:12 Mike Youngstrom <
youngm(a)gmail.com> wrote:

Finally got around to testing this. Preliminary testing
show that "\u2028" doesn't function as a new line
character in bash and causes eclipse console to wig out. I don't think "
\u2028" is a viable long term solution. Hope you make
progress on a metric format available to an app in a container. I too
would like a tracker link to such a feature if there is one.

Thanks,
Mike

On Mon, Mar 14, 2016 at 2:28 PM, Mike Youngstrom <
youngm(a)gmail.com> wrote:

Hi Jim,

So, to be clear what we're basically doing is using
unicode newline character to fool loggregator (which is looking for \n)
into thinking that it isn't a new log event right? Does \u2028 work as a
new line character when tailing logs in the CLI? Anyone tried this unicode
new line character in various consoles? IDE, xterm, etc? I'm wondering if
developers will need to have different config for development.

Mike

On Mon, Mar 14, 2016 at 12:17 PM, Jim CF Campbell <
jcampbell(a)pivotal.io> wrote:

Hi Mike and Alex,

Two things - for Java, we are working toward defining an
enhanced metric format that will support transport of Multi Lines.

The second is this workaround that David Laing suggested
for Logstash. Think you could use it for Splunk?

With the Java Logback library you can do this by adding
"%replace(%xException){'\n','\u2028'}%nopex" to your logging config[1] ,
and then use the following logstash conf.[2]
Replace the unicode newline character \u2028 with \n,
which Kibana will display as a new line.

mutate {

gsub => [ "[@message]", '\u2028', "

"]
^^^ Seems that passing a string with an actual newline
in it is the only way to make gsub work

}

to replace the token with a regular newline again so it
displays "properly" in Kibana.

[1] github.com/dpin...ication.yml#L12
<https://github.com/dpinto-pivotal/cf-SpringBootTrader-config/blob/master/application.yml#L12>

[2] github.com/logs...se.conf#L60-L64
<https://github.com/logsearch/logsearch-for-cloudfoundry/blob/master/src/logsearch-config/src/logstash-filters/snippets/firehose.conf#L60-L64>


On Mon, Mar 14, 2016 at 11:11 AM, Mike Youngstrom <
youngm(a)gmail.com> wrote:

I'll let the Loggregator team respond formally. But,
in my conversations with the Loggregator team I think we're basically stuck
not sure what the right thing to do is on the client side. How does the
client trigger in loggregator that this is a multi line log message or what
is the right way for loggregator to detect that the client is trying to
send a multi line log message? Any ideas?

Mike

On Mon, Mar 14, 2016 at 10:25 AM, Aliaksandr Prysmakou
<prysmakou(a)gmail.com> wrote:

Hi guys,
Are there any updates about "Multiline Log Entry"
issue? How correctly deal with stacktraces?
Links to the tracker to read?
----
Alex Prysmakou / Altoros
Tel: (617) 841-2121 ext. 5161 | Toll free: 855-ALTOROS
Skype: aliaksandr.prysmakou
www.altoros.com | blog.altoros.com |
twitter.com/altoros


--
Jim Campbell | Product Manager | Cloud Foundry |
Pivotal.io | 303.618.0963

--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io
| 303.618.0963

--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io |
303.618.0963

--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io |
303.618.0963

--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io |
303.618.0963

--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io |
303.618.0963

--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io |
303.618.0963

Join cf-dev@lists.cloudfoundry.org to automatically receive all group messages.