Re: Generic data points for dropsonde


Benjamin Black

johannes,

the problem of upstream schema changes causing downstream change or
breakage is exactly the current situation: every addition of a metric
type implies a change to the dropsonde-protocol, requiring everything
downstream to be updated.

the schema concerns are similar. currently there is no schema whatsoever
beyond the very fine-grained "this is a name and this is a value". this
means every implementation of redis info export, for example, can, and
almost certainly will, be different. the result is that every downstream
consumer must either know every possible variant or support only specific
variants, which is exactly the problem you are looking to avoid.

i share the core concern regarding ensuring points are "sufficiently"
self-describing. however, there is no clear line delineating what is
sufficient. the current proposal pushes almost everything out to schema.
we could imagine a change to the attributes that includes what kind of
attribute it is (gauge, counter, etc.), what the units for the attribute
are, and so on.
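
as a purely illustrative sketch (these names are hypothetical, not part
of the proposal), such self-describing attributes might look like:

    // hypothetical: an attribute that carries its own kind and units
    // instead of leaving them to an out-of-band schema.
    type AttributeKind int

    const (
        Gauge AttributeKind = iota
        Counter
    )

    type Attribute struct {
        Name  string
        Value float64
        Kind  AttributeKind // gauge, counter, etc.
        Units string        // e.g. "bytes", "requests/sec"
    }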

it is critical that we balance the complexity of the points against the
complexity of the consumers, as there is no free lunch here. which
specific functionality would you want to see in the generic points to
achieve the balance you prefer?


b



On Wed, Sep 2, 2015 at 2:07 PM, Johannes Tuchscherer
<jtuchscherer@pivotal.io> wrote:

The current way of sending metrics as either Values or Counters through
the pipeline makes the development of a downstream consumer (i.e., a
nozzle) pretty easy. If you look at the datadog nozzle[0], it just takes
all ValueMetrics and Counters and sends them off to Datadog. The nozzle
does not have to know anything about these metrics (e.g., their origin,
name, or layout).
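
To illustrate how simple that is (a sketch only, with hypothetical
client types standing in for the real firehose and Datadog clients, not
the actual nozzle code):

    // a minimal nozzle loop: forward every ValueMetric and
    // CounterEvent without inspecting origin, name, or layout.
    func run(messages <-chan *events.Envelope, dd DatadogClient) {
        for envelope := range messages {
            switch envelope.GetEventType() {
            case events.Envelope_ValueMetric, events.Envelope_CounterEvent:
                dd.Send(envelope) // DatadogClient is hypothetical
            }
        }
    }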

Adding a new way to send metrics as a nested object would certainly make
the downstream implementation more complicated. In that case, the nozzle
developer has to know which metrics are included inside the generic
point (basically the schema of the metric) and treat each point
accordingly. For example, if I were to write a nozzle that emits metrics
to Graphite with a StatsD client (as is done here[1]), I would need to
know whether my int64 value is a Gauge or a Counter. Also, my consumer
would be under constant risk of breaking when the upstream schema
changes.
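
A sketch of what that branching looks like in practice (all names here
are hypothetical):

    // with a generic point, the consumer needs out-of-band schema
    // knowledge to choose the right StatsD call for each value.
    func emit(point GenericPoint, statsd StatsDClient) {
        for name, value := range point.Values {
            if isGauge(name) { // isGauge encodes the schema knowledge
                statsd.Gauge(name, value)
            } else {
                statsd.Counter(name, value)
            }
        }
    }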

We are already facing this problem with the container metrics. But at
least the container metrics are in a defined format that is well documented
and not likely to change.

I agree with you, though, that the dropsonde protocol could use a
mechanism for easier extension. Having a GenericPoint and/or
GenericEvent seems like a good idea that I whole-heartedly support. I
would just like to stay away from nested metrics. I think the cost of
adding more logic to the downstream consumer (and making it harder to
maintain) is not worth the benefit of a more concise metric transport.


[0] https://github.com/cloudfoundry-incubator/datadog-firehose-nozzle
[1] https://github.com/CloudCredo/graphite-nozzle

On Tue, Sep 1, 2015 at 5:52 PM, Benjamin Black <bblack@pivotal.io> wrote:

great questions, dwayne.

1) the partition key is intended to be used in a similar manner to
partitioners in distributed systems like cassandra or kafka. the specific
behavior i would like to make part of the contract is twofold: all data
with the same key is routed to the same partition, and all data within a
partition is FIFO (meaning no ordering guarantees beyond arrival time).
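
a sketch of the routing half of that contract (hypothetical code; the
hash choice is illustrative only):

    import "hash/fnv"

    // all points with the same key map to the same partition; within a
    // partition, points are then handled in arrival (FIFO) order.
    func partitionFor(key string, numPartitions int) int {
        h := fnv.New32a()
        h.Write([]byte(key))
        return int(h.Sum32() % uint32(numPartitions))
    }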

this could help with the multi-line log problem by ensuring a single
consumer will receive all lines for a given log entry in order, allowing
simple reassembly. however, those lines might be interleaved with other
entries using the same key, or even other keys that happen to map to the
same partition, so the consumer does require a bit of intelligence. this
was actually one of the driving scenarios for adding the key.

2) i expect typical points to be in the hundreds of bytes to a few KB. if
we find ourselves regularly needing much larger points, especially near
that 64KB limit, i'd look to the JSON representation, as the hierarchical
structure is more efficiently managed there.


b




On Tue, Sep 1, 2015 at 4:42 PM, <dschultz@pivotal.io> wrote:

Hi Ben,

I was wondering if you could give a concrete use case for the partition
key functionality.

In particular, I am interested in how we solve multi-line log entries. I
think it would be better to solve this by keeping all the data (the
multiple lines) together throughout the logging/metrics pipeline, but I
could see how something like a partition key might help keep the data
together as well.

Second question: how large do you see these point messages getting
(average and max)? There are still several stages of the logging/metrics
pipeline that use UDP with a standard 64K size limit.

Thanks,
Dwayne

On Aug 28, 2015, at 4:54 PM, Benjamin Black <bblack@pivotal.io> wrote:

All,

The existing dropsonde protocol uses a different message type for each
event type. HttpStart, HttpStop, ContainerMetrics, and so on are all
distinct types in the protocol definition. This requires protocol changes
to introduce any new event type, making such changes very expensive. We've
been working for the past few weeks on an addition to the dropsonde
protocol to support easier future extension to new types of events and to
make it easier for users to define their own events.

The document linked below [1] describes a generic data point message
capable of carrying multi-dimensional, multi-metric points as sets of
name/value pairs. This new message is expected to be added as an additional
entry in the existing dropsonde protocol metric type enum. Things are now
at a point where we'd like to get feedback from the community before moving
forward with implementation.
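
As a rough illustration only (field names here are hypothetical, not the
definition in the document), such a point might be shaped like:

    // a generic, self-describing data point: free-form dimensions
    // plus multiple named metric values in a single message.
    type GenericPoint struct {
        Timestamp  int64              // when the point was captured
        Attributes map[string]string  // dimensions, e.g. "job", "index"
        Values     map[string]float64 // the metrics in this point
    }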

Please contribute your thoughts on the document in whichever way you are
most comfortable: comments on the document, email here, or email directly
to me. If you comment on the document, please make sure you are logged in
so we can keep track of who is asking for what. Your views are not just
appreciated, but critical to the continued health and success of the Cloud
Foundry community. Thank you!


b

[1]
https://docs.google.com/document/d/1SzvT1BjrBPqUw6zfSYYFfaW9vX_dTZZjn5sl2nxB6Bc/edit?usp=sharing



