Hi Mike,
I think your random approach is workable; what you are doing, in effect, is
taking fewer samples off the firehose stream.
Short of the aggregation answer James pointed out, this has the potential
to mess with a few things, like averages, but it's better than nothing if
you have to rate-control at ingest and are looking for a low-cost solution.
In the longer term, we are looking closely at how to make it easier to
aggregate metrics at either end of loggregator to help with the amount of
data, and hope to have more info shortly. Hopefully that will help with
controlling data flow no matter how often a component emits metrics.
Erik
On Sat, Aug 8, 2015 at 10:49 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:
Thanks James,
A little more complicated, with more moving parts, than I was hoping for,
but if I don't want to miss anything I probably don't have much of a choice.
I think for now I'm going to go with some kind of random approach, at least
for the dropsonde-generated metrics, since they are by far the most
frequent/expensive. I think grabbing a random smattering of them will be
good enough for my current uses.
Mike
On Sat, Aug 8, 2015 at 7:02 AM, James Bayer <jbayer(a)pivotal.io> wrote:
warning, thinking out loud here...
your nozzle will tap the firehose, and filter for the metrics you care
about
currently you're publishing these events to your metrics backend as fast
as they come in, across a horizontally scalable tier that doesn't coordinate,
and that can be expensive if your backend charges by the transaction
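to make that concrete, the loop as it stands is roughly the following go
sketch (Envelope, the wanted set, and the publish callback are made-up
placeholders, not real loggregator types):

package nozzle

// Envelope is a made-up placeholder for whatever the real firehose
// envelope/metric type is in your nozzle.
type Envelope struct {
    Name  string
    Value float64
}

// Run is the loop as it stands: every envelope you care about goes straight
// to the backend as it arrives, so one billable transaction per event.
func Run(firehose <-chan Envelope, wanted map[string]bool, publish func(Envelope)) {
    for e := range firehose {
        if !wanted[e.Name] {
            continue // filter for the metrics you care about
        }
        publish(e) // as fast as they come in
    }
}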
to slow down the stream, you could consider having the work in two phases:
1) aggregation phase
2) publish phase
the aggregation phase could have each instance of the horizontally
scaled-out tier put the metric in a temporary data store such as redis, or
another in-memory data grid with HA like apache geode [1].
the publish phase would have something like a cron / spring batch
capability to occasionally (as often as made sense for your costs) flush
the metrics from the temporary data store to the per-transaction-cost
backend
[1] http://geode.incubator.apache.org/
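very roughly, sketched in go with an in-memory map standing in for the
shared store (in practice each instance would write to redis/geode and a
single scheduled job would do the flush; the metric names and intervals
here are made up):

package main

import (
    "fmt"
    "sync"
    "time"
)

// Aggregator stands in for the temporary data store (redis, geode, ...).
// The aggregation phase writes here cheaply; the publish phase drains it.
type Aggregator struct {
    mu     sync.Mutex
    sums   map[string]float64
    counts map[string]int
}

func NewAggregator() *Aggregator {
    return &Aggregator{sums: map[string]float64{}, counts: map[string]int{}}
}

// Record: aggregation phase, called once per incoming envelope.
func (a *Aggregator) Record(name string, value float64) {
    a.mu.Lock()
    defer a.mu.Unlock()
    a.sums[name] += value
    a.counts[name]++
}

// Flush: publish phase, pushes one averaged value per metric to the
// per-transaction-cost backend and resets the store.
func (a *Aggregator) Flush(publish func(name string, value float64)) {
    a.mu.Lock()
    defer a.mu.Unlock()
    for name, sum := range a.sums {
        publish(name, sum/float64(a.counts[name]))
    }
    a.sums = map[string]float64{}
    a.counts = map[string]int{}
}

func main() {
    agg := NewAggregator()

    // stand-in for the firehose side writing frequently
    go func() {
        for {
            agg.Record("router.requests", 1)
            time.Sleep(100 * time.Millisecond)
        }
    }()

    // flush as often as makes sense for your backend costs
    for range time.Tick(30 * time.Second) {
        agg.Flush(func(name string, value float64) {
            fmt.Printf("publish %s = %v\n", name, value)
        })
    }
}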
On Fri, Aug 7, 2015 at 9:26 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:
I suppose one relatively simple solution to this problem is that I can have
each cluster member randomly decide whether it should log each metric. :) If
I pick a number between 1 and 6, I suppose the odds are I would log about
every 6th message on average, or something like that. :)
Another idea: I could have each member pick a random number between 1 and
10, skip that many messages before publishing, then pick a new random
number.
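Roughly what I have in mind, as a sketch (the Envelope type and the channel
are just placeholders for whatever the nozzle actually hands me):

package main

import (
    "fmt"
    "math/rand"
)

// Envelope is a placeholder for whatever comes off the firehose.
type Envelope struct {
    Name  string
    Value float64
}

// sampleByChance publishes roughly 1 out of every n envelopes: each cluster
// member flips a coin per metric independently, so no coordination is needed.
func sampleByChance(in <-chan Envelope, n int, publish func(Envelope)) {
    for e := range in {
        if rand.Intn(n) == 0 { // ~1/n of the time
            publish(e)
        }
    }
}

// sampleBySkip picks a random skip count between 1 and max, drops that many
// envelopes, publishes one, then picks a new skip count.
func sampleBySkip(in <-chan Envelope, max int, publish func(Envelope)) {
    skip := rand.Intn(max) + 1
    for e := range in {
        if skip > 0 {
            skip--
            continue
        }
        publish(e)
        skip = rand.Intn(max) + 1
    }
}

func main() {
    in := make(chan Envelope)
    go func() {
        for i := 0; i < 60; i++ {
            in <- Envelope{Name: "demo.metric", Value: float64(i)}
        }
        close(in)
    }()
    // publish ~1 in 6 of the incoming envelopes
    sampleByChance(in, 6, func(e Envelope) { fmt.Println("publish", e.Name, e.Value) })
}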
I think it is mostly the dropsonde messages that are killing me. A
technique like this probably wouldn't really work for metrics derived from
http events and such.
Anyone have any other ideas?
Mike
On Wed, Aug 5, 2015 at 12:06 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:
I'm working on adding support for Firehose metrics to our monitoring
solution. The firehose is working great. However, it appears each
component sends updates every 10 seconds or so. This might be a great
interval for some use cases, but for my monitoring provider it can get
expensive. Any ideas on how I might limit the frequency of metric updates
from the firehose?
The obvious initial solution is to just do that in my nozzle. However,
I plan to cluster my nozzle using a subscriptionId. My understanding is
that when using a subscriptionId events will get balanced between the
subscribers. That would mean one nozzle instance might know when it last
sent a particular metric, but, the other instances wouldn't, without making
the solution more complex than I'd like it to be.
Any thoughts on how I might approach this problem?
Mike
--
Thank you,
James Bayer