Re: Strategies for limiting metric updates with a clustered nozzle


Mike Youngstrom <youngm@...>

Thanks James,

It's a little more complicated, with more moving parts, than I was hoping for,
but if I don't want to miss anything I probably don't have much of a choice.

I think for now I'm going to go with some kind of random approach, at least
for the dropsonde-generated metrics, since they are by far the most frequent
and expensive, and I think grabbing a random sample of them will be good
enough for my current uses.

Mike

On Sat, Aug 8, 2015 at 7:02 AM, James Bayer <jbayer(a)pivotal.io> wrote:

warning, thinking out loud here...

your nozzle will tap the firehose and filter for the metrics you care
about

currently you're publishing these events to your metrics backend as fast
as they come in, across a horizontally scalable tier that doesn't coordinate,
and that can be expensive if your backend charges by the transaction

to slow down the stream, you could consider splitting the work into two phases:
1) aggregation phase
2) publish phase

the aggregation phase could have each instance of the horizontally scaled-out
tier put the metrics in a temporary data store such as redis, or another
in-memory data grid with HA like apache geode [1].
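
as a rough go sketch, assuming redis as the temporary store (the key layout
and the go-redis client are illustrative choices, not a prescription):

    package nozzle

    import (
        "context"
        "fmt"

        "github.com/cloudfoundry/sonde-go/events"
        "github.com/go-redis/redis/v8"
    )

    // aggregate folds one envelope into the temporary store: counters from
    // every nozzle instance accumulate in a shared key, gauges keep only
    // the latest observed value.
    func aggregate(ctx context.Context, rdb *redis.Client, env *events.Envelope) error {
        switch env.GetEventType() {
        case events.Envelope_CounterEvent:
            key := fmt.Sprintf("counter:%s:%s", env.GetOrigin(), env.GetCounterEvent().GetName())
            return rdb.IncrBy(ctx, key, int64(env.GetCounterEvent().GetDelta())).Err()
        case events.Envelope_ValueMetric:
            key := fmt.Sprintf("gauge:%s:%s", env.GetOrigin(), env.GetValueMetric().GetName())
            return rdb.Set(ctx, key, env.GetValueMetric().GetValue(), 0).Err()
        }
        return nil
    }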

the publish phase would have something like a cron / spring batch
capability to occasionally (as often as made sense for your costs) flush
the metrics from the temporary data store to the per-transaction-cost
backend
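
the publish side could then be a single scheduled worker, something like this
sketch (sendToBackend is a hypothetical stand-in for whatever your provider's
client looks like):

    package nozzle

    import (
        "context"
        "fmt"
        "time"

        "github.com/go-redis/redis/v8"
    )

    // sendToBackend is a hypothetical stand-in for your metrics provider's client.
    func sendToBackend(key, val string) { fmt.Println("publish:", key, val) }

    // flushLoop periodically scans the temporary store and forwards
    // aggregated counters to the per-transaction-cost backend.
    func flushLoop(ctx context.Context, rdb *redis.Client, interval time.Duration) {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                iter := rdb.Scan(ctx, 0, "counter:*", 100).Iterator()
                for iter.Next(ctx) {
                    key := iter.Val()
                    val, err := rdb.Get(ctx, key).Result()
                    if err != nil {
                        continue
                    }
                    sendToBackend(key, val)
                    // note: an increment landing between Get and Del is lost;
                    // GETDEL or a small Lua script closes that gap on newer redis
                    rdb.Del(ctx, key)
                }
            }
        }
    }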

[1] http://geode.incubator.apache.org/

On Fri, Aug 7, 2015 at 9:26 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

I suppose one relatively simple solution to this problem is to have each
cluster member randomly decide whether it should log each metric. :) If I
pick a number between 1 and 6, I suppose the odds are I would log about every
6th message on average, or something like that. :)
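
Something like this minimal Go sketch (names are illustrative):

    package nozzle

    import "math/rand"

    const n = 6 // keep roughly every 6th message on average

    // shouldPublish flips an independent coin per message; no coordination
    // between cluster members is needed.
    func shouldPublish() bool {
        return rand.Intn(n) == 0 // true with probability 1/6
    }

Since every instance flips its own coin per message, the cluster as a whole
still publishes about 1/6 of the stream no matter how the firehose balances
envelopes between subscribers.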

Another idea: I could have each member pick a random number between 1 and
10, skip that many messages before publishing, then pick a new random
number.
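
Roughly, as another sketch:

    package nozzle

    import "math/rand"

    // skipSampler drops a random number of messages (1-10) after each publish.
    type skipSampler struct {
        remaining int // messages left to drop before the next publish
    }

    func (s *skipSampler) shouldPublish() bool {
        if s.remaining > 0 {
            s.remaining--
            return false
        }
        s.remaining = rand.Intn(10) + 1 // next skip count, 1..10
        return true
    }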

I think it is mostly the dropsonde messages that are killing me. A
technique like this probably wouldn't really work for metrics derived from
http events and such.

Anyone have any other ideas?

Mike

On Wed, Aug 5, 2015 at 12:06 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

I'm working on adding support for Firehose metrics to our monitoring
solution. The firehose is working great. However, it appears each component
sends updates every 10 seconds or so. That might be a great interval for some
use cases, but for my monitoring provider it can get expensive. Any ideas on
how I might limit the frequency of metric updates from the firehose?

The obvious initial solution is to just do that in my nozzle. However,
I plan to cluster my nozzle using a subscriptionId. My understanding is
that when using a subscriptionId, events will get balanced between the
subscribers. That would mean one nozzle instance might know when it last
sent a particular metric, but the other instances wouldn't, without making
the solution more complex than I'd like it to be.
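
For context, the instances connect roughly like this (a sketch using the noaa
consumer library; the URL and token are placeholders):

    package main

    import (
        "crypto/tls"
        "fmt"

        "github.com/cloudfoundry/noaa/consumer"
    )

    // authToken is a placeholder; a real nozzle would fetch an oauth token from UAA.
    func authToken() string { return "bearer <token>" }

    func main() {
        c := consumer.New("wss://doppler.example.com:443", &tls.Config{}, nil)
        // every instance passes the same subscription ID, so doppler balances
        // the stream across them instead of sending each one a full copy
        msgs, errs := c.Firehose("my-nozzle", authToken())
        go func() {
            for err := range errs {
                fmt.Println("firehose error:", err)
            }
        }()
        for env := range msgs {
            fmt.Println(env.GetOrigin(), env.GetEventType())
        }
    }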

Any thoughts on how I might approach this problem?

Mike

--
Thank you,

James Bayer
