Mike Youngstrom <youngm@...>
I'm working on adding support for Firehose metrics to our monitoring solution. The firehose is working great. However, it appears each component seems to send updates every 10 seconds or so. This might be a great interval for some use cases but for my monitoring provider it can get expensive. Any ideas on how I might limit the frequency of metric updates from the firehose?
The obvious initial solution is to just do that in my nozzle. However, I plan to cluster my nozzle using a subscriptionId. My understanding is that when using a subscriptionId, events get balanced between the subscribers. That would mean one nozzle instance might know when it last sent a particular metric, but the other instances wouldn't, short of making the solution more complex than I'd like it to be.
Any thoughts on how I might approach this problem?
Mike
Mike Youngstrom <youngm@...>
I suppose one relatively simple solution to this problem is to have each cluster member randomly decide whether it should log each metric. :) If I pick a number between 1 and 6, I suppose the odds are I would log about every 6th message on average, or something like that. :)
Another idea: I could have each member pick a random number between 1 and 10, skip that many messages before publishing, then pick a new random number.
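The two strategies above could be sketched like this (a minimal Python sketch for illustration only, not real nozzle code; `keep_one_in_n` and `SkipSampler` are made-up names):

```python
import random

def keep_one_in_n(n=6):
    """First idea: each cluster member independently keeps roughly
    1 out of every n messages -- no coordination between members."""
    return random.randrange(n) == 0

class SkipSampler:
    """Second idea: skip a random number of messages (1..max_skip),
    publish one, then draw a new skip count."""
    def __init__(self, max_skip=10):
        self.max_skip = max_skip
        self.remaining = random.randint(1, max_skip)

    def keep(self):
        if self.remaining > 0:
            self.remaining -= 1
            return False  # still skipping
        self.remaining = random.randint(1, self.max_skip)
        return True  # publish this one, then start a new gap
```

Either way each instance makes a purely local decision, which is what makes this work across a load-balanced subscriptionId group without shared state.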
I think it is mostly the dropsonde messages that are killing me. A technique like this probably wouldn't really work for metrics derived from http events and such.
Anyone have any other ideas?
Mike
James Bayer <jbayer@...>
Warning, thinking out loud here...
Your nozzle will tap the firehose and filter for the metrics you care about. Currently you're publishing these events to your metrics backend as fast as they come in, across a horizontally scalable tier that doesn't coordinate, and that can be expensive if your backend charges by the transaction.
To slow down the stream, you could consider splitting the work into two phases: 1) an aggregation phase and 2) a publish phase.
In the aggregation phase, each instance of the horizontally scaled-out tier would put the metric into a temporary data store such as Redis, or another in-memory data grid with HA like Apache Geode [1].
In the publish phase, something like a cron / Spring Batch job would occasionally (as often as made sense for your costs) flush the metrics from the temporary data store to the per-transaction-cost backend.
[1] http://geode.incubator.apache.org/
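The two-phase idea might look something like this sketch, where an in-process dict stands in for the shared Redis / Geode store (all names here are illustrative assumptions, not a real implementation):

```python
import threading
from collections import defaultdict

def _new_slot():
    return {"count": 0, "sum": 0.0, "last": None}

class Aggregator:
    """Phase 1: every nozzle instance folds incoming metric values into a
    store. In a real cluster this would be the shared Redis/Geode store,
    so it doesn't matter which instance saw which envelope."""
    def __init__(self):
        self._lock = threading.Lock()
        self._acc = defaultdict(_new_slot)

    def add(self, name, value):
        with self._lock:
            slot = self._acc[name]
            slot["count"] += 1
            slot["sum"] += value
            slot["last"] = value

    def flush(self):
        """Phase 2: run periodically (cron / scheduler) to drain the store
        and publish one aggregate per metric to the paid backend."""
        with self._lock:
            drained, self._acc = self._acc, defaultdict(_new_slot)
        return {
            name: {"count": s["count"],
                   "avg": s["sum"] / s["count"],
                   "last": s["last"]}
            for name, s in drained.items()
        }
```

The key property is that the publish interval is decoupled from the firehose emission interval, so the per-transaction cost is bounded by how often you flush, not by how often components emit.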
-- Thank you,
James Bayer
Mike Youngstrom <youngm@...>
Thanks James,
That's a little more complicated, with more moving parts than I was hoping for, but if I don't want to miss anything I probably don't have much of a choice.
I think for now I'm going to go with some kind of random approach. At least for the dropsonde generated metrics since they are by far the most frequent/expensive and I think grabbing a random smattering of them will be good enough for my current uses.
Mike
Erik Jasiak <ejasiak@...>
(list resend #1) Hi Mike,
I think your random approach is workable; what you are doing in effect is taking fewer polling samples off of the firehose stream.
Short of the aggregation answer James pointed out, this has the potential to mess with a few things, like averages, but it's better than nothing if you have to rate-control at ingest, and are looking for a low-cost solution.
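One standard way to reduce the skew Erik mentions for counter-style metrics (my own addition, not something Erik prescribes) is to weight each kept sample by the inverse of the sampling rate, so the estimated total stays roughly unbiased even though most deltas are dropped:

```python
import random

def estimated_counter_total(deltas, sample_rate=1 / 6):
    """Estimate a counter's true total from a random sample by weighting
    each kept delta by 1/sample_rate. A raw sum over only the kept
    deltas would undercount by roughly the sampling factor; averages of
    gauge-style metrics don't need this correction."""
    total = 0.0
    for delta in deltas:
        if random.random() < sample_rate:
            total += delta / sample_rate
    return total
```

This only fixes totals in expectation; percentiles and other distribution-sensitive stats still degrade under sampling, which is why the aggregation approach remains the more faithful option.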
In the longer-term, we are looking closely at how to make it easier to aggregate metrics at either end of loggregator to help with the amount of data, and hope to have more info shortly. Hopefully that will help with controlling data flow no matter how often a component emits metrics.
Erik
Mike Youngstrom <youngm@...>
Sounds great. I think the random solution works for me now. I'm glad you are aware of the use case and have tentative plans to improve it in the future. Thanks Erik!
Mike