DEA Monitoring Capabilities
Chawki, Amin <amin.chawki@...>
Hi,
by upgrading to CF v234 (including pre-release v232) we lost all our monitoring capabilities for the DEA and HM9000 (we were still using Collector). After migrating to the Firehose, only a fraction of the metrics was available. Metrics that are very important for our production systems, such as ‘available_memory_ratio’, were only added in CF v235. In the meantime, we were pretty much “flying blind”.

We replaced the metrics that no longer exist, such as ‘DEA…mem_used_bytes’ and ‘HM9000…healthy’, which were available via Collector, with metrics from Bosh. Is this the way to go, or are there any plans to add them again?

Best Regards,
Amin
|
Marco Voelz
Including Jim Campbell to make sure this reaches him.
On 20/05/16 13:41, "Chawki, Amin" <amin.chawki(a)sap.com> wrote:
|
Jim CF Campbell
Yep, I've had a back thread with Runtime OG, who now has the DEA metrics and who I thought had implemented all previous /varz (aka Collector) metrics into the firehose. No answer from Mr Fraenkel yet.

On Fri, May 20, 2016 at 11:15 AM, Voelz, Marco <marco.voelz(a)sap.com> wrote:
Including Jim Campbell to make sure this reaches him.

--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io | 303.618.0963
|
Jim CF Campbell
Looks like the Runtime team took some out in v234 and added them back in v235. From the Runtime Slack:

On Fri, May 20, 2016 at 11:47 AM, Jim CF Campbell <jcampbell(a)pivotal.io> wrote:
Yep, I've had a back thread with Runtime OG who now has the DEA metrics

--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io | 303.618.0963
|
Michael Fraenkel <michael.fraenkel@...>
When 234 was released, we did not realize that Collector was creating additional metrics. Based on reports, we have added back any missing metrics that people felt were needed. Let me know if we still have missing metrics as you move beyond 234.

In 234, while we did not report available_memory_ratio, we do report remaining_memory. If your DEAs have the same amount of memory, the ratio can be computed or you can use the current value directly.

What information were you trying to understand from mem_used_bytes?

As far as the healthy metric from HM9000, it was quite misleading. It reported healthy as long as the metrics server was running, which wasn't any indication of health. What exactly do you want to know?

- Michael

On 5/20/16 4:41 AM, Chawki, Amin wrote:
|
|
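A minimal sketch of the computation Michael describes above, deriving a ratio from remaining_memory when every DEA has the same amount of memory. The function names, the per-DEA dictionary, and the `total_memory` value are assumptions for illustration; the thread does not say which unit remaining_memory uses, so both inputs are simply assumed to share one unit.

```python
def available_memory_ratio(remaining_memory, total_memory):
    """Return the fraction of a DEA's memory that is still available.

    Both arguments must use the same unit. total_memory is assumed to be
    known to the operator (hypothetical input, not a reported metric).
    """
    if total_memory <= 0:
        raise ValueError("total_memory must be positive")
    return remaining_memory / total_memory


def ratios_by_dea(remaining_by_dea, total_memory):
    """Apply the same total to every DEA, which only holds if all DEAs
    were deployed with an identical amount of memory."""
    return {
        dea: available_memory_ratio(remaining, total_memory)
        for dea, remaining in remaining_by_dea.items()
    }


if __name__ == "__main__":
    # Hypothetical readings of remaining_memory for two DEAs.
    readings = {"dea-0": 8192, "dea-1": 4096}
    print(ratios_by_dea(readings, total_memory=16384))
```

If the DEAs differ in size, a single total no longer applies and each DEA would need its own total.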
Chawki, Amin <amin.chawki@...>
-What information were you trying to understand from mem_used_bytes?

We used mem_used_bytes and mem_free_bytes (currently metrics from bosh) to get an overview of the real overall memory usage of all apps as an approximation. This helps us to get a better understanding of the current overcommit factor.

-As far as the healthy metric from HM9000, it was quite misleading. It reported healthy as long as the metrics server was running which wasn't any indication of health. What exactly do you want to know?

Ah ok, I was not aware of that. Is there any reliable way to verify whether HM9000 is healthy?

Best Regards and Thanks,
Amin

From: Michael Fraenkel <michael.fraenkel(a)gmail.com>
Reply-To: "Discussions about Cloud Foundry projects and the system overall." <cf-dev(a)lists.cloudfoundry.org>
Date: Monday 23 May 2016 at 13:17
To: "Discussions about Cloud Foundry projects and the system overall." <cf-dev(a)lists.cloudfoundry.org>
Subject: [cf-dev] Re: DEA Monitoring Capabilities

When 234 was released, we did not realize that Collector was creating additional metrics. Based on reports, we have added back any missing metrics that people felt were needed. Let me know if we still have missing metrics as you move beyond 234.

In 234, while we did not report available_memory_ratio, we do report remaining_memory. If your DEAs have the same amount of memory, the ratio can be computed or you can use the current value directly.

What information were you trying to understand from mem_used_bytes?

As far as the healthy metric from HM9000, it was quite misleading. It reported healthy as long as the metrics server was running which wasn't any indication of health. What exactly do you want to know?

- Michael

On 5/20/16 4:41 AM, Chawki, Amin wrote:
|
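The approximation Amin describes above, an overall memory usage figure built from mem_used_bytes and mem_free_bytes, could be sketched roughly as follows. The function name and the list-of-pairs input are assumptions for illustration, and the thread does not define how the overcommit factor itself is derived from this figure, so the sketch stops at the aggregate usage.

```python
def overall_memory_usage(per_vm):
    """Return the used fraction of memory across all DEA VMs.

    per_vm is an iterable of (mem_used_bytes, mem_free_bytes) pairs,
    one pair per VM (hypothetical input shape).
    """
    pairs = list(per_vm)  # materialize so the data can be read twice
    used = sum(u for u, _ in pairs)
    total = sum(u + f for u, f in pairs)
    if total <= 0:
        raise ValueError("no memory reported")
    return used / total


if __name__ == "__main__":
    # Two hypothetical VMs, values in bytes.
    samples = [(6 * 2**30, 2 * 2**30), (2 * 2**30, 6 * 2**30)]
    print(overall_memory_usage(samples))
```

Summing before dividing weights each VM by its size, which matters if the VMs do not all have the same amount of memory.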
Michael Fraenkel <michael.fraenkel@...>
The apiserver, listener, shredder, analyzer all produce standard Go metrics, which really don't tell you much as to the "health" of hm9000 other than that the processes are up. The metrics that are reported tell you the health of your system. If the metrics are not being reported or just look wrong, that is really when you know the "health" needs further investigation. In most cases, you will need to examine the logs to determine next steps.

The listener reports ReceivedHeartbeats and SavedHeartbeats. These will at least tell you that the listener process is receiving heartbeats and processing them. I have opened a bug on the values reported since they are not reporting the last known value but an increasing amount.

The analyzer will report all metrics regarding the state of the expected applications that should be running, including instance counts as well as what is actually running or missing, etc.

- Michael

On 5/23/16 7:57 AM, Chawki, Amin wrote:
|
|
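Since Michael notes above that ReceivedHeartbeats and SavedHeartbeats currently report an increasing amount rather than the last known value, one possible way to use them is to look at the difference between successive readings. This is only a sketch under that assumption: the function names and the list-of-samples input are hypothetical, and treating a drop in the value as a counter reset is a guess about behavior the thread does not describe.

```python
def counter_deltas(samples):
    """Turn cumulative readings (oldest first) into per-interval increases."""
    deltas = []
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            deltas.append(cur - prev)
        else:
            # Assumption: a lower value means the counter restarted from zero.
            deltas.append(cur)
    return deltas


def listener_is_processing(received, saved):
    """True if both counters grew during the most recent interval.

    received and saved are hypothetical lists of successive readings of
    ReceivedHeartbeats and SavedHeartbeats.
    """
    r = counter_deltas(received)
    s = counter_deltas(saved)
    return bool(r) and bool(s) and r[-1] > 0 and s[-1] > 0


if __name__ == "__main__":
    print(counter_deltas([10, 25, 25, 5]))
    print(listener_is_processing([10, 25], [8, 20]))
```

A result of False only says the counters did not move in the last interval; as the message above points out, the logs are still the place to determine next steps.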