The apiserver, listener, shredder, and analyzer all produce standard Go
metrics which really don't tell you much as to the "health" of hm9000
other than that the processes are up. The metrics that are reported tell
you the health of your system. If the metrics are not being reported or
just look wrong, that is really when you know the "health" needs further
investigation. In most cases, you will need to examine the logs to
determine next steps.
The listener reports ReceivedHeartbeats and SavedHeartbeats. These will
at least tell you that the listener process is receiving heartbeats and
processing them. I have opened a bug on the values reported since they
are not reporting the last known value but an increasing amount.
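Until that is fixed, a consumer can derive a per-interval figure from the
increasing values itself. The following is only a minimal sketch, assuming
the reported values are cumulative totals and that a drop in value means
the listener restarted and began counting from zero; the function name
delta and the sample readings are hypothetical and not part of hm9000.

```go
package main

import "fmt"

// delta returns the increase between two successive samples of a
// cumulative counter. If the counter went backwards, this sketch assumes
// the process restarted from zero and treats the current value as the
// increase.
func delta(prev, curr uint64) uint64 {
	if curr < prev {
		return curr
	}
	return curr - prev
}

func main() {
	// Hypothetical successive readings of ReceivedHeartbeats.
	samples := []uint64{100, 250, 400, 30}
	for i := 1; i < len(samples); i++ {
		fmt.Println(delta(samples[i-1], samples[i]))
	}
}
```

A per-interval value that stays at zero while applications are running
would be a reason to look at the listener logs.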
The analyzer reports all metrics on the state of the applications that
are expected to be running, including instance counts, as well as what
is actually running or missing.
On 5/23/16 7:57 AM, Chawki, Amin wrote:
/-What information were you trying to understand from mem_used_bytes?/
We used mem_used_bytes and mem_free_bytes (currently metrics from
bosh) to get an overview of the real overall memory usage of all
apps as an approximation. This helps us to get a better understanding
of the current overcommit factor.
/-As far as the healthy metric from HM9000, it was quite misleading.
It reported healthy as long as the metrics server was running which
wasn't any indication of health. What exactly do you want to know?/
Ah ok, I was not aware of that. Is there any reliable way to verify
whether HM9000 is healthy?
Best Regards and Thanks,
*From: *Michael Fraenkel <michael.fraenkel(a)gmail.com>
*Reply-To: *"Discussions about Cloud Foundry projects and the system
*Date: *Monday 23 May 2016 at 13:17
*To: *"Discussions about Cloud Foundry projects and the system
*Subject: *[cf-dev] Re: DEA Monitoring Capabilities
When 234 was released, we did not realize that Collector was
creating additional metrics. Based on reports, we have added back
any missing metrics that people felt were needed. Let me know if
we still have missing metrics as you move beyond 234.
In 234, while we did not report available_memory_ratio, we do
report remaining_memory. If your DEAs have the same amount of
memory, the ratio can be computed or you can use the current value
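The ratio computation mentioned above could look roughly like the sketch
below. It assumes that available_memory_ratio is simply remaining memory
divided by total DEA memory, and that both values use the same unit; the
function name availableMemoryRatio and the example figures are
hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// availableMemoryRatio divides the reported remaining_memory by the total
// memory configured for a DEA. Both arguments must use the same unit.
func availableMemoryRatio(remaining, total float64) (float64, error) {
	if total <= 0 {
		return 0, errors.New("total memory must be positive")
	}
	return remaining / total, nil
}

func main() {
	// Hypothetical values: 4096 remaining out of 16384 total.
	ratio, err := availableMemoryRatio(4096, 16384)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.2f\n", ratio)
}
```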
What information were you trying to understand from mem_used_bytes?
As far as the healthy metric from HM9000, it was quite misleading.
It reported healthy as long as the metrics server was running
which wasn't any indication of health. What exactly do you want to
know?
On 5/20/16 4:41 AM, Chawki, Amin wrote:
By upgrading to CF v234 (including pre-release v232) we lost
all our monitoring capabilities regarding DEA and HM9000 (we
were still using Collector). By migrating to Firehose only a
fraction of the metrics was available. Very important metrics
for our productive systems like ‘available_memory_ratio’ were
just added in CF v235. In the meantime, we were pretty much
We replaced missing metrics like ‘DEA…mem_used_bytes’ and
‘HM9000…healthy’, which were available via Collector, with
metrics from Bosh. Is this the way to go or are there any
plans to add them again?