HM9000 metrics


Pablo Alonso Rodriguez <palonsoro@...>
 

Good morning.

Recently, I have been reviewing the metrics emitted by CF components. To
understand the HM9000 metrics, I have been reading the metrics documentation
(at https://github.com/cloudfoundry/hm9000#metricsserver).

I am writing because I have two questions.

First question:

Not all of the metrics retrieved via Ops Metrics are documented there. Is
there any additional documentation? If not, could you please explain what
the following metrics mean?

- StartEvacuating, StartCrashed, StartMissing
- StopDuplicate, StopEvacuationComplete, StopExtra

I have guesses about some of them, but I am not completely sure.

Second question:

I do not fully understand the difference between "instances" and
"indices" in metrics such as "NumberOfCrashedIndices" and
"NumberOfCrashedInstances".

For example, I have one crashed app in my CF instance, and
"NumberOfCrashedIndices" reports '1' while "NumberOfCrashedInstances" reports
'3'. If I look at `cf app myapp`, I see a single crashed instance (as
expected). If I look at the hm9000 dump, I see the following for my crashed
app (UUIDs have been replaced with placeholder values):

    Guid: 7ef08c44-102d-11e5-9c0d-0fb30c2610f7 | Version: 8e16b09a-102d-11e5-b6ce-27f9445313f8
    Desired: [1] instances, (STARTED, STAGED)
    Heartbeats:
        [0 CRASHED] a42a7236102d11e5813abfab583ad850 on 1-abc
        [0 CRASHED] b35b9f1e102d11e5ad29cfc4c2c4e3ea on 2-ac3
        [0 CRASHED] bbd37658102d11e5ba8e2b98d1fd1793 on 4-a67
    CrashCounts: [0]:7499
    Pending Starts:
        [0] priority:1.00 send:2m34.628437793s

So, what does all this mean? I do not understand why I get 3 heartbeats
when I was only trying to start a single instance.
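To make my guess concrete: I suspect, though I have not confirmed it in the HM9000 documentation or code, that "NumberOfCrashedInstances" counts crashed heartbeats (one per instance GUID), while "NumberOfCrashedIndices" counts the distinct indices those heartbeats belong to. The Go sketch below applies that assumption to the three heartbeat lines from my dump. The helper `countCrashed` and the line format it parses are my own assumptions, not part of HM9000.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// heartbeatLine matches dump lines such as
// "[0 CRASHED] a42a7236102d11e5813abfab583ad850 on 1-abc".
// ASSUMPTION: this format is inferred from the single dump excerpt above.
var heartbeatLine = regexp.MustCompile(`^\[(\d+) (\w+)\] (\S+) on (\S+)$`)

// countCrashed is a hypothetical helper. It returns the number of CRASHED
// heartbeats (one per reported instance GUID) and the number of distinct
// indices those heartbeats cover.
func countCrashed(lines []string) (instances, indices int) {
	seen := map[int]bool{}
	for _, l := range lines {
		m := heartbeatLine.FindStringSubmatch(l)
		if m == nil || m[2] != "CRASHED" {
			continue // not a heartbeat line, or not a crashed one
		}
		idx, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		instances++
		seen[idx] = true
	}
	return instances, len(seen)
}

func main() {
	dump := []string{
		"[0 CRASHED] a42a7236102d11e5813abfab583ad850 on 1-abc",
		"[0 CRASHED] b35b9f1e102d11e5ad29cfc4c2c4e3ea on 2-ac3",
		"[0 CRASHED] bbd37658102d11e5ba8e2b98d1fd1793 on 4-a67",
	}
	instances, indices := countCrashed(dump)
	fmt.Printf("crashed instances: %d, crashed indices: %d\n", instances, indices)
}
```

Under that assumption the counts come out as 3 and 1, which matches the values reported above. If this reading is right, my question becomes why three separate instance GUIDs exist for index 0 on three different DEAs.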

Thank you in advance.
