HM9000 metrics
Pablo Alonso Rodriguez <palonsoro@...>
Good morning.
Recently, I have been reviewing the metrics emitted by CF components. In order
to understand the HM9000 metrics, I have been reading the metrics documentation
(at https://github.com/cloudfoundry/hm9000#metricsserver).
I am posting this message because I have two questions.
First question:
Not all the metrics retrieved via Ops Metrics are documented there. Is
there any additional documentation? If not, could you please explain to me
what the following metrics mean?
- StartEvacuating, StartCrashed, StartMissing
- StopDuplicate, StopEvacuationComplete, StopExtra
I have guesses about some of them, but I am not sure they are right.
Second question:
I do not fully understand the difference between the concepts of
"instances" and "indices" in metrics like "NumberOfCrashedIndices" and
"NumberOfCrashedInstances".
For example, I have one crashed app in my CF instance, and
"NumberOfCrashedIndices" reports '1' while "NumberOfCrashedInstances" reports
'3'. If I run `cf app myapp`, I see a single crashed instance
(as expected). If I look at the hm9000 dump, I see the following
about my crashed app (UUIDs have been replaced by fake ones):
Guid: 7ef08c44-102d-11e5-9c0d-0fb30c2610f7 | Version: 8e16b09a-102d-11e5-b6ce-27f9445313f8
Desired: [1] instances, (STARTED, STAGED)
Heartbeats:
[0 CRASHED] a42a7236102d11e5813abfab583ad850 on 1-abc
[0 CRASHED] b35b9f1e102d11e5ad29cfc4c2c4e3ea on 2-ac3
[0 CRASHED] bbd37658102d11e5ba8e2b98d1fd1793 on 4-a67
CrashCounts: [0]:7499
Pending Starts:
[0] priority:1.00 send:2m34.628437793s
So, what does all this mean? I do not understand why I get 3 heartbeats
when I was only trying to start a single instance.
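To make my guess concrete, here is a small sketch of how the two numbers could be derived from the heartbeats in the dump above. It is only my reading, not something I found in the HM9000 documentation: I am assuming that the first field in the brackets is the instance index, that the hexadecimal string is an instance GUID, and that the last field identifies the DEA. The name `count_crashed` and the regular expression are my own and purely illustrative.

```python
import re

# Heartbeat lines copied from the dump above.
DUMP = """\
[0 CRASHED] a42a7236102d11e5813abfab583ad850 on 1-abc
[0 CRASHED] b35b9f1e102d11e5ad29cfc4c2c4e3ea on 2-ac3
[0 CRASHED] bbd37658102d11e5ba8e2b98d1fd1793 on 4-a67
"""

# Assumed line layout: [<index> <state>] <instance guid> on <DEA id>
HEARTBEAT = re.compile(r"\[(\d+) (\w+)\] (\w+) on (\S+)")


def count_crashed(dump):
    """Return (distinct crashed indices, distinct crashed instance GUIDs)."""
    crashed_indices = set()
    crashed_instances = set()
    for line in dump.splitlines():
        match = HEARTBEAT.match(line.strip())
        if match is None:
            continue  # skip anything that is not a heartbeat line
        index, state, guid, _dea = match.groups()
        if state == "CRASHED":
            crashed_indices.add(int(index))
            crashed_instances.add(guid)
    return len(crashed_indices), len(crashed_instances)


indices, instances = count_crashed(DUMP)
print(indices, instances)  # prints "1 3"
```

Under these assumptions the sketch yields 1 crashed index and 3 crashed instances, which matches the values reported by the two metrics, but I would like confirmation that this is what they actually measure.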
Thank you in advance.