Re: HM9000 metrics


Dieu Cao <dcao@...>
 

Hi Pablo,

Ops Metrics is a PCF product and questions about that should be directed to
Pivotal customer support.

Regarding your second question, about the difference between crashed
indices and crashed instances:

The NumberOfCrashedInstances metric is usually about 4 times the
NumberOfCrashedIndices metric. NumberOfCrashedInstances is the total
number of crashed containers that remain on the DEAs, while
NumberOfCrashedIndices is the number of app-index pairs which have only
crashed instances.
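
As a rough illustration of the two definitions, here is a minimal Python
sketch. It is not HM9000 code: the record layout, the app names, and the
function name crashed_counts are made up for the example, and it assumes
every crashed container that still reports a heartbeat is counted.

```python
from collections import defaultdict

# Hypothetical heartbeat records: (app_guid, index, state, instance_guid).
# The layout and values are invented for illustration only.
heartbeats = [
    ("app-a", 0, "CRASHED", "i1"),
    ("app-a", 0, "CRASHED", "i2"),
    ("app-a", 0, "CRASHED", "i3"),
    ("app-b", 0, "RUNNING", "i4"),
    ("app-b", 0, "CRASHED", "i5"),
]


def crashed_counts(records):
    """Return (crashed_instances, crashed_indices) for the given records."""
    states_by_index = defaultdict(list)
    for app, index, state, _instance in records:
        states_by_index[(app, index)].append(state)

    # Every crashed container counts as one crashed instance.
    crashed_instances = sum(
        1 for _app, _index, state, _instance in records if state == "CRASHED"
    )
    # An app-index pair counts only if all of its instances are crashed.
    crashed_indices = sum(
        1
        for states in states_by_index.values()
        if all(state == "CRASHED" for state in states)
    )
    return crashed_instances, crashed_indices


instances, indices = crashed_counts(heartbeats)
print(instances, indices)
```

In this made-up data, app-a index 0 is a crashed index because all three
of its instances are crashed, while app-b index 0 is not, since one of its
instances is still running.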

If an app has a droplet that crashes on startup, HM9000 will eventually
settle on restarting an instance at each of its indices every 16 minutes.
When the instance crashes, the DEA will keep its container carcass around
for an hour (to allow the space developers to inspect its files via the
files API if they have the instance guid). So on average, there will be
60/16 = 3.75 crashed instances in the system per crashed index. That should
account for most of the indices and instances that are crashed in the
system.
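
To make the 60/16 figure concrete, here is a small Python sketch of the
steady state for a single crashing index. It is a simplification under
stated assumptions: the instance crashes the moment it is restarted,
restarts happen exactly every 16 minutes, and each container is removed
exactly 60 minutes after its crash. The function name and parameters are
hypothetical.

```python
def crashed_containers_at(t, retention=60, interval=16):
    """Count crashed containers still on disk at minute t for one index.

    Assumes crashes happen at minutes 0, interval, 2*interval, ... and that
    each container is kept for `retention` minutes after its crash.
    """
    return sum(1 for crash in range(0, t + 1, interval) if t - crash < retention)


# Sample one full restart period well after startup (steady state).
window = range(1600, 1616)
counts = [crashed_containers_at(t) for t in window]
average = sum(counts) / len(counts)
print(sorted(set(counts)), average)
```

Under these assumptions the count for one index alternates between 3 and
4 and averages 3.75, so seeing 3 crashed instances for a single crashed
index at a given moment is consistent with the explanation above.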

Hope that helps.

-Dieu
CF Runtime PM

On Thu, Jun 11, 2015 at 4:48 AM, Pablo Alonso Rodriguez <palonsoro(a)gmail.com>
wrote:
Good morning.

Recently, I have been reviewing the metrics emitted by CF components. In
order to understand the HM9000 metrics, I have been reading the metrics
documentation (at https://github.com/cloudfoundry/hm9000#metricsserver).

I post this message because I have two questions.

First question:

Not all the metrics retrieved via Ops Metrics are documented there. Is
there any additional documentation? If not, could you please explain to me
what the following metrics mean?

- StartEvacuating, StartCrashed, StartMissing
- StopDuplicate, StopEvacuationComplete, StopExtra

I have some guesses about some of them, but I am not completely sure about
them.

Second question:

I do not fully understand the difference between the concepts of
"instances" and "indices" in metrics like "NumberOfCrashedIndices" and
"NumberOfCrashedInstances".

For example, I have one crashed app in my CF instance, and
"NumberOfCrashedIndices" reports '1' and "NumberOfCrashedInstances" reports
'3'. If I look at `cf app myapp`, I see one single crashed instance
(this was expected). If I look at the hm9000 dump, I see the following
about my crashed app (UUIDs have been replaced by fake ones):

Guid: 7ef08c44-102d-11e5-9c0d-0fb30c2610f7 | Version: 8e16b09a-102d-11e5-b6ce-27f9445313f8
Desired: [1] instances, (STARTED, STAGED)
Heartbeats:
[0 CRASHED] a42a7236102d11e5813abfab583ad850 on 1-abc
[0 CRASHED] b35b9f1e102d11e5ad29cfc4c2c4e3ea on 2-ac3
[0 CRASHED] bbd37658102d11e5ba8e2b98d1fd1793 on 4-a67
CrashCounts: [0]:7499
Pending Starts:
[0] priority:1.00 send:2m34.628437793s

So, what does all this mean? I do not understand why I get 3 heartbeats
when I was only trying to start a single instance.

Thank you in advance




