HM9000 metrics
Pablo Alonso Rodriguez <palonsoro@...>
Good morning.
Recently, I have been reviewing the metrics emitted by CF components. In order to understand the HM9000 metrics, I have been reading the metrics documentation (at https://github.com/cloudfoundry/hm9000#metricsserver). I am posting this message because I have two questions.

First question: Not all the metrics retrieved via Ops Metrics are documented there. Is there any additional documentation? If not, could you please explain what the following metrics mean?

- StartEvacuating, StartCrashed, StartMissing
- StopDuplicate, StopEvacuationComplete, StopExtra

I have some guesses about some of them, but I am not completely sure about them.

Second question: I do not fully understand the difference between the concepts of "instances" and "indices" in metrics like "NumberOfCrashedIndices" and "NumberOfCrashedInstances".

For example, I have one crashed app in my CF instance, and "NumberOfCrashedIndices" reports '1' while "NumberOfCrashedInstances" reports '3'. If I have a look at `cf app myapp`, I see one single crashed instance (this was expected). If I have a look at the hm9000 dump, I see the following about my crashed app (UUIDs have been replaced by false ones):

    Guid: 7ef08c44-102d-11e5-9c0d-0fb30c2610f7 | Version: 8e16b09a-102d-11e5-b6ce-27f9445313f8
    Desired: [1] instances, (STARTED, STAGED)
    Heartbeats:
      [0 CRASHED] a42a7236102d11e5813abfab583ad850 on 1-abc
      [0 CRASHED] b35b9f1e102d11e5ad29cfc4c2c4e3ea on 2-ac3
      [0 CRASHED] bbd37658102d11e5ba8e2b98d1fd1793 on 4-a67
    CrashCounts: [0]:7499
    Pending Starts:
      [0] priority:1.00 send:2m34.628437793s

So, what does all this mean? I do not understand why I get 3 heartbeats when I was only trying to start a single instance.

Thank you in advance
Dieu Cao <dcao@...>
Hi Pablo,
Ops Metrics is a PCF product, and questions about it should be directed to Pivotal customer support.

Regarding your second question, about the difference between crashed indices and crashed instances: the NumberOfCrashedInstances metric is usually about 4 times the NumberOfCrashedIndices metric.

NumberOfCrashedInstances is the total number of crashed containers that remain on the DEAs, while NumberOfCrashedIndices is the number of app-index pairs which have only crashed instances. If an app has a droplet that crashes on startup, HM9000 will eventually settle on restarting an instance at each of its indices every 16 minutes. When the instance crashes, the DEA will keep its container carcass around for an hour (to allow the space developers to inspect its files via the files API if they have the instance guid). So on average, there will be 60/16 = 3.75 crashed instances in the system per crashed index. That should account for most of the indices and instances that are crashed in the system.

Hope that helps.

-Dieu
CF Runtime PM

On Thu, Jun 11, 2015 at 4:48 AM, Pablo Alonso Rodriguez <palonsoro(a)gmail.com> wrote:
> Good morning.
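
The distinction described in the reply can be illustrated with a minimal Python sketch. It is not HM9000 code: the function names and the tuple layout are hypothetical, the heartbeat data is copied from the dump in the first message, and the 16-minute and 60-minute figures are the ones quoted in the reply.

```python
# Heartbeats from the dump in the first message: (index, state, instance guid).
# Only one app is present, so the index alone stands in for the app-index pair.
heartbeats = [
    (0, "CRASHED", "a42a7236102d11e5813abfab583ad850"),
    (0, "CRASHED", "b35b9f1e102d11e5ad29cfc4c2c4e3ea"),
    (0, "CRASHED", "bbd37658102d11e5ba8e2b98d1fd1793"),
]


def crashed_instances(beats):
    """Count every crashed container that is still reported (hypothetical helper)."""
    return sum(1 for _, state, _ in beats if state == "CRASHED")


def crashed_indices(beats):
    """Count indices whose reported instances are all crashed (hypothetical helper)."""
    states_by_index = {}
    for index, state, _ in beats:
        states_by_index.setdefault(index, set()).add(state)
    return sum(1 for states in states_by_index.values() if states == {"CRASHED"})


# Figures quoted in the reply: one restart attempt per index every 16 minutes,
# and each crashed container kept on the DEA for 60 minutes.
RESTART_INTERVAL_MIN = 16
CONTAINER_RETENTION_MIN = 60
expected_per_index = CONTAINER_RETENTION_MIN / RESTART_INTERVAL_MIN

print(crashed_instances(heartbeats), crashed_indices(heartbeats), expected_per_index)
```

With the dump above this gives 3 crashed instances at 1 crashed index, which matches the values Pablo reported and is close to the 3.75 steady-state average.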
Pablo Alonso Rodriguez <palonsoro@...>
Ok. Just a question: When you say "the DEA will keep its container carcass around for an hour", do you mean that the DEA does not remove the container files? If the Warden grace time is configured at 300 seconds (5 minutes), I would expect the container to actually be destroyed after that time (although its files remain). Is this right?

Thank you very much.

2015-06-11 17:27 GMT+02:00 Dieu Cao <dcao(a)pivotal.io>:
> Hi Pablo,
CF Runtime
I believe the full crashed Warden container is kept around for an hour. The DEA keeps the Warden handle to the container. The Warden grace time only applies after all handles have been released.

Joseph Palermo
CF Runtime Team

On Thu, Jun 11, 2015 at 8:55 AM, Pablo Alonso Rodriguez <palonsoro(a)gmail.com> wrote:
> Ok. Just a question: When you say "the DEA will keep its container carcass
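
The behaviour described in this reply can be pictured with a toy Python model. It is only a sketch of the statement above, not Warden's or the DEA's actual API: the `ToyContainer` class, its method names, and the use of plain second counters are all hypothetical.

```python
class ToyContainer:
    """Toy model: the grace timer only starts once no handle is held."""

    def __init__(self, grace_time_s):
        self.grace_time_s = grace_time_s
        self.handles = set()
        self.released_at = None  # time at which the last handle was released

    def acquire(self, holder):
        # Holding a handle keeps the container alive and cancels any countdown.
        self.handles.add(holder)
        self.released_at = None

    def release(self, holder, now_s):
        self.handles.discard(holder)
        if not self.handles:
            self.released_at = now_s  # grace period starts here

    def is_destroyed(self, now_s):
        if self.handles or self.released_at is None:
            return False
        return now_s - self.released_at >= self.grace_time_s


container = ToyContainer(grace_time_s=300)  # 300 s grace time, as in the question
container.acquire("dea")
print(container.is_destroyed(600))    # handle still held after 10 minutes
container.release("dea", now_s=3600)  # DEA lets go after an hour
print(container.is_destroyed(3899))   # still inside the grace period
print(container.is_destroyed(3900))   # grace period elapsed
```

Under these assumptions the container survives well past 300 seconds while the DEA holds its handle, and the 300-second countdown only begins after the release at the one-hour mark.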