Richer health-checks for CF apps: request for use cases


Eric Malm <emalm@...>
 

Dear CF Community,

CF has long had a notion of health-checking app instances as they start up
to determine whether they're in a functional state, beyond the process
simply having started. On the DEAs, the health-check behavior is coupled to
whether the app has routes mapped to it, whereas for apps targeting the
Diego backend, the health-check specification is independent of the routing
configuration on the app. On Diego cells, the health check is also run
periodically[1] even after the app has started, to verify the health of the
instance continually.

With that independence, we now have more flexibility to specify richer
health checks for CF app instances. We on the CAPI and Diego teams would
like to know what kinds of health checks you would find useful for your
apps, whether they serve web traffic or do background work.
The two types of health check currently available are 'port', which checks
that a TCP connection can be made to the app instance on the port specified
by the PORT env var, and 'none', which despite the name does continually
verify that the process invoked in the container is still running.
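
To make the 'port' check concrete, here is a minimal Python sketch of the
idea, not the actual Diego implementation. The function name port_check, the
host argument, and the 1-second default timeout are assumptions made for
illustration; the only behavior taken from the description above is "can a
TCP connection be made to the port named by the PORT env var".

```python
import os
import socket


def port_check(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper: the name, signature, and default timeout are
    illustrative assumptions, not part of CF or Diego.
    """
    try:
        # create_connection resolves the address and performs the TCP
        # connect, raising OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Inside an app container, PORT would be set by the platform; the
    # fallback of 8080 here is only so the sketch runs standalone.
    app_port = int(os.environ.get("PORT", "8080"))
    print(port_check("127.0.0.1", app_port))
```

Note that a check like this only shows that something is accepting
connections on the port; it says nothing about whether the app can actually
serve a request.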

As a starting point, on a recent cf-dev thread[2], we identified that for
an HTTP-based health check, it would be useful to specify an endpoint to
hit, an acceptable response status code or codes, and a timeout to apply to
the request. Sensible defaults could be "/", 200 OK, and 1 second,
respectively.
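
As a rough illustration of those three parameters, here is a Python sketch
of an HTTP check. It is only one possible reading of the idea: the function
name http_check, its argument names, and the choice of a GET request are
assumptions, and the defaults simply mirror the "/", 200, and 1 second
suggested above.

```python
import http.client


def http_check(host, port, endpoint="/", ok_statuses=(200,), timeout=1.0):
    """Return True if GET <endpoint> answers with an acceptable status.

    Hypothetical helper: names and signature are illustrative only.
    The timeout applies to each socket operation (connect, send, read),
    so it is not a strict bound on the total request time.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", endpoint)
        # Redirects are not followed: a 3xx counts as healthy only if
        # it is explicitly listed in ok_statuses.
        return conn.getresponse().status in ok_statuses
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response.
        return False
    finally:
        conn.close()
```

For example, http_check("127.0.0.1", 8080, endpoint="/healthz",
ok_statuses=(200, 204)) would treat either status as healthy; the /healthz
path and port 8080 are made up for the example.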

In any case, please reply here with your health-check use cases; we intend
to use them as input to a proposal soon.

Thanks very much,
Eric, CF Runtime Diego PM

[1]:
https://github.com/cloudfoundry-incubator/diego-design-notes/blob/master/migrating-to-diego.md#health-checks
[2]:
https://lists.cloudfoundry.org/archives/list/cf-dev(a)lists.cloudfoundry.org/thread/HT7W7UMHR3ZLHV3Q6VJN5URETQUJBVZW/
