Richer health-checks for CF apps: request for use cases
Eric Malm <emalm@...>
Dear CF Community,
CF has long had a notion of health-checking app instances as they start up to determine whether they're in a functional state, on top of the process simply having started. On the DEAs, the health-check behavior is coupled to whether the app has routes mapped to it, whereas for apps targeting the Diego backend, the health-check specification is independent of the routing configuration on the app. On Diego cells, the health check is also run periodically[1] even after the app is started, to verify the health of the instance continually.

With that independence, we now have more flexibility to specify richer health checks for CF app instances. We on the CAPI and Diego teams would like to know what kinds of health checks you would find useful for your apps (either ones serving web traffic, or ones doing background work).

The two types of health check currently available are 'port', which checks that a TCP connection can be made to the app instance on the port specified by the PORT env var, and 'none', which despite the name does continually verify that the process invoked in the container is still running.

As a starting point, on a recent cf-dev thread[2], we identified that for an HTTP-based health check, it would be useful to specify an endpoint to hit, an acceptable response status code or codes, and a timeout to apply to the request. Sensible defaults could be "/", 200 OK, and 1 second, respectively.

In any case, please comment here with your health-check use cases, and we intend to use them as input to a proposal soon.

Thanks very much,
Eric, CF Runtime Diego PM

[1]: https://github.com/cloudfoundry-incubator/diego-design-notes/blob/master/migrating-to-diego.md#health-checks
[2]: https://lists.cloudfoundry.org/archives/list/cf-dev(a)lists.cloudfoundry.org/thread/HT7W7UMHR3ZLHV3Q6VJN5URETQUJBVZW/
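
To make the discussion concrete, here is a minimal Python sketch of the existing 'port' check and of the HTTP-based check outlined above. It is only an illustration, not how Diego implements health checks: the function names `port_check` and `http_check` and the parameter names are hypothetical, and the defaults ("/", 200, 1 second) are the ones suggested in this message.

```python
import http.client
import socket
import urllib.error
import urllib.request


def port_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False


def http_check(host, port, path="/", ok_statuses=(200,), timeout=1.0):
    """Return True if GET http://host:port/path answers with an accepted status.

    Hypothetical parameters mirroring the three knobs proposed in the thread:
    endpoint (path), acceptable status codes (ok_statuses), and timeout.
    """
    url = f"http://{host}:{port}{path}"
    # An empty ProxyHandler bypasses any proxy configured in the environment,
    # so the request goes directly to the app instance.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status in ok_statuses
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the caller may still accept them.
        return err.code in ok_statuses
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # No usable HTTP response: refused, timed out, or malformed reply.
        return False
```

For an app instance listening on the port given by the PORT env var, a call such as `http_check("127.0.0.1", int(os.environ["PORT"]), path="/healthz", ok_statuses=(200, 204))` would express a custom endpoint with more than one acceptable status code. Note that this sketch follows redirects, which a real health check might not want to do.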