Re: Issue with crashing Windows apps on Diego
Eric Malm <emalm@...>
Hi, Aaron and Matt,
Thanks for the thoughtful discussion of the Windows health-check issue. I agree that, for consistency, if the CF end user has specified 'port' as the type of health-check on their app, then the platform should check only TCP connectivity to the app on that port, and not any layer-7 functionality beyond that.

Some background on the HTTP vs TCP behavior in the health-check: originally, the health-check binary used for the buildpack and docker app lifecycles made only TCP connections to the requested port. When Lattice made it possible to submit DesiredLRPs directly to the Diego API, we got feedback from its users that they wanted an option to specify an HTTP-based health-check as well. Consequently, we extended that health-check binary to take an optional endpoint flag. When that flag is present, the binary makes a GET request to the specified endpoint and checks for a response with a 200 OK status code within the specified timeout (default 1s). For buildpack and docker CF apps, though, none of that HTTP functionality has been exposed through CC, and only the basic TCP connectivity check is available.

Matt, the native NetCheckAction from the Diego Dev Notes proposal you mention is effectively just encoding the current behavior of that TCP-or-HTTP health-check binary as an action that the rep could perform itself, rather than by invoking that binary in-container. The Diego team had conceived of it primarily as a performance optimization, particularly when starting a lot of instances on a cell simultaneously, but investigation revealed it to be of secondary benefit at best. The Diego team might implement it at some point, but for now we'd prefer not to expand the surface area of the Diego BBS API to include it. I've been meaning to update and close out that Dev Notes issue, and will do so shortly.
In any case, the options on that proposed NetCheckAction are just the ones already available on the health-check binary, and, native action or not, additional work would still be required to expose them through CC to the CF end-user. Moreover, I don't think they're sufficient to address all the concerns that Aaron raises in his observations about the Windows app lifecycle's current HTTP-based check.

Aaron, you mentioned timeout and expected status code as important parameters to specify on an HTTP health-check; are there others? I would think endpoint could be just as useful: perhaps your app has a /health or /ping endpoint specifically designed to return a fast response about the app itself, separate from backing services and/or authentication checks, or perhaps it simply doesn't handle requests to /.

Thanks,
Eric

On Tue, Feb 2, 2016 at 1:50 PM, aaron_huber <aaron.m.huber(a)intel.com> wrote:
My concern is that the HTTP check (mislabeled as "port") would still be the