Brokered route services only receiving traffic for routes mapped to started apps
Hi Shannon and the routing team,
While testing the route service support in v230, I observed that brokered route services only receive traffic for routes mapped to started apps. In other words, if a route is mapped to an app in the "crashed", "starting" or "stopped" state, any fully-brokered route service bound to that route does not receive the traffic sent to that route; instead, the gorouter responds directly with a 404.

I wonder whether this is a product decision (I may have missed it in the design proposal [3]) or rather an intermediate implementation choice. In the latter case, could it be considered to forward any traffic received on a route to the bound route service, regardless of the status of the mapped apps? (I have not yet found a related story in the routing backlog.)

I collected at the end of this email a list of use-cases where I believe route services would benefit from unconditionally receiving traffic from bound routes.

In addition, the "unconditional routing of traffic to route services" would offer developers interested in consuming route services a more consistent behavior between "static route services" and "fully-brokered services". App developers would be guaranteed that fully-brokered services receive traffic even if the app is unavailable (CRASHED, or during a transient Diego cell unavailability...), just like "static route services" would.

Lastly, the "unconditional routing of traffic to route services" also seems more consistent with the current CLI UX: binding a route service to a route is independent of mapping an app to the same route. The "cf bind-route-service" command does not require the route to be mapped to an app, and the "cf routes" command lists routes that have a bound route service but no mapped app.
Trying to imagine the drawbacks/impacts of such "unconditional routing", I could so far only spot:

a) slightly more traffic handled by route services that don't wish to account for/modify the default 404 response for an unavailable app
b) slightly more traffic for the gorouter when handling requests sent to routes mapped to unavailable apps: the request would now be proxied to the route service, which would query back the gorouter
c) a potentially slightly larger gorouter routing table that needs to be kept in memory (route entries with route services but no app endpoints)

I believe these impacts are acceptable. b) could potentially be reduced by passing the app status to the route service (e.g. via an additional header "X-CF-Route-Status" with value "404: route does not exist").

Implementation-wise, I'm not sure how deep the current assumption that "an active route is associated with at least one endpoint" runs in the different CF components (gorouter, its NATS messages and routing-api, the Diego route emitter and BBS models), and therefore how much effort would be required to implement the "unconditional routing to route services" behavior.

Thanks in advance for your thoughts on this,

Guillaume.

Related use-cases:

*1- Returning a custom response when the app is unavailable (crashed, starting, stopped, or zero available app instances)*

For apps returning HTML, this may be a custom HTML response (rather than the default gorouter 404 response page), or a specific HTTP response code such as "503 Service Unavailable" to suggest that the client retry. For routes serving APIs (e.g. SOAP), a route service may return a proper SOAP-formatted fault response. Multi-site aware route services may choose to redirect users to a route hosted on a second CF instance through a "307 Temporary Redirect" status code. A caching service may choose to return (potentially stale) cached content when the mapped app is in the CRASHED state, rather than returning a 404.
*2- Applying side effects upon unavailability of the app*

A SOX-compliant lossless logging service (unlike the potentially lossy loggregator-based logging) may wish to log full details of the requests sent to the route, including those that never reached an available app instance.

An API gateway route service that maintains measurements of the performance and availability of the APIs exposed through its bound routes would need to receive traffic when the bound apps are crashed.

The autosleep service [1] that I'm working on would be able to dynamically start an app that was previously stopped to save RAM during inactivity.

[1] https://docs.google.com/document/d/1tMhIBX3tw7kPEOMCzKhUgmtmr26GVxyXwUTwMO71THI/edit <https://github.com/Orange-OpenSource/autosleep>
[2] http://docs.cloudfoundry.org/services/route-services.html#architecture
[3] https://docs.google.com/document/d/1bGOQxiKkmaw6uaRWGd-sXpxL0Y28d3QihcluI15FiIA/edit#heading=h.8djffzes9pnb
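
To make use-case 1 a bit more concrete, here is a minimal sketch (in Python, purely for illustration) of the decision a route service could take if the gorouter forwarded traffic unconditionally. It assumes the "X-CF-Route-Status" header suggested above, which is only a proposal and is not sent by the gorouter today. The X-CF-Forwarded-Url, X-CF-Proxy-Signature and X-CF-Proxy-Metadata headers are the ones described in the route services documentation [2]. The names decide, Forward and Respond, and the 30 second retry delay, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Mapping

# Headers set by the gorouter for route services; the signature and
# metadata must be sent back unchanged when forwarding the request.
FORWARDED_URL = "X-CF-Forwarded-Url"
PROXY_SIGNATURE = "X-CF-Proxy-Signature"
PROXY_METADATA = "X-CF-Proxy-Metadata"

# ASSUMPTION: header proposed in this email, not implemented in gorouter.
ROUTE_STATUS = "X-CF-Route-Status"


@dataclass
class Forward:
    """Proxy the request back to the gorouter at `url` with `headers`."""
    url: str
    headers: dict = field(default_factory=dict)


@dataclass
class Respond:
    """Answer the client directly, without contacting the app."""
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def decide(request_headers: Mapping[str, str], retry_after_s: int = 30):
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in request_headers.items()}

    if h.get(ROUTE_STATUS.lower(), "").startswith("404"):
        # No app instance is available: suggest a retry instead of a 404.
        return Respond(
            status=503,
            headers={"Retry-After": str(retry_after_s)},
            body="The application is temporarily unavailable.",
        )

    # Normal case: send the request back through the gorouter.
    return Forward(
        url=h[FORWARDED_URL.lower()],
        headers={
            PROXY_SIGNATURE: h[PROXY_SIGNATURE.lower()],
            PROXY_METADATA: h[PROXY_METADATA.lower()],
        },
    )
```

With such a header, the route service would not need to query back the gorouter just to discover that the app is unavailable, which is how impact b) could be reduced.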