Brokered route services only receiving traffic for routes mapped to started apps


Guillaume Berche
 

Hi Shannon and the routing team,

Testing the route service support in v230, I observe that brokered route
services only receive traffic for routes mapped to started apps. In
other words, if a route is mapped to an app in the "crashed", "starting" or
"stopped" state, then any fully brokered route service bound to that route
won't receive the traffic sent to that route; instead, the gorouter
responds directly with a 404.

I wonder whether this is a deliberate product decision (which I may have
missed in the design proposal [3]) or rather an intermediate implementation
choice. In the latter case, could forwarding all traffic received on a
route to its bound route service, regardless of the status of the mapped
apps, be considered? (I have not yet found a related story in the routing
backlog.)

At the end of this email, I have collected a list of use-cases where I
believe route services would benefit from unconditionally receiving
traffic from their bound routes.

In addition, "unconditional routing of traffic to route services" would
give developers consuming route services a more consistent behavior
between "static route services" and "fully brokered route services": app
developers could be guaranteed that fully brokered services receive
traffic even when the app is unavailable (CRASHED, or during a transient
Diego cell unavailability, ...), just as "static route services" do.

Lastly, "unconditional routing of traffic to route services" also seems
more consistent with the current CLI UX: binding a route service to a route
is independent of mapping apps to that route. The "cf bind-route-service"
command does not require the route to be mapped to an app, and the "cf
routes" command lists routes that have a bound route service but no mapped
app, etc.

Trying to imagine the drawbacks/impacts of such "unconditional routing", I
have so far only spotted:
a- slightly more traffic handled by route services that don't wish to
account for/modify the default 404 response for unavailable apps
b- slightly more traffic for the gorouter when handling requests sent to
routes mapped to unavailable apps: the request would now be proxied to the
route service, which would then call back into the gorouter
c- a potentially slightly larger gorouter routing table to keep in memory
(route entries with a route service but no app endpoints).

I believe these impacts are acceptable. Impact b could potentially be
reduced by passing the app status to the route service (e.g. via an
additional "X-CF-Route-Status" header with the value "404: route does not
exist"), so that the route service can respond without calling back into
the gorouter.

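As a rough illustration of impact b and of the suggested mitigation, the Python sketch below shows the per-request decision a route service could make. `X-CF-Forwarded-Url` is the header through which the gorouter tells a route service where to send the request [2]; `X-CF-Route-Status` and its "code: reason" format are only the proposal above, and `plan` and `parse_route_status` are hypothetical helper names.

```python
from urllib.parse import urlsplit

# Header set by the gorouter on requests sent to a route service [2].
FORWARDED_URL = "X-CF-Forwarded-Url"
# Hypothetical header proposed in this email; the gorouter does not send it today.
ROUTE_STATUS = "X-CF-Route-Status"


def _get(headers, name):
    """Case-insensitive lookup in a plain dict of headers."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def parse_route_status(headers):
    """Return (code, reason) from X-CF-Route-Status, or None if absent."""
    value = _get(headers, ROUTE_STATUS)
    if value is None:
        return None
    code, _, reason = value.partition(":")
    return int(code), reason.strip()


def plan(headers):
    """Decide what to do with an incoming request.

    ("respond", code): the gorouter flagged the app as unavailable, so the
    route service answers itself and skips the hop back through the gorouter.
    ("forward", url): send the request on to X-CF-Forwarded-Url as usual.
    """
    status = parse_route_status(headers)
    if status is not None and status[0] >= 400:
        return "respond", status[0]
    target = _get(headers, FORWARDED_URL)
    if not target or urlsplit(target).scheme not in ("http", "https"):
        raise ValueError("missing or invalid %s header" % FORWARDED_URL)
    return "forward", target
```

Without the extra header, every request for an unavailable app takes the "forward" branch and traverses the gorouter twice before the 404 comes back.
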
Implementation-wise, I'm not sure how deeply the current assumption that
"an active route is associated with at least one endpoint" is embedded in
the different CF components (gorouter and its NATS messages, routing-api,
Diego route emitter and BBS models), and therefore how much effort the
"unconditional routing to route services" behavior would require.
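
The assumption can be pictured with a deliberately simplified Python model of a routing table. This is not gorouter code (the gorouter is written in Go and load-balances across endpoints rather than picking the first one); `RouteEntry`, `route_current` and `route_proposed` are hypothetical names. The first function mirrors the behavior observed in v230, the second lets a bound route service take precedence even when no endpoint is registered.

```python
from dataclasses import dataclass, field


@dataclass
class RouteEntry:
    endpoints: list = field(default_factory=list)  # registered app instance addresses
    route_service_url: str = ""                    # bound route service, if any


def route_current(table, host):
    """Observed behavior: a route with no endpoints is treated as unknown."""
    entry = table.get(host)
    if entry is None or not entry.endpoints:
        return ("not_found", None)  # the gorouter answers 404
    if entry.route_service_url:
        return ("route_service", entry.route_service_url)
    return ("app", entry.endpoints[0])


def route_proposed(table, host):
    """Proposed behavior: a bound route service always receives the traffic."""
    entry = table.get(host)
    if entry is not None and entry.route_service_url:
        return ("route_service", entry.route_service_url)
    if entry is None or not entry.endpoints:
        return ("not_found", None)
    return ("app", entry.endpoints[0])
```

Keeping entries that have a route service but no endpoints is also what makes the routing table slightly larger (impact c).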

Thanks in advance for your thoughts on this,

Guillaume.


Related use-cases:

*1- Returning a custom response when the app is unavailable (crashed,
starting, stopped, or zero available app instances)*

For apps returning HTML, this may be a custom HTML response (rather than the
default gorouter 404 page), or a specific HTTP status code such as "503
Service Unavailable" to suggest that the client retry. For routes serving
APIs (e.g. SOAP), a route service may return a properly formatted SOAP
fault response.
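
A minimal Python sketch of such a fallback response, assuming SOAP 1.1 clients can be recognized by their `SOAPAction` request header (SOAP 1.2 clients, which do not send it, would need another check). `unavailable_response`, the page text and the 30-second retry delay are illustrative choices, not part of any CF API.

```python
SOAP_11_FAULT = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Server</faultcode>
      <faultstring>Service temporarily unavailable</faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>"""

MAINTENANCE_PAGE = ("<html><body><h1>Service temporarily unavailable</h1>"
                    "<p>Please try again in a moment.</p></body></html>")


def unavailable_response(request_headers, retry_after_s=30):
    """Return (status, headers, body) to send instead of the gorouter's 404."""
    names = {name.lower() for name in request_headers}
    if "soapaction" in names:
        # SOAP 1.1 clients send a SOAPAction header; SOAP 1.1 faults
        # are returned with HTTP 500.
        return 500, {"Content-Type": "text/xml; charset=utf-8"}, SOAP_11_FAULT
    # Other clients get a custom page and a 503 hinting that a retry may succeed.
    return (503,
            {"Content-Type": "text/html; charset=utf-8",
             "Retry-After": str(retry_after_s)},
            MAINTENANCE_PAGE)
```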

Multi-site-aware route services may redirect users to a route hosted on a
second CF instance with a "307 Temporary Redirect" status code.
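
The redirect itself only needs the original URL, which a route service receives in `X-CF-Forwarded-Url`. In the Python sketch below, `SECONDARY_DOMAINS`, the domain names and `failover_redirect` are hypothetical configuration and helper names; only the host is rewritten, while path and query are kept.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from a domain on this CF instance to the equivalent
# domain on a second CF instance.
SECONDARY_DOMAINS = {"apps.site-a.example.com": "apps.site-b.example.com"}


def failover_redirect(forwarded_url):
    """Return (307, headers) pointing at the second site, or None when no
    secondary site is configured for this route's domain."""
    parts = urlsplit(forwarded_url)
    host = parts.hostname or ""
    for primary, secondary in SECONDARY_DOMAINS.items():
        if host == primary or host.endswith("." + primary):
            new_host = host[: len(host) - len(primary)] + secondary
            if parts.port:
                new_host += ":%d" % parts.port
            location = urlunsplit((parts.scheme, new_host, parts.path,
                                   parts.query, parts.fragment))
            return 307, {"Location": location}
    return None
```

307 is preferable to 302 here because clients must repeat the same method and body, which matters for POSTed API calls.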

A caching service may choose to return (potentially stale) cached content
when the mapped app is in the CRASHED state, rather than returning a 404.
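
A caching route service along these lines is sketched below in Python, assuming the gorouter would flag an unavailable app with the "X-CF-Route-Status" header suggested earlier (a hypothetical header, not sent today) and that only GET responses pass through it. `StaleCache` and the `forward` callable are hypothetical; the `Age` header tells clients how old the replayed copy is.

```python
import time

# Hypothetical header proposed in this email; not sent by the gorouter today.
ROUTE_STATUS = "X-CF-Route-Status"


def _app_unavailable(headers):
    return any(name.lower() == ROUTE_STATUS.lower() for name in headers)


class StaleCache:
    """Keeps the last 200 response per URL (GET requests only) and replays
    it when the app behind the route is unavailable."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # url -> (stored_at, headers, body)

    def handle(self, url, request_headers, forward):
        """`forward(url)` sends the request on through the gorouter and
        returns (status, headers, body)."""
        if _app_unavailable(request_headers):
            cached = self._entries.get(url)
            if cached is None:
                return 503, {"Retry-After": "30"}, b""
            stored_at, headers, body = cached
            age = int(self._clock() - stored_at)
            return 200, dict(headers, Age=str(age)), body
        status, headers, body = forward(url)
        if status == 200:
            self._entries[url] = (self._clock(), dict(headers), body)
        return status, headers, body
```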

*2- Applying side effects when the app is unavailable*

A SOX-compliant, lossless logging service (unlike the potentially lossy
Loggregator-based logging) may wish to log full details of the requests
sent to the route, including those that never reached an available app
instance.
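
A minimal Python sketch of such request logging: it writes one JSON line when a request arrives and another with the outcome (or the failure), fsync'ing each record when `durable=True`, so requests answered with a 404 because no app instance was available are recorded too. `AuditLog`, `handle` and the `forward` callable are hypothetical names, and actual SOX requirements (retention, tamper evidence) are out of scope here.

```python
import json
import os
import time


class AuditLog:
    """Append-only JSON-lines log written to a file-like `stream`."""

    def __init__(self, stream, durable=True):
        self._stream = stream
        self._durable = durable

    def append(self, record):
        self._stream.write(json.dumps(record, sort_keys=True) + "\n")
        self._stream.flush()
        if self._durable:
            # Make sure the record is on disk before the request proceeds.
            os.fsync(self._stream.fileno())


def handle(log, method, url, headers, forward):
    """Log the request before forwarding it, then log how it ended."""
    log.append({"ts": time.time(), "event": "received",
                "method": method, "url": url, "headers": headers})
    try:
        status, resp_headers, body = forward(method, url, headers)
    except Exception as exc:
        log.append({"ts": time.time(), "event": "failed",
                    "url": url, "error": repr(exc)})
        raise
    log.append({"ts": time.time(), "event": "completed",
                "url": url, "status": status})
    return status, resp_headers, body
```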

An API gateway route service that maintains measurements of the
performance and availability of the APIs exposed through its bound routes
would need to receive traffic while the mapped apps are crashed.
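
Availability and latency bookkeeping for such a gateway could be as simple as the Python sketch below. How `app_available` is determined (for instance from the header proposed above) is an assumption, as is reporting 100% availability for a route that has seen no traffic; `RouteStats` is a hypothetical name.

```python
from collections import defaultdict


class RouteStats:
    """Per-route request counts, app availability and mean latency."""

    def __init__(self):
        self._total = defaultdict(int)
        self._served = defaultdict(int)
        self._latency = defaultdict(float)

    def record(self, route, app_available, latency_s):
        self._total[route] += 1
        self._latency[route] += latency_s
        if app_available:
            self._served[route] += 1

    def availability(self, route):
        total = self._total.get(route, 0)
        return 1.0 if total == 0 else self._served.get(route, 0) / total

    def mean_latency(self, route):
        total = self._total.get(route, 0)
        return 0.0 if total == 0 else self._latency.get(route, 0.0) / total
```

Today the unavailable samples never reach the route service, so `availability` would always report 1.0 for a crashed app.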

The autosleep service [1] that I'm working on, which stops apps during
inactivity to save RAM, would then be able to dynamically restart a stopped
app when traffic arrives for it.
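
The wake-on-request behavior could be sketched in Python as follows. This is not code from the autosleep project: `start_app` stands for whatever call asks the Cloud Controller to start the app, `app_stopped` is assumed to be known when the request arrives, and `WakeOnRequest` is a hypothetical name. Start requests are rate-limited per app so that a burst of traffic triggers a single start, and clients are asked to retry via `Retry-After`.

```python
import time


class WakeOnRequest:
    def __init__(self, start_app, cooldown_s=60.0, retry_after_s=10,
                 clock=time.monotonic):
        self._start_app = start_app          # hypothetical Cloud Controller wrapper
        self._cooldown_s = cooldown_s
        self._retry_after_s = retry_after_s
        self._clock = clock
        self._last_start = {}                # app_guid -> time of last start request

    def handle(self, app_guid, app_stopped, forward):
        """`forward()` sends the request on to the app when it is running."""
        if not app_stopped:
            return forward()
        now = self._clock()
        last = self._last_start.get(app_guid)
        # Ask for a start at most once per cooldown window, however many
        # requests arrive while the app is booting.
        if last is None or now - last >= self._cooldown_s:
            self._last_start[app_guid] = now
            self._start_app(app_guid)
        return (503, {"Retry-After": str(self._retry_after_s)},
                b"Starting the application, please retry shortly.")
```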

[1] https://docs.google.com/document/d/1tMhIBX3tw7kPEOMCzKhUgmtmr26GVxyXwUTwMO71THI/edit
    https://github.com/Orange-OpenSource/autosleep
[2] http://docs.cloudfoundry.org/services/route-services.html#architecture
[3] https://docs.google.com/document/d/1bGOQxiKkmaw6uaRWGd-sXpxL0Y28d3QihcluI15FiIA/edit#heading=h.8djffzes9pnb

Join cf-dev@lists.cloudfoundry.org to automatically receive all group messages.