Networking and Service Mesh in CF-for-k8s

Gabriel Rosenhouse <grosenhouse@...>

On the SIG CF for Kubernetes call this morning we discussed Networking, Routing and Service Mesh topics in detail.

A recording is available here. Thanks to the Cloud Foundry Foundation staff, Swarna and Bree, for making the recording available.

Here are some rough notes I took from our call.  Feel free to reply with corrections, feedback and questions.

1. What kind of feature parity are we targeting for networking in CF-for-K8s vs CF-for-BOSH?

The 1.0 release will provide basic HTTP(S) ingress to apps and system components, log streaming for apps, documented logs and metrics for operational visibility, and transparent mTLS for every workload.

Some features will be more expensive for us to re-implement on the new networking tech stack in cf-for-k8s, and we don't expect to deliver those before the 1.0 release.

Support for Application Security Groups (ASGs) and app-to-app network policies will not block the 1.0 release; we hope to deliver them soon afterwards. However, the 1.0 release will ship with deny-by-default for both app-to-app and app-egress (ASG-managed) traffic. The alternative, shipping allow-by-default in 1.0 and moving to deny-by-default in a later 1.x release, would break users during that upgrade.
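In plain Kubernetes terms, deny-by-default for a namespace can be expressed as a NetworkPolicy with an empty pod selector and no allow rules. The Python sketch below builds such a manifest as JSON that `kubectl apply -f -` accepts. It is an illustration only, not necessarily how cf-for-k8s implements deny-by-default; the namespace `cf-workloads` and the policy name are assumptions. NetworkPolicies are only enforced when the cluster's network plugin supports them.

```python
import json

# Assumption: the namespace where CF app workloads run; adjust for your install.
APP_NAMESPACE = "cf-workloads"


def default_deny_policy(namespace):
    """Build a NetworkPolicy denying all ingress and egress for every pod in `namespace`.

    An empty podSelector selects all pods in the namespace. Listing both policy
    types with no ingress/egress rules allows no traffic in either direction,
    except what other NetworkPolicies explicitly allow.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # Emit JSON suitable for piping into `kubectl apply -f -`.
    print(json.dumps(default_deny_policy(APP_NAMESPACE), indent=2))
```

Denying all ingress this way would also block traffic from the ingress gateway to app pods, so a real deployment would need an additional policy allowing that path.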

Because the vast majority of CF apps need some kind of app egress in order to function, we will recommend that, while waiting for support for Cloud Controller-defined ASGs, operators use kubectl to apply their own Kubernetes NetworkPolicy objects that allow app egress. Such NetworkPolicy objects will continue to work even after CF-for-K8s is enhanced to fully support ASGs. The same approach applies to app-to-app network policies.
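A minimal sketch of such an operator-applied egress allow rule, assuming apps run in a namespace named `cf-workloads` and need to reach a hypothetical database subnet `10.20.0.0/16` on TCP port 5432. It also opens port 53 so apps can still resolve hostnames. Because NetworkPolicies are additive, an allow policy like this coexists with a deny-by-default policy in the same namespace: traffic is permitted if any policy selecting the pod allows it.

```python
import json

# Hypothetical values: adjust namespace, CIDRs and ports for your foundation.
APP_NAMESPACE = "cf-workloads"
DATABASE_CIDRS = ["10.20.0.0/16"]
DATABASE_PORTS = [5432]


def egress_allow_policy(namespace, name, cidrs, tcp_ports):
    """Build a NetworkPolicy allowing egress from every pod in `namespace`
    to `cidrs` on `tcp_ports`, plus DNS on port 53 to any destination."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [{"ipBlock": {"cidr": cidr}} for cidr in cidrs],
                    "ports": [{"protocol": "TCP", "port": p} for p in tcp_ports],
                },
                # A rule with no "to" matches all destinations, so this rule
                # opens only port 53, letting apps resolve hostnames.
                {
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ]
                },
            ],
        },
    }


if __name__ == "__main__":
    policy = egress_allow_policy(
        APP_NAMESPACE, "allow-egress-to-databases", DATABASE_CIDRS, DATABASE_PORTS
    )
    # Pipe this output into `kubectl apply -f -`.
    print(json.dumps(policy, indent=2))
```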

The caveat is that such operator-specified NetworkPolicies, like any Kubernetes object created directly by the operator, will not be visible to Cloud Foundry app developers. There is a risk that these allow rules will be in place without users being aware of them. As with any production system, we recommend that operators regularly audit all network policies.
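As a starting point for such an audit, the hypothetical helper below scans the JSON output of `kubectl get networkpolicies --all-namespaces -o json` and flags egress rules that allow all traffic, or that allow `0.0.0.0/0` or `::/0`. The function name and the specific checks are illustrative assumptions; a real audit would more likely compare policies against an approved list.

```python
import json
import sys


def broad_egress_findings(policy_list):
    """Return human-readable findings for egress rules that look overly broad.

    `policy_list` is the parsed output of
    `kubectl get networkpolicies --all-namespaces -o json`, i.e. a dict
    with an "items" list of NetworkPolicy objects.
    """
    findings = []
    for item in policy_list.get("items", []):
        meta = item.get("metadata", {})
        where = f'{meta.get("namespace", "?")}/{meta.get("name", "?")}'
        for i, rule in enumerate(item.get("spec", {}).get("egress") or []):
            peers = rule.get("to") or []
            ports = rule.get("ports") or []
            # A rule with neither peers nor ports allows all egress traffic.
            if not peers and not ports:
                findings.append(f"{where}: egress rule {i} allows all traffic")
                continue
            for peer in peers:
                cidr = (peer.get("ipBlock") or {}).get("cidr")
                if cidr in ("0.0.0.0/0", "::/0"):
                    findings.append(f"{where}: egress rule {i} allows {cidr}")
    return findings


if __name__ == "__main__":
    # Usage: kubectl get networkpolicies --all-namespaces -o json > policies.json
    #        python audit_netpol.py policies.json
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as f:
            for finding in broad_egress_findings(json.load(f)):
                print(finding)
```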

2. Current dependency on Istio.  What's up with that?

We recognize there are legitimate concerns about Istio's governance, its complexity, and its resource consumption. We don't expect Cloud Foundry to have a hard dependency on it, and we aren't attached to it in the long term.

App Developers won't use the Istio API directly.

Operators may touch small parts of it when installing CF-for-K8s, but that surface should be narrowly scoped. In the near term, we won't support the Istio API beyond those narrow, operator-facing pieces. Custom Istio config (or network policy, for that matter) voids your warranty.

We could consider leaning on the Service Mesh Interface (SMI) as an abstraction layer if there's demand for swappability, although we haven't yet evaluated its suitability, and we're not committing to support it in the short term.

Istio does provide value through transparent mTLS everywhere, and that's a big part of why we're using it right now. That said, alternatives such as Linkerd may be worth considering.

3. Could migration from CF-for-BOSH/Diego to cf-for-k8s be eased by service-mesh-style connectivity between a Diego cluster and a K8s cluster?

Yes. If we want to deploy apps to cf-for-k8s with zero downtime, it is reasonable for those apps to be able to connect back to apps still deployed on cf-diego. Likewise, apps on cf-diego should be able to connect to apps on cf-for-k8s.

IP connectivity between containers across clusters would be nice, but many vendors and users don't have it.

CF-for-BOSH with Diego runs an Envoy proxy in each app instance, but those proxies aren't managed by Istio in supported releases (we've ceased development of the Istio BOSH release).

An alternative solution might involve sharing internal route data across clusters and updating DNS so that Diego clients connect to a cluster-local egress gateway, which can open an mTLS tunnel to an ingress gateway in the remote cf-for-k8s cluster. In the reverse direction, client sidecars in cf-for-k8s could originate the tunnel themselves and skip the egress gateway. This would require considerably more technical design to de-risk, but it seems viable.
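The Diego-to-cf-for-k8s direction described above can be modeled as a toy sketch of the data each side would need. Everything here is hypothetical: the `InternalRoute` type, the function names, the gateway addresses, and especially the assumption that the remote ingress gateway routes on the TLS SNI hostname. It does not show how Envoy or Istio would actually be configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternalRoute:
    hostname: str  # hypothetical internal route, e.g. "orders.apps.internal"
    cluster: str   # where the backing app runs: "diego" or "k8s"


def diego_dns_overrides(routes, egress_gateway_ip):
    """DNS answers for Diego-side clients: every internal route whose app has
    moved to cf-for-k8s resolves to the cluster-local egress gateway."""
    return {r.hostname: egress_gateway_ip for r in routes if r.cluster == "k8s"}


def egress_gateway_upstreams(routes, remote_ingress_gateway):
    """Forwarding table for the egress gateway: tunnel each migrated hostname
    over mTLS to the remote ingress gateway, keeping the hostname as SNI so
    the remote side can pick the right app (an assumption, not a settled design)."""
    return {
        r.hostname: {"upstream": remote_ingress_gateway, "sni": r.hostname, "mtls": True}
        for r in routes
        if r.cluster == "k8s"
    }


if __name__ == "__main__":
    # Hypothetical route data shared from the cf-for-k8s cluster.
    shared_routes = [
        InternalRoute("orders.apps.internal", "k8s"),
        InternalRoute("billing.apps.internal", "diego"),
    ]
    print(diego_dns_overrides(shared_routes, "10.0.5.10"))
    print(egress_gateway_upstreams(shared_routes, "k8s-ingress.example.com:443"))
```

Only routes whose apps have already moved get overridden, so Diego clients keep reaching apps that remain on Diego directly.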

Happy to field questions and corrections by replying to this email, or ping me in #cf-for-k8s on the Cloud Foundry Slack.

Thank you!