Re: Update from cf-networking


Shannon Coen
 

Hi Dan,

The scaling issues are primarily related to CF's use of Istio. 

Currently every Envoy sidecar in the platform receives configuration for all internal routes, regardless of whether there are C2C policies in place that allow the apps to connect to one another directly over the overlay network. Each sidecar's memory utilization grows with the amount of configuration it holds, and that memory counts against the quota for the application container. With the default container memory quota, all sidecars run out of memory and crash at around 300 total internal routes. We can be smarter about the configuration each sidecar receives. We are exploring options, including configuring the sidecars for a given app with routing configuration only for destinations for which a C2C security policy has been created. This would shift the scaling limit to 300 policies per app, which seems more than enough.
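To make the idea concrete, here is a minimal sketch of policy-scoped configuration. It is not how istio-release or Pilot implements anything; the function name, the data shapes, and the example hostnames are all hypothetical, chosen only to show that an app's sidecar would receive just the internal routes whose destination app appears in one of that app's C2C policies.

```python
from typing import Dict, Iterable, List, Tuple


def routes_for_app(
    source_app: str,
    internal_routes: Dict[str, str],
    policies: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return the internal routes a given app's sidecar would be configured with.

    internal_routes maps a route hostname to its destination app (hypothetical shape).
    policies is a collection of (source_app, destination_app) C2C policies.
    """
    # Destinations this app is allowed to reach according to its policies.
    allowed = {dst for src, dst in policies if src == source_app}
    # Keep only the routes that point at an allowed destination.
    return sorted(host for host, dst in internal_routes.items() if dst in allowed)


# Hypothetical example data.
routes = {
    "orders.apps.internal": "orders",
    "billing.apps.internal": "billing",
    "search.apps.internal": "search",
}
c2c_policies = [("frontend", "orders"), ("frontend", "search"), ("orders", "billing")]

frontend_config = routes_for_app("frontend", routes, c2c_policies)
```

In this sketch an app with no policies gets an empty route list, so the size of each sidecar's configuration depends on that app's policy count rather than on the total number of internal routes in the platform.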

Looking further forward, as we explore networking investments in K8s as the orchestrator for CFAR (Eirini), we could explore leveraging pods to give the sidecar and apps independent resource limits. 

Best,
 
Shannon Coen
Product Lead, PCF Networking
Pivotal, Inc.


On Thu, Jun 27, 2019 at 8:50 AM Daniel Jones <daniel.jones@...> wrote:
Awesome, thanks!

Are the scaling issues intrinsic to Istio, or is it CF's use of Istio that's causing the scaling problem?

I'm just curious as to whether we can use this information to infer that no-one is using Istio beyond this scale, or perhaps they are, but they're using it differently.

Regards,
Daniel 'Deejay' Jones - CTO
+44 (0)79 8000 9153
EngineerBetter Ltd - More than cloud platform specialists


On Thu, 27 Jun 2019 at 03:27, Shannon Coen <scoen@...> wrote:
Thanks to Dan from EngineerBetter for prompting this update. We (the CF-Networking team) haven't reached out in a while.

Last October we shared that, with our integration between CF and Istio Pilot, we were able to offer weighted routing via a new Envoy-based ingress gateway and Cloud Controller APIs or a CLI plugin, giving developers more control in shifting traffic from one version of their application to another. That thread is here: https://lists.cloudfoundry.org/g/cf-dev/message/8328
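For readers unfamiliar with weighted routing, the following is a rough illustration of the concept only. It is not the gateway's or Cloud Controller's actual logic; the function name and the version labels are made up for the example. Each request is sent to one destination, chosen at random in proportion to the configured weights.

```python
import random
from typing import Dict


def pick_destination(weights: Dict[str, int], rng: random.Random) -> str:
    """Choose one destination at random, proportionally to its weight."""
    destinations = list(weights)
    return rng.choices(destinations, weights=[weights[d] for d in destinations], k=1)[0]


# Hypothetical split: 90% of traffic to the current version, 10% to the new one.
split = {"my-app-v1": 90, "my-app-v2": 10}
rng = random.Random(42)  # seeded only so the example is repeatable
picks = [pick_destination(split, rng) for _ in range(10_000)]
v1_share = picks.count("my-app-v1") / len(picks)
```

Shifting traffic from one version to another then amounts to changing the weights over time, for example from 90/10 to 50/50 to 0/100.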

Since then we've been working on extending our integrations to the application-to-application data plane. Our target milestones are enabling developers to rely on the platform for client-side load balancing, timeouts, retries, and mTLS between applications over the C2C overlay network. These features will spare developers the toil of implementing these behaviors themselves, increasing productivity, and will give platform operators and security teams confidence that application-to-application traffic is secured in a consistent way.

We already support routing of traffic from apps to internal routes through the sidecars and have default policies set for load balancing, timeouts, and retries. But this only works at a relatively low scale: 300 total internal routes. Scaling past this will likely require enhancing our integration with Istio to configure the sidecars for each application individually, based on C2C security policies.

We've successfully spiked out having the consumer and provider sidecars negotiate mTLS, but we've set that down while we work on the scaling problem above. On this topic we want some feedback from the community on a rollout strategy. Look for a follow-up email coming soon.

All this work has been happening in istio-release (https://github.com/cloudfoundry/istio-release), which you can deploy with BOSH alongside cf-deployment using an ops file. Documentation can be found in the README. Warning: our integration with Istio is not yet ready for production use cases. In addition to the scaling concerns, the control plane is not yet HA, nor is it sufficiently instrumented for monitoring.

In parallel with the Networking team's efforts, other CF teams are doing great work toward the same vision:
  • A collaboration between the CAPI and CLI teams, responsible for completing the v3 CC API and delivering the v7 CLI, has been working on the APIs and CLI commands to support declarative configuration of routing rules, starting with percentage-based traffic splitting for external routes.
  • One engineer from the Windows team has been laboring to contribute Windows support to the Envoy Proxy OSS project, which will enable developers of .NET apps to achieve all the same outcomes planned for Linux apps, plus any outcomes delivered with service mesh in the future. Once the sidecars are in place, the operating system is abstracted. 
We'd love help with all of this. If you'd like to contribute please reply here or reach out to us in #networking.

At CF Summit in Philadelphia earlier this year, I gave a presentation at User Day sharing our journey from routing to service mesh in CF, with the ever-present goal of delivering business outcomes for platform operators and the development teams they serve. I've attached the slides. I plan to attend CF Summit EU in The Hague in September, and SpringOne Platform in Austin in October; reach out if you'd like to meet up.

Best,

Shannon Coen
Product Lead, PCF Networking
Pivotal, Inc.
