Proposal: container networking for applications


Jason Sherron
 

Hi, CF-dev community members!

Our cross-company team is happy to present a proposal to support direct
container-to-container networking and communication. We aim to provide
value to developers and admins by enabling new capabilities while providing
network access controls, and by providing first-class network-operations
flexibility.

The problems
- The current network implementation in Cloud Foundry prevents developers
and admins from establishing secure, performant network communication
directly between containers. To support new service architectures, customers
often need fast, direct container-to-container communication while
maintaining granular control of network security in CF.
- Physical network configuration is inflexible, supporting only a single
addressing and routing topology, while customers demand support for a
variety of network configurations and virtualization stacks, often driven by
security and IT standards.

The proposal
We propose an improved container networking infrastructure, rooted in two
principles: declarative network policy and modular network topology. Our
goal is to allow developers and admins to define container-to-container
network graphs that make sense for their business in a high-level,
build-time manner, and then map that logical topology onto supported
network stacks, enabled by the modular network capabilities of libnetwork
from the Docker project.

Help wanted
We specifically request feedback on potential service discovery mechanisms
to support this container-to-container capability. As containers and
microservices gain the ability to communicate directly, how should they
locate their peers?
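As a starting point for that discussion, one candidate shape for a discovery mechanism is a registry mapping an app name to the addresses of its running instances. This is purely a sketch with made-up names; the proposal deliberately leaves the mechanism open.

```python
# Sketch of one possible discovery mechanism: a registry that maps app
# names to their instances' container addresses. Purely illustrative;
# not part of the proposal.

class ServiceRegistry:
    def __init__(self):
        self._instances = {}  # app name -> list of "ip:port" strings

    def register(self, app_name, address):
        """Record a newly started instance's address."""
        self._instances.setdefault(app_name, []).append(address)

    def deregister(self, app_name, address):
        """Forget an instance that has stopped."""
        self._instances.get(app_name, []).remove(address)

    def lookup(self, app_name):
        """Return all known instance addresses for an app."""
        return list(self._instances.get(app_name, []))

registry = ServiceRegistry()
registry.register("auth-service", "10.255.0.4:8080")
registry.register("auth-service", "10.255.0.9:8080")
print(registry.lookup("auth-service"))  # ['10.255.0.4:8080', '10.255.0.9:8080']
```

Open questions for any real mechanism include who updates the registry as instances come and go, and how lookups interact with the network policy above.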

We invite your comments on all aspects of the proposal, here and in the
document.

https://docs.google.com/document/d/1zQJqIEk4ldHH5iE5zat_oKIK8Ejogkgd_lySpg_oV_s/edit?usp=sharing

Jason Sherron on behalf of the working group


Onsi Fakhouri <ofakhouri@...>
 

Great work all! Looking forward to the discussion on the doc!

Onsi

On Thu, Dec 3, 2015 at 10:02 AM, Jason Sherron <jsherron(a)pivotal.io> wrote:



Jason Sherron
 

Hi everyone,

Are there any remaining comments or concerns on the container networking
proposal that need to be addressed before we launch the effort in earnest?
We made minor edits to clarify some of the implementation phases but
overall the spirit of the document is unchanged. Last call is Jan 3. Thanks.

Jason



Mike Youngstrom <youngm@...>
 

BTW, if you have an application that relies on the "CF_INSTANCE_ADDR",
"CF_INSTANCE_IP", or "CF_INSTANCE_PORTS" environment variables for direct
communication between apps that are not in the same space, or that are not
on CF at all, the implementation of this proposal will break you.
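The at-risk pattern looks roughly like this. The environment variable name is a real CF one; the "advertise to an external peer" usage around it is a sketch of what some apps do today.

```python
import os

def advertised_address(env=os.environ):
    # CF_INSTANCE_ADDR holds "<cell IP>:<host-side port>" for the container.
    # Apps sometimes hand this value to peers in other spaces, or to clients
    # outside CF entirely, for direct connections. Under the proposal,
    # container reachability would be governed by declared policy instead,
    # so addresses shared this way are no longer guaranteed to be routable.
    return env.get("CF_INSTANCE_ADDR")

# Simulated environment, standing in for what Diego populates:
print(advertised_address({"CF_INSTANCE_ADDR": "10.10.1.23:61002"}))  # 10.10.1.23:61002
```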

That was a bit of a surprise to me when I read the proposal so I thought
I'd call it out for those who might not have noticed.

Otherwise it's a really nice feature I'm excited for.

Thanks,
Mike

On Tue, Dec 29, 2015 at 10:17 AM, Jason Sherron <jsherron(a)pivotal.io> wrote:



Mike Youngstrom <youngm@...>
 

A global config option to disable this new functionality for a deployment
has been proposed to maintain backwards compatibility. If you're interested
in such a flag, let the team know.

Mike

On Tue, Jan 5, 2016 at 5:08 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:



Ted Young
 

I'd like to raise awareness of the scheduling changes needed to have stable
identity, which I believe are beyond the scope of this proposal.

Currently, Diego has a High Availability scheduler for stateless
applications. This means it makes tradeoffs in order to ensure uptime while
balancing the workload on the cluster. These optimizations in turn make the
scheduler inappropriate for running stable, stateful services. For example,
during evacuation we ensure another copy of your instance is up before
taking down the old instance. This means you will be running N + ???
instances at any given moment, with ??? instances having identities that
match older instances. So adding container-to-container networking is not
by itself sufficient for running a database on CF.

The flipside of this is that identity for HA/stateless apps is a bit of an
odd fish. How will service discovery present two "instance 8s" that are
running at the same time? I would urge the networking team to understand
how the scheduler works, while at the same time not depend on
implementation details of the current scheduler, as we will continue to
optimize going forwards.

All Cell allocations are optimistic, and it is possible to add other
schedulers that will support services that require stable, consistent
identity. But I believe that is beyond the scope of this proposal.

Cheers,
Ted

On Tue, Jan 5, 2016 at 6:10 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:



Jason Sherron
 

Thanks for the feedback. I'm happy to say that one of the team members
worked directly on the Diego runtime, so some of that knowledge is already
internalized. ;)

I agree completely that container networking alone is insufficient for
deploying some app architectures on CF, but we know that it is necessary for
many. Thus I'd also assert that stable-identity scheduling is out of scope
for this effort.

On Thu, Jan 7, 2016 at 2:51 PM, Ted Young <tyoung(a)pivotal.io> wrote:



Onsi Fakhouri <ofakhouri@...>
 

+1 to what Ted said. Plus a few comments:

The only case in which an identical copy of a given instance index is
expected is during evacuation. In all other cases it is considered an
error for two instances of an index to be up and Diego will immediately
shut one down (in fact, the only way to enter this state is via a network
partition). So, truly, ??? <= 1.

While we may well want to genericize the scheduling layer further I'd
encourage us to first look at ways to iterate on the existing rules and
heuristics to open up the set of appropriate use cases. For example, LRPs
could have a set of policies attached to them. A simple policy could be
around duplication during evacuation. It would be fairly simple for a cell
to inspect an LRP and decide *not* to keep it running while it waits for a
duplicate to come up.

Similarly, we could bring tighter control over which indices of an LRP are
supposed to be up. Rather than simply specify some number of instances of
an LRP to run we could (for example) begin to make statements about the
desired state for individual indices of the LRP. A(n ugly) way to do this
could be to desire an LRP with an array of indices-to-run: [0,1,2] would
run 3 indices. [0,2] would shut 1 down. A cleaner interface could be a
DesiredLRPIndex subobject that refers to a parent DesiredLRP and gives us
even finer control over an individual instance's lifecycle.
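The indices-to-run idea above can be sketched concretely. The names here are hypothetical, not actual Diego API: given the desired set of indices and the currently running ones, reconciliation is a pair of set differences.

```python
# Hypothetical sketch of the "array of indices-to-run" idea -- not actual
# Diego API. Reconciling desired indices against running ones tells the
# scheduler which instances to start and which to stop.

def reconcile(desired_indices, running_indices):
    desired = set(desired_indices)
    running = set(running_indices)
    return {
        "start": sorted(desired - running),  # desired but not yet running
        "stop": sorted(running - desired),   # running but no longer desired
    }

# Desiring [0, 1, 2] with nothing running starts three indices:
print(reconcile([0, 1, 2], []))      # {'start': [0, 1, 2], 'stop': []}
# Narrowing the desired set to [0, 2] shuts index 1 down:
print(reconcile([0, 2], [0, 1, 2]))  # {'start': [], 'stop': [1]}
```

This keeps the per-index lifecycle decision in the client's hands while the scheduler remains a generic reconciler.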

The benefits here are that it keeps the scheduler lean and generic and
pushes lifecycle concerns up to the client.

Ted - to be clear - I do see the benefits of a more generic scheduling
layer and I get the costs of turning the existing scheduler into a
frankenstein of knobs. From a small-step perspective, though, I do think
adding a *few* strategic knobs could help us build a case for how necessary
something more expensive might be.

Some concrete use cases could help too. I *think* the two knobs I'm
describing alleviate the unique-instance concern. I don't know if they're
enough to allow us to sanely manage a database though.

Onsi
