Re: CF Networking -- Seeking clarity on today's implementation


Amit Kumar Gupta
 

Hey Ravi, great questions. You're right, all these details may not be well
documented. Out of curiosity, where did you look for documentation, and
where did you find out what you've currently come to know?

Responses inline.

On Wednesday, December 16, 2015, ravi malhotra <ravi.malhotra(a)bnymellon.com>
wrote:

I did not find detailed documentation so I have created a set of
assumptions below on how CF Networking works. Are these correct? Can I get
answers to the questions? Thank you!

High Level Architecture/Scaling:
1. APP Containers and CF Routers are created/destroyed dynamically to
support application needs.

Depends what you mean by dynamically. Users of the platform (people who
push apps) can scale apps via command line "cf scale myapp -i 4" (for 4
instances) or with an application manifest. That's true of open source CF.
Some official vendors (e.g. Pivotal) offer application monitoring and
autoscaling add-on services.

Routers are part of the platform and are managed by an operator. These are
deployed using BOSH and are scaled up via a manifest change (BOSH manifest
though, not CF). There are no dynamic solutions for this in OSS or vendor
solutions at this time that I know of.


2. A Router can LB to any DEA in the environment (or, are there
Availability Zones which prescribe sets of Routers and DEAs?)

Yes, any DEA. The next generation CF backend "Diego" works the same way:
routers can LB to any Diego cell.


3. DEAs cannot talk directly to each other; APP1 to APP2 communication
must go through a Router?

They can, if you configure your application security groups to allow it.
You would still need to know IP and port of the container you're trying to
reach. That said, a brand new project is spinning up (or has already) to
solve container to container networking for containers on Diego cells.
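
To make the security group point concrete, here is a minimal sketch of how a
set of egress rules decides whether a container may reach a given IP and
port. The rule shape (protocol / destination / ports) follows the form used
in application security group definitions, but the subnet, port range, and
function names below are made up for illustration, and destination ranges
written as "a-b" are not handled. This is not the platform's actual
enforcement code.

```python
import ipaddress

# Hypothetical rules; the subnet and port range are illustrative only.
RULES = [
    {"protocol": "tcp", "destination": "10.10.0.0/16", "ports": "60000-61000"},
]


def port_allowed(ports: str, port: int) -> bool:
    """Return True if port matches a spec such as '80,443' or '8080-8090'."""
    for part in ports.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= port <= int(high):
                return True
        elif int(part) == port:
            return True
    return False


def egress_allowed(rules, protocol: str, ip: str, port: int) -> bool:
    """Return True if any rule permits traffic to ip:port over protocol."""
    addr = ipaddress.ip_address(ip)
    for rule in rules:
        if rule["protocol"] not in (protocol, "all"):
            continue
        if addr not in ipaddress.ip_network(rule["destination"]):
            continue
        if rule["protocol"] == "all" or port_allowed(rule["ports"], port):
            return True
    return False


# A container on another cell, reached directly by IP and host port.
print(egress_allowed(RULES, "tcp", "10.10.5.4", 60123))
```

Even when the rules allow the traffic, the caller still has to discover the
target container's IP and port on its own.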


4. If I deploy my own LB solution -- how do I dynamically update Router
addresses in my LB (as Routers are created/destroyed)?

If you use an AWS ELB, this is handled by BOSH. If you're deploying
something like HAProxy as part of your CF deployment, the manifest where
you declare the desire to scale up routers can also configure HAProxy to
know about them. Most BOSH manifest generation tooling automatically
handles making sure the HAProxy config gets the right data.
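
As a rough illustration of what "HAProxy gets the right data" means, the
sketch below renders an HAProxy backend stanza from a list of router IPs.
The function name, backend name, and IP addresses are assumptions for the
example. Real BOSH releases render their config from the deployment manifest
with their own templates, so treat this only as a picture of the idea.

```python
def render_router_backend(router_ips, port=80):
    """Render an HAProxy backend stanza listing every router instance."""
    lines = ["backend cf_routers", "    balance roundrobin"]
    for index, ip in enumerate(router_ips):
        # One 'server' line per router; 'check' enables HAProxy health checks.
        lines.append(f"    server router{index} {ip}:{port} check")
    return "\n".join(lines)


# Scaling routers from one to two instances just adds a second server line.
print(render_router_backend(["10.0.16.11", "10.0.16.12"]))
```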



Communication from Router to App:
1. Router can use some algorithm (like round-robin) to direct traffic to a
DEA.

Yes.
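
For reference, round-robin selection over a set of app instance endpoints
can be sketched as below. The class name and the endpoint addresses are
hypothetical, and this is not the router's actual implementation; it only
shows the rotation behaviour.

```python
import itertools


class RoundRobinPool:
    """Hand out backend endpoints in a fixed rotating order."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(list(endpoints))

    def next_endpoint(self):
        """Return the next endpoint in the rotation."""
        return next(self._cycle)


# Three instances of one app, each reachable at host IP plus mapped port.
pool = RoundRobinPool(["10.0.32.5:61001", "10.0.32.6:61007", "10.0.32.7:61003"])
picks = [pool.next_endpoint() for _ in range(4)]
print(picks)
```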


2. Router to DEA traffic: is there an overlay network? or are we just
utilizing the native network?

Native. You could presumably have an overlay network, but it isn't necessary.


3. Router to DEA traffic: is the Router just changing the destination
address of the request to the address of the DEA and forwarding the request
with the source address intact?

Check with the Routing team PM, Shannon (cc'd)


4. Router to DEA traffic: let's say the Router dies half way through; can
we mirror state to another Router?

What sort of state?


5. If a Router dies – all the DEAs can still be accessed via other
Routers; is this right?

Yes.



Communication from point of view of App/Container:
1. An APP (container) cannot directly talk to another APP (container) even
in the same DEA. This communication must go through a Router. Is this
accurate?

I interpreted your previous question about DEA/Diego cell to APP as meaning
APP to APP, so see previous response. Or did I misinterpret your previous
question?


2. The container is in a Network Name Space which is bridged to a Linux
Bridge that then joins to physical NIC.

I would check with the Garden PMs (Garden is the containerizer used in CF),
Will and Julz.


3. Containers are isolated from each other because they are in different
Name Spaces and because of IPTables rules.

Yes, standard Linux container technology, cgroups and namespaces.


4. IPTables rules allow the container to communicate with all Routers.
5. IPTables rules bar the container from directly talking to anything that
is not a Router.

Again, check with Garden PMs.



East-West traffic between Containers:
1. E-W traffic must go through a Router.
2. APP1 will seek out a Router (which one?)
3. The Router will direct the request to APP2 on some DEA using some
algorithm (say, round-robin).
4. The reverse traffic from APP2 to APP1 would need to be NATTED to the
Router address. Also, we need a destination NAT. Not sure how the NAT
function would do this work.

Not sure what you mean by East-West. Request to app1.my-domain.com
typically has DNS resolve to an upstream LB. LB routes traffic to routers,
although you could have DNS resolve directly to routers if you want to
expose routers externally. Routers then balance traffic to apps. I believe
the response returns via the router, but again, check with Shannon.



Management:
1. Is there ability to define network policy in WARDEN to shut an APP?
2. We may want to define policy based on bandwidth usage.

May be upcoming in Garden.


3. Can we configure QoS bits on an application?

Can you elaborate?



Troubleshooting:
1. Is there a promiscuous APP on a container that can sniff all traffic so
we can troubleshoot?
2. Use case for above: let's say an APP appears to freeze -- having a
packet capture from the DEA node could help diagnose the problem.

As an app developer using the Diego backend, you can SSH into the container
(unless permissions are restricted by the platform operator or space
manager). As an operator you can SSH onto the DEA or Diego cell itself.



Performance:
1. When is a new CF Router instance spun up? Can I set up a rule in BOSH
to spin up new router when a certain traffic threshold is exceeded?
2. Similarly, when are new APP instances spun up?

See answers to first couple questions.


3. Is there any performance data available on the CF Router?

Shannon can give you a more comprehensive answer. I know they did some perf
tests when using Routers for SSL termination. Other high-request-volume
tests I've seen have not exposed the router as the bottleneck, but rather
some conntrack parameter settings in Ubuntu on the DEA. These have since
been addressed.



DEA:
1. can DEAs be multihomed on Public and private networks?

Yes.


2. the BOSH agent on each DEA – what are all its functions?

Big question. The agent is in all stemcells, not just the DEA. The BOSH
director communicates with it to tell the VM what to do. Looking at the
agent client interface might be a helpful start.


https://github.com/cloudfoundry/bosh-agent/blob/master/agentclient/agent_client_interface.go


Is it collecting health data used by the router in the LB decision?

No.



Packet walk (please include LB and overlay technologies involved):
1. From App to App within a droplet?

What do you mean within a droplet? Do you mean instances of the same
application?


2. From App to App between droplets on the same host?
3. From App to App between droplets on different hosts?
4. From App to App between Availability Zones? (is this allowed?)

This is allowed.


5. From web server (outside CF environment) to App.

IP Addressing:
1. The containers all take addresses from a NATTED range (say,
10.254.0.0/16). Don’t I also need to NAT my source address? Example, I am
coming from an Apache web server to a CF App. The source address of the
Apache web server cannot be from the 10.254.0.0/16 range (if it were, we
would need to NAT the source).
2. Are the container addresses further subnetted (say, /24 per host?)

IP Multicast: Assuming there is no requirement for IP multicast in this
space.

Details: Commands to check which containers are up? What are their
addresses?

CF is unique amongst platforms that containerize and schedule workloads in
that it goes beyond this, and treats applications and routes as first-class
concepts; containers and IPs can usually be safely ignored as an
implementation detail. It's possible to get the information you
mentioned as an operator via the Diego BBS API, but depending on the
problem you're trying to solve, this may not be the most relevant data.

In fact, even Diego abstracts containers into long running processes and
tasks. When running with Windows Diego cells (as opposed to Linux) the
notion of container obviously doesn't translate into namespaces and cgroups.

At any rate, you can see the Diego BBS client interface here:
https://github.com/cloudfoundry-incubator/bbs/blob/master/client.go




Which DEAs a router knows about? What tcp sessions are active? Where can I
find the detailed documentation?

DEAs don't know about routers. Currently, DEAs broadcast application routes
over a message bus, routers subscribe to the channel. This may change in
the future with Diego cells directly talking to the routing tier over HTTP
to populate the routing tables.
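
A minimal sketch of that flow, seen from the router side: each registration
message received from the bus adds or refreshes an entry in an in-memory
routing table. The message fields (host, port, uris), the class name, and
especially the TTL-based pruning are assumptions made for illustration, not
a description of the actual router code.

```python
import time


class RouteTable:
    """In-memory map of route URI to the endpoints that registered for it."""

    def __init__(self, ttl_seconds=120.0):
        self.ttl = ttl_seconds  # hypothetical staleness threshold
        self._routes = {}  # uri -> {endpoint: last_seen_timestamp}

    def register(self, message, now=None):
        """Record (or refresh) every URI announced in one bus message."""
        now = time.monotonic() if now is None else now
        endpoint = f"{message['host']}:{message['port']}"
        for uri in message["uris"]:
            self._routes.setdefault(uri, {})[endpoint] = now

    def prune(self, now=None):
        """Drop endpoints that have not re-registered within the TTL."""
        now = time.monotonic() if now is None else now
        for uri in list(self._routes):
            fresh = {
                endpoint: seen
                for endpoint, seen in self._routes[uri].items()
                if now - seen <= self.ttl
            }
            if fresh:
                self._routes[uri] = fresh
            else:
                del self._routes[uri]

    def lookup(self, uri):
        """Return the known endpoints for a URI, sorted for stable output."""
        return sorted(self._routes.get(uri, {}))


table = RouteTable(ttl_seconds=10)
table.register({"host": "10.0.32.5", "port": 61001, "uris": ["app1.example.com"]}, now=0)
table.register({"host": "10.0.32.6", "port": 61002, "uris": ["app1.example.com"]}, now=8)
print(table.lookup("app1.example.com"))
```

The point of the design is that the router never needs a static list of DEAs;
whatever announces itself on the bus becomes routable.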
