SSH access to CF app instances on Diego


Gwenn Etourneau
 

Oh, I think you are right, James.
Do we have a Google Doc discussing that?

On Fri, Jul 31, 2015 at 2:39 PM, James Bayer <jbayer(a)pivotal.io> wrote:

gwenn, i suspect guillaume is referencing the platform policies around ssh
and container lifecycle which has not had more discussion that i'm aware
of. i think eric has received some additional feedback from some people,
but i have not seen what it is either or an outline of how the defaults and
configuration options will be exposed.

On Thu, Jul 30, 2015 at 6:01 PM, Gwenn Etourneau <getourneau(a)pivotal.io>
wrote:

Guillaume,

Not sure this is related, but ssh has already been implemented and is
available in diego and cf-release.


On Thu, Jul 30, 2015 at 10:35 PM, Guillaume Berche <bercheg(a)gmail.com>
wrote:

Eric,

The CAB minutes [1] mentioned you were still looking for feedback from
the community on the policy for altered instances, but this thread seems
to have been silent for a while.

Not sure whether you saw my email and suggestion below for a way to
quarantine the altered instances (beyond the per-space restart policy
configuration). Such a quarantine request might be a good place to include
an option to ask for the quarantined instances to be excluded from gorouter
traffic.

[1]
http://www.activestate.com/blog/2015/07/cloud-foundry-advisory-board-meeting-2015-july

Regards,

Guillaume.

On Fri, Jul 3, 2015 at 3:56 PM, Guillaume Berche <bercheg(a)gmail.com>
wrote:

Hi,

Please find my feedback on this thread below.

*short version:*
1- need to preserve a good CF experience with HTTP only (direct SSH flow is
still blocked and a pain in many organisations) => +1 to preserve "cf
files" or fine-tune the diego plugin so that ssh over HTTP works out of the
box
2- default "recycle tainted containers by default" policy seems good to
me
3- needs to be completed with more control of the recycling policy (UX
such as "quarantine" or GAE "lock/unlock")
4- development use-cases need to be better supported (dev/prod parity);
not sure ssh/scp is the right path though

*long version:*

*1- cf files and ssh over HTTP*

As previously mentioned in [1], CF did a great job by exposing its APIs
over HTTP, so that they are easily consumed through the HTTP proxies that
some companies still use. This makes the CF experience seamless when
consuming a public PaaS, or a private PaaS among corporate entities. It
seems important to me to preserve a good CF experience with HTTP only.

If SSH interactive access, scp and port forwarding become the
mainstream solution to operate and troubleshoot apps (supporting "cf
files", replacement for the previous DEBUG and CONSOLE ports), it will be
useful for users behind such firewalls to be able to configure the diego
ssh plugin to use HTTP/SOCKS proxies to reach public CF instances. As the
diego ssh cli plugin supports using the regular local host ssh binaries,
this may potentially be done by tweaking the ~/.ssh/config file to add
options for the host ssh.${domain} so that it goes through proxies
(possibly double tunnels as described in [2]). However, for new users in
such a network context, especially on the Windows operating system, the
set-up work needed before using a public CF instance starts to add up.
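
To make the proxy set-up above more concrete, here is a minimal sketch in
Python that renders the kind of ssh client config stanza involved. It
assumes the proxytunnel tool from [2] is installed and on the PATH. The
host names and proxy port are hypothetical placeholders, and the helper
name render_ssh_proxy_stanza is made up for illustration. ProxyCommand and
its %h/%p tokens are standard OpenSSH client options; whether the diego ssh
plugin honours this file depends on it delegating to the local ssh binary.

```python
def render_ssh_proxy_stanza(ssh_host: str, proxy_host: str, proxy_port: int) -> str:
    """Render an OpenSSH client config stanza that tunnels via an HTTP proxy.

    Assumption: proxytunnel is installed; -p names the HTTP proxy and
    -d the destination. ssh substitutes %h and %p with the target host/port.
    """
    proxy_command = f"proxytunnel -p {proxy_host}:{proxy_port} -d %h:%p"
    lines = [
        f"Host {ssh_host}",
        f"    ProxyCommand {proxy_command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; replace with your CF ssh host and corporate proxy.
    print(render_ssh_proxy_stanza("ssh.example.com", "proxy.corp.example", 3128))
```

The output would be appended to the user's ssh config file by hand. This is
exactly the kind of extra set-up step that adds up for new users.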

*2- default "recycle tainted containers by default" seems good to me*

Given that apps deployed on CF comply with the 12-factor principles, their
instances may be restarted at any time (e.g. during the deployment of a new
CF release or a stemcell upgrade). So the default policy "recycle tainted
containers by default" is not a surprise.

*3- needs to be completed with more control of the recycling policy (UX
such as "quarantine" or GAE "lock/unlock")*

There are some specific use-cases where the "recycle tainted containers
by default" policy would be problematic when running applications in
production:

An application instance is malfunctioning (e.g. hanging) and
interactive debugging is necessary. The app-ops ssh into the container and
start taking some diagnostic steps (e.g. sending kill -SIGQUIT signals to
take thread dumps, or locally changing log levels).

If the ssh connection ever breaks or times out, the "recycle tainted
containers by default" policy kicks in and restarts the instance,
preventing the current diagnostic from completing.

Another similar use case: a production application is suspected to be
compromised by an attacker. App-ops need to capture evidence and
better understand how the abuse was done. There isn't enough information in
the streamed logs, and there is a need to get into the container to inspect
the ephemeral FS, the processes and memory. This may require more than one
simultaneous SSH connection, and may span multiple hours.

In both use-cases above, while the application is 12 factor compliant
and the "recycle tainted containers by default" policy would be opted in on
the corresponding space, there would be a need to transiently turn the mode
off.

In terms of user experience, this may appear as an explicit user request
to "quarantine" the tainted app instances (or the whole app) so that CF does
not attempt to restart them. Or it may appear as the Google App Engine
"lock/unlock" [3]:

A call to a new "unlock" command on a CF app instance would be
necessary to get SSH access to it. CF then considers this instance as
"tainted"/untrusted, as it may have deviated from the pushed content, and
does not act on it anymore (i.e. does not monitor its bound $PORT or root
process exit, which may be handy to diagnose it as one wishes). When the
"lock" command is requested on this instance, CF destroys this tainted
instance and recreates a fresh new "trusted" one.
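
The lock/unlock lifecycle described above can be sketched as a small state
machine. This is only an illustration of the proposal, not an existing CF or
GAE API: the AppInstance class, its method names and the generation counter
are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    TRUSTED = "trusted"   # matches the pushed content, monitored by the platform
    TAINTED = "tainted"   # unlocked for SSH, left alone by the platform


@dataclass
class AppInstance:
    index: int
    generation: int = 0            # hypothetical: bumped per fresh container
    state: State = State.TRUSTED

    @property
    def monitored(self) -> bool:
        # Only trusted instances have their $PORT / root process watched.
        return self.state is State.TRUSTED

    @property
    def ssh_allowed(self) -> bool:
        # SSH access requires an explicit "unlock" first.
        return self.state is State.TAINTED

    def unlock(self) -> None:
        """Mark the instance as tainted; the platform stops acting on it."""
        self.state = State.TAINTED

    def lock(self) -> None:
        """Destroy the tainted container and replace it with a fresh one."""
        if self.state is State.TAINTED:
            self.generation += 1
            self.state = State.TRUSTED


if __name__ == "__main__":
    inst = AppInstance(index=0)
    inst.unlock()
    print(inst.ssh_allowed, inst.monitored)   # SSH open, monitoring off
    inst.lock()
    print(inst.generation, inst.state.value)  # fresh, trusted container
```

The point of the design is that a broken SSH connection changes nothing:
the instance stays tainted until someone explicitly asks for "lock".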

*4- development use-cases need to be better supported (dev/prod parity)
not sure ssh/scp is the right path though*

I agree with James Myers that development use-cases should be better
supported.

First, CF should strive to support dev-prod parity [4]. However,
there is currently no longer a version of CF that developers can run
on their laptop (e.g. when doing offline development during a commute) that
would behave like prod and embed buildpacks. There used to be "CF on a
single VM". Heroku and GAE have emulators. Cloud rocker [5] is close, but it
still takes 10s or more to have changes made to the app be reflected in the
running app.

There are some legitimate use cases during development for modifying
sources of the application and having those changes take effect
immediately. Lots of app development frameworks support those development
modes (even those that promote test-driven practices), and getting fast
feedback is important. Having dev-prod parity means supporting these use
cases while preserving prod behavior (having the VCAP_SERVICES and
VCAP_APPLICATION and the buildpack processing applied on the same stack
(cflinuxfs2)). Being able to run offline would be even better.

I however believe that providing SSH/SCP access to change the file
system of a running app instance may not be the appropriate response, given
that the FS and the app instance are still ephemeral. Who would want to
modify files that could be lost at any time (e.g. during a stemcell
upgrade)?

I'd rather see value in further exploring the ideas laid out by James
Bayer in [5], e.g. in the form of a git repo populated with the
/home/vcap/app subdir, that developers could clone, push to, and have the
instance ephemeral FS updated with pushed changes.

This may be combined with a cloudrocker mechanism to work in a
fully offline mode when this is required.

[1]
https://groups.google.com/a/cloudfoundry.org/d/msg/vcap-dev/OavSBIhU_xQ/wJrT08iHfJ8J
[2] http://proxytunnel.sourceforge.net/paper.php
[3]
https://cloud.google.com/appengine/docs/managed-vms/host-env#changing_management
[4] http://12factor.net/dev-prod-parity
[5]
https://docs.google.com/document/d/1_C3OWS6giWx4JL_IL9YLA6jcppyQLVD-YjR0GeA8Z0s/edit#heading=h.toypuu5pxh65



On Thu, Jul 2, 2015 at 10:18 PM, James Myers <jmyers(a)pivotal.io> wrote:

I have to agree with Matt on this one. I feel that the recycling of
containers is a very anti-developer default. When you approach Cloud
Foundry from the perspective of running production applications the recycle
policy makes complete sense. However, I feel that this misses out on one of
the massive benefits/use cases of Cloud Foundry: what it offers to the
development process.

From a security standpoint, if you can ssh into a container, it means
you have write access to the application in Cloud Foundry. Thus you can
already push new bits/change the application in question. All of the
"papertrail" functionality around pushing/changing applications exists for
SSH as well (we record events, output log lines, make it visible to users
that action was taken on the application), and thus concerned operators
would be able to determine if someone is modifying the application in
question.

Therefore I'm lost on how this is truly the most secure default. If we
are really going by the idea that all defaults should be the most secure,
ssh should be disabled by default.

As a developer, I can see many times in which I would want to be able
to ssh into my container and change my application as part of a
troubleshooting process. Using BOSH as an example, CF Devs constantly ssh
into VMs and change the processes running on them in order to facilitate
development. BOSH does not reap the VM and redeploy a new instance when you
have closed the SSH session. Once again this is largely due to the fact
that if you have SSH access, you can already perform the necessary actions
to change the application through different means.

Another huge hindrance to development is that the recycling policy is
controlled by administrators. It is not something that normal users can
control, even though we allow the granularity of enabling/disabling SSH
completely to the end user. This seems counterintuitive.

I feel that a better solution would be to provide the user with some
knowledge of which instances may be tainted, and then allow them to opt
into a policy which will reap tainted containers. This provides users with
clear insight that their application instance may be a snowflake (and that
they may want to take action), while also allowing normal behavior with
regards to SSH access to containers.
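
As a rough sketch of that alternative, assuming the platform tracks a
per-instance "tainted" flag: the status lines make tainted instances
visible, and nothing is reaped unless the user has opted in. The names
InstanceStatus, describe and instances_to_reap are hypothetical and do not
correspond to any existing CF API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class InstanceStatus:
    index: int
    tainted: bool  # assumed to be set once ssh/scp was used on the instance


def describe(statuses: Iterable[InstanceStatus]) -> List[str]:
    """Lines a CLI could print so users can spot possible snowflakes."""
    return [
        f"#{s.index}  {'tainted' if s.tainted else 'clean'}"
        for s in statuses
    ]


def instances_to_reap(statuses: Iterable[InstanceStatus],
                      reap_tainted: bool = False) -> List[int]:
    """Indexes to restart; empty unless the user opted into reaping."""
    if not reap_tainted:
        return []
    return [s.index for s in statuses if s.tainted]


if __name__ == "__main__":
    current = [InstanceStatus(0, False), InstanceStatus(1, True)]
    print("\n".join(describe(current)))
    print(instances_to_reap(current))                     # default: keep all
    print(instances_to_reap(current, reap_tainted=True))  # opted in
```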

To summarize, by enabling the recycling policy by default we not only
produce extremely unusual behavior / workflows for developers, we are also
minimizing the developer-friendliness of CF in general. This mixed with the
fact that as a user I cannot even control this policy, leads me to believe
that as a default recycling should be turned off as it provides the most
cohesive and friendly user experience.

On Mon, Jun 29, 2015 at 9:14 AM, John Wong <gokoproject(a)gmail.com>
wrote:

after executing a command, concluding an interactive session, or
copying a file into an instance, that instance will be restarted.

How does it monitor the behavior? Is there a list of whitelisted
commands? I am curious because I am trying to find out what the
whitelist contains. Also, is it at the end of the bosh ssh APP_NAME session?
What if two users are there simultaneously?

Thanks.

On Mon, Jun 29, 2015 at 5:49 AM, Dieu Cao <dcao(a)pivotal.io> wrote:

I think with the CLI we could add clarifying messaging, when using
ssh, about what the current policy around recycling is.
Eric, what do you think about calling it the "recycling" policy,
enabled by default? =D

-Dieu


On Sat, Jun 27, 2015 at 3:42 AM, Matthew Sykes <
matthew.sykes(a)gmail.com> wrote:

Depends on your role and where your app is in the deployment
pipeline. Most of the scenarios I envisioned were for the tail end of
development where you need to poke around to debug and figure out those
last few problems.

For example, Ryan Morgan was saying that the Cloud Foundry plugin
for eclipse is going to be using the ssh support in diego to enable debug
of application instances in the context of a buildpack deployed app. This
is aligned with other requirements I've heard from people working on dev
tools.

As apps reach production, I would hope that interactive ssh is
disabled entirely on the prod space leaving only scp in source mode as an
option (something the proxy can do).

Between dev and prod, there's a spectrum, but in general, I either
expect access to be enabled or disabled - not enabled with a suicidal
tendency.

On Thu, Jun 25, 2015 at 10:53 PM, Benjamin Black <bblack(a)pivotal.io>
wrote:

matt,

could you elaborate a bit on what you believe ssh access to
instances is for?


b


On Thu, Jun 25, 2015 at 9:29 PM, Matthew Sykes <
matthew.sykes(a)gmail.com> wrote:

My concern is the default behavior.

When I first prototyped this support in February, I never
expected that merely accessing a container would cause it to be terminated.
As we can see from Jan's response, it's completely unexpected; many others
have the same reaction.

I do not believe that this behavior should be part of the default
configuration and I do believe the control needs to be at the space level.
I have already expressed this opinion during Diego retros and at the
runtime PMC meeting.

I honestly believe that if we were talking about applying this
behavior to `bosh ssh` and `bosh scp`, few would even consider running in a
'kill on taint mode' because of how useful it is. We should learn from that.

If this behavior becomes the default, I think our platform will
be seen as moving from opinionated to parochial. That would be unfortunate.


On Thu, Jun 25, 2015 at 6:05 PM, James Bayer <jbayer(a)pivotal.io>
wrote:

you can turn the "restart tainted containers" feature off with
configuration if you are authorized to do so. then using scp to write files
into a container would be persisted for the lifetime of the container even
after the ssh session ends.

On Thu, Jun 25, 2015 at 5:50 PM, Jan Dubois <
jand(a)activestate.com> wrote:

On Thu, Jun 25, 2015 at 5:36 PM, Eric Malm <emalm(a)pivotal.io>
wrote:

after executing a command, concluding an interactive session, or copying
a file into an instance, that instance will be restarted.

What is the purpose of being able to copy a file into an instance if
the instance is restarted as soon as the file has been received?

Cheers,
-Jan
_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


--
Thank you,

James Bayer



--
Matthew Sykes
matthew.sykes(a)gmail.com



--
Matthew Sykes
matthew.sykes(a)gmail.com



