Re: SSH access to CF app instances on Diego
Hi,
Please find my feedback on this thread below.

*short version:*

1- need to preserve a good CF experience with HTTP only (direct SSH flow is still blocked and a pain in many organisations) => +1 to preserve "cf files" or fine-tune the diego plugin so that SSH over HTTP works out of the box
2- the default "recycle tainted containers by default" policy seems good to me
3- it needs to be completed with more control over the recycling policy (UX such as "quarantine" or GAE "lock/unlock")
4- development use-cases need to be better supported (dev/prod parity); not sure ssh/scp is the right path though

*long version:*

*1- cf files and ssh over HTTP*

As previously mentioned in [1], by exposing its APIs over HTTP, CF did a great job of being easy to consume through the HTTP proxies that some companies still use, making for a seamless experience when consuming a public PaaS, or a private PaaS shared among corporate entities. It seems important to me to preserve a good CF experience with HTTP only.

If SSH interactive access, scp and port forwarding become the mainstream solution to operate and troubleshoot apps (backing "cf files" and replacing the previous DEBUG and CONSOLE ports), it will be useful for users behind such firewalls to be able to configure the diego ssh plugin to use HTTP/SOCKS proxies to reach public CF instances.

As the diego ssh cli plugin supports using the regular local host ssh binaries, this could potentially be done by tweaking the .ssh config file to add flags for the host ssh.${domain} so that connections go through proxies (possibly double tunnels, as described in [2]). However, for new users in such a network context, especially on Windows, the setup work required before using a public CF instance starts to add up.

*2- default "recycle tainted containers by default" seems good to me*

Given that apps deployed on CF comply with the 12-factor app principles, their instances may be restarted at any time (e.g. during a new CF release deployment or a stemcell upgrade).
So the default policy "recycle tainted containers by default" is not a surprise.

*3- need to be completed with more control of the recycling policy (UX such as "quarantine" or GAE "lock/unlock")*

There are some specific use-cases where the "recycle tainted containers by default" policy would be problematic when running applications in production:

An application instance is malfunctioning (e.g. hanging) and interactive debugging is necessary. The app-ops ssh into the container and start taking some diagnostic steps (e.g. sending kill -SIGQUIT signals to take thread dumps, or locally changing log levels). If the ssh connection ever breaks or times out, the "recycle tainted containers by default" policy would recycle the container, preventing the current diagnostic from completing.

Another similar use case: a production application is suspected to have been compromised by an attacker. App-ops need to capture evidence and better understand how the abuse was carried out. There isn't enough information in the streamed logs, and there is a need to get into the container to inspect the ephemeral FS, the processes and the memory. This may require more than one simultaneous SSH connection, and may span multiple hours.

In both use-cases above, while the application is 12-factor compliant and the "recycle tainted containers by default" policy would be opted into on the corresponding space, there would be a need to transiently turn the mode off.

In terms of user experience, this may appear as an explicit user request to "quarantine" the tainted app instances (or the whole app) so that CF does not attempt to restart them.

Or it may appear as the Google App Engine "lock/unlock" model [3]: a call to a new "unlock" command on a CF app instance would be necessary to get SSH access to it. CF then considers this instance as "tainted"/untrusted, as it may have deviated from the pushed content, and no longer acts on it (i.e. does not monitor its bound $PORT or root process exit, which may be handy for diagnosing it as one wishes).
When the "lock" command is requested on this instance, CF destroys this tainted instance and recreates a fresh "trusted" one.

*4- development use-cases need to be better supported (dev/prod parity) not sure ssh/scp is the right path though*

I agree with James Myers that development use-cases should be better supported.

First, CF should strive to support dev-prod parity [4]. However, there is currently no longer a version of CF that a developer can run on a laptop (e.g. when doing offline development during a commute) that would behave like prod and embed buildpacks. There used to be "CF on a single VM". Heroku and GAE have emulators. Cloud rocker [5] is close, but it still takes 10s or more for changes made to the app to be reflected in a running app.

There are some legitimate use cases during development for modifying the sources of the application and having those changes take effect immediately. Lots of app development frameworks support such development modes (even those that promote test-driven practices), and getting fast feedback is important.

Having dev-prod parity means supporting these use cases while preserving prod behavior (having VCAP_SERVICES and VCAP_APPLICATION set, and the buildpack processing applied on the same stack (cflinuxfs2)). Being able to run offline would be even better.

I however believe that providing SSH/SCP access to change the file system of a running app instance may not be the appropriate response, given that the FS and the app instance are still ephemeral. Who would want to modify files that could be lost at any time (e.g. during a stemcell upgrade)?

I'd rather see value in further exploring the ideas laid out by James Bayer in [5], e.g. in the form of a git repo populated with the /home/vcap/app subdir, that developers could clone and push to, and have the instance's ephemeral FS updated with the pushed changes. This may be combined with a cloudrocker-like mechanism so as to work in a fully offline mode when this is required.
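
For point 1, here is a minimal sketch of what the .ssh config tweak might look like. It assumes that proxytunnel (the tool described in [2]) is installed locally, that the CF SSH endpoint listens on port 2222, and that the corporate proxy allows CONNECT to that port. The host name and proxy address below are placeholders, not real endpoints.

```
# ~/.ssh/config (sketch only; every value below is an assumption)
# ssh.cf.example.com stands in for ssh.${domain}
Host ssh.cf.example.com
    # assumed port of the CF SSH endpoint
    Port 2222
    # proxy.corp.example:3128 is a hypothetical corporate HTTP proxy;
    # %h and %p expand to the host and port given above
    ProxyCommand proxytunnel -p proxy.corp.example:3128 -d %h:%p
```

If the proxy only allows CONNECT to port 443, this single hop would not be enough, which is where the double tunnels described in [2] would come in.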
[1] https://groups.google.com/a/cloudfoundry.org/d/msg/vcap-dev/OavSBIhU_xQ/wJrT08iHfJ8J
[2] http://proxytunnel.sourceforge.net/paper.php
[3] https://cloud.google.com/appengine/docs/managed-vms/host-env#changing_management
[4] http://12factor.net/dev-prod-parity
[5] https://docs.google.com/document/d/1_C3OWS6giWx4JL_IL9YLA6jcppyQLVD-YjR0GeA8Z0s/edit#heading=h.toypuu5pxh65
On Thu, Jul 2, 2015 at 10:18 PM, James Myers <jmyers(a)pivotal.io> wrote:
I have to agree with Matt on this one. I feel that the recycling of