Re: container restart on logout


Graham Bleach
 

Hi Stefan,

On 23 December 2016 at 13:52, Stefan Mayr <stefan(a)mayr-stefan.de> wrote:
On 23.12.2016 at 10:36, Graham Bleach wrote:
On 23 December 2016 at 09:21, Daniel Jones
<daniel.jones(a)engineerbetter.com> wrote:
Hmm, here's an idea that I haven't thought through and so is probably rubbish...

How about an immutability enforcer? Recursively checksum the expanded
contents of a droplet, and kill-with-fire anything that doesn't match it.
It'd need to be optional for folks storing ephemeral data on their ephemeral
disk, and a non-invasive (ie no changes to CF components) implementation
would depend on `cf ssh` or a chained buildpack, but maybe that's a nice
compromise that could be quicker to develop than waiting for mainline code
changes to CF?
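
A minimal sketch of the "recursively checksum the expanded contents" step,
assuming the droplet has been unpacked under some directory and a baseline
digest was recorded at staging time. The names tree_digest and has_drifted are
hypothetical, and how the check gets run inside the container (`cf ssh`, a
chained buildpack, something else) is left open.

```python
import hashlib
import os


def tree_digest(root: str) -> str:
    """Return one SHA-256 hex digest covering relative paths and file contents under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode("utf-8"))
            digest.update(b"\0")
            if os.path.islink(path):
                # Hash the link target rather than following it.
                digest.update(os.readlink(path).encode("utf-8"))
            else:
                with open(path, "rb") as fh:
                    for chunk in iter(lambda: fh.read(65536), b""):
                        digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()


def has_drifted(root: str, expected_digest: str) -> bool:
    """True if the tree under root no longer matches the recorded baseline."""
    return tree_digest(root) != expected_digest
```

Anything an app legitimately writes (temp files, caches) would have to live
outside the checked directory, which is why this would need to be optional.
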
An idea we've been kicking around is to ensure that app instance
containers never live longer than a certain time (eg. 3, 6, 12 or 24
hours).

This would ensure that we'd catch cases where apps weren't able to
cope with being rescheduled to different cells. It'd also strongly
discourage manual tweaks via ssh. It'd probably be useful for people
deploying apps to be able to initiate an aggressive version of this
behaviour to run in their testing pipelines, prior to production
deployment, to catch regressions in keeping state in app instances.

There's a naive implementation in my head that would work fine on
smaller installations by looping through app instances returned by the
API and restarting them.
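
A rough sketch of the selection half of that loop, assuming per-instance stats
have already been fetched from the API as a dict keyed by instance index, each
entry carrying a "state" and a nested "stats" dict with an "uptime" in seconds.
That shape and the name instances_over_age are assumptions for illustration;
the actual restart (for example by shelling out to `cf restart-app-instance`)
is deliberately left out.

```python
from typing import Dict, List


def instances_over_age(stats: Dict[str, dict], max_age_seconds: int) -> List[int]:
    """Return indices of running instances whose uptime exceeds max_age_seconds."""
    over = []
    for index, info in stats.items():
        # Instances that are not running may have no usable stats at all.
        uptime = (info.get("stats") or {}).get("uptime", 0)
        if info.get("state") == "RUNNING" and uptime > max_age_seconds:
            over.append(int(index))
    return sorted(over)
```

With a 24 hour limit this would be called as instances_over_age(stats, 24 * 3600)
for each app in turn.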

Cheers,
Graham
How to cope with the following issues?

Temporary data: some software still uses sessions, file uploads or
caches which are buffered or written to disk (Java/Tomcat, PHP, ...).
While it is okay to lose this data when a container is restarted (after
you have had some time to work with it), it becomes a problem when
every write can cause the container to be recreated. How should an
upload form work if every upload can kill the container? I'm only
referring to the processing of the upload, not to storing it permanently.
I think this was in response to Dan's immutability enforcement
proposal, so I'll let him respond :)

Single instances: recreating app containers when there are more than two
should not cause too many issues. But if there is only one instance you
have two choices:
- kill the running container and start a new one -> short downtime
- start a second instance and kill the first one afterwards -> problem
if the application is only allowed to run with one instance (singleton).
App instances go away when the cells get replaced (eg. stemcell
update) or fail, so apps need to be able to cope with it. If you're
not comfortable with downtime then the app probably shouldn't be
single instance.

For my naive "loop through all the app instances" script I'd be
inclined to check that the restarted instance was healthy again before
moving on to the next one.
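
Roughly, that "restart one, wait until it is healthy, then move on" loop could
look like the sketch below. The restart and is_healthy callables are
hypothetical placeholders for whatever talks to the API or CLI; the timeout and
poll interval are arbitrary defaults, not recommendations.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    indices: Iterable[int],
    restart: Callable[[int], None],
    is_healthy: Callable[[int], bool],
    timeout_seconds: float = 120.0,
    poll_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[int]:
    """Restart instances one at a time, waiting for each to become healthy.

    Raises TimeoutError and stops if an instance does not recover in time,
    so the remaining instances are left untouched.
    """
    restarted = []
    for index in indices:
        restart(index)
        deadline = clock() + timeout_seconds
        while not is_healthy(index):
            if clock() >= deadline:
                raise TimeoutError(f"instance {index} not healthy after restart")
            sleep(poll_seconds)
        restarted.append(index)
    return restarted
```

One caveat: is_healthy has to be able to tell the new instance from the old
one (for example by checking that uptime has reset), otherwise the loop could
move on before the restart has actually taken effect.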

One-shot tasks: a slight variation of the single instance problem, plus
the question of whether you are allowed to restart a one-shot task at all.
Tasks feel less safe to interrupt than app instances. I'm unclear what
happens to a running task when the cell gets destroyed and therefore
if there's some reasonable upper bound on how long a task should take
to complete.

--
Technical Architect
Government Digital Service
