container restart on logout
Graham Bleach
Hi Stefan,

On 23 December 2016 at 13:52, Stefan Mayr <stefan(a)mayr-stefan.de> wrote:
> Temporary data: some software still uses sessions, file uploads or
> caches which are buffered or written to disk (Java/Tomcat, PHP, ...).
> [...]

I think this was in response to Dan's immutability enforcement
proposal, so I'll let him respond :)

> Single instances: recreating app containers when there are more than
> two should not cause too many issues. But if there is only one instance
> you have two choices:
> - kill the running container and start a new one -> short downtime
> - start a second instance and kill the first one afterwards -> problem
> if the application is only allowed to run with one instance (singleton).

App instances go away when the cells get replaced (eg. stemcell
update) or fail, so apps need to be able to cope with it. If you're
not comfortable with downtime then the app probably shouldn't be
single instance.

For my naive "loop through all the app instances" script I'd be
inclined to check that the restarted instance was healthy again before
moving onto the next one.

> One-shot tasks: a slight variation of the single-instance problem and
> the question of whether you are allowed to restart a one-shot task

Tasks feel less safe to interrupt than app instances. I'm unclear what
happens to a running task when the cell gets destroyed and therefore
if there's some reasonable upper bound on how long a task should take
to complete.

--
Technical Architect
Government Digital Service
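For illustration, a minimal sketch of the loop Graham describes, with the
health check folded in. It assumes cf CLI v6 (cf restart-app-instance and
cf curl are real commands) and jq; the app name and the 30 x 10-second
polling budget are placeholders.

    #!/usr/bin/env bash
    # Naive rolling restart: bounce each instance of APP in turn, waiting
    # until it reports RUNNING again before touching the next one.
    set -euo pipefail

    APP="${1:?usage: $0 APP_NAME}"              # placeholder app name
    GUID=$(cf app "$APP" --guid)
    COUNT=$(cf curl "/v2/apps/${GUID}/stats" | jq 'length')

    for ((i = 0; i < COUNT; i++)); do
      cf restart-app-instance "$APP" "$i"
      state=""
      for attempt in $(seq 1 30); do            # arbitrary 30 x 10s budget
        state=$(cf curl "/v2/apps/${GUID}/stats" | jq -r ".\"$i\".state")
        [ "$state" = "RUNNING" ] && break
        sleep 10
      done
      if [ "$state" != "RUNNING" ]; then
        echo "instance $i did not come back healthy; aborting" >&2
        exit 1
      fi
    done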
Stefan Mayr
On 23 December 2016 at 10:36, Graham Bleach wrote:
> [...]

How to cope with the following issues?

Temporary data: some software still uses sessions, file uploads or
caches which are buffered or written to disk (Java/Tomcat, PHP, ...).
While it is okay to lose this data when a container is restarted (after
you have had some time to work with it), it becomes a problem when
every write can cause the recreation of the container. How should an
upload form work if every upload can kill the container? I'm only
referring to the processing of the upload - not permanently storing it.

Single instances: recreating app containers when there are more than
two should not cause too many issues. But if there is only one instance
you have two choices:
- kill the running container and start a new one -> short downtime
- start a second instance and kill the first one afterwards -> problem
if the application is only allowed to run with one instance (singleton).

One-shot tasks: a slight variation of the single-instance problem and
the question of whether you are allowed to restart a one-shot task
Happy holidays,
Stefan
Graham Bleach
On 23 December 2016 at 09:21, Daniel Jones
<daniel.jones(a)engineerbetter.com> wrote:
> [...]

An idea we've been kicking around is to ensure that app instance
containers never live longer than a certain time (eg. 3, 6, 12 or 24
hours).

This would ensure that we'd catch cases where apps weren't able to
cope with being rescheduled to different cells. It'd also strongly
discourage manual tweaks via ssh. It'd probably be useful for people
deploying apps to be able to initiate an aggressive version of this
behaviour to run in their testing pipelines, prior to production
deployment, to catch regressions in keeping state in app instances.

There's a naive implementation in my head that would work fine on
smaller installations by looping through app instances returned by the
API and restarting them.

Cheers,
Graham
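To make the age limit concrete, a sketch under the same assumptions (cf
CLI v6, jq), using the per-instance uptime in seconds that the v2 stats
endpoint reports; the 24-hour default is arbitrary.

    #!/usr/bin/env bash
    # Recycle any instance of APP that has been running longer than
    # MAX_AGE_HOURS. Sketch only; assumes cf CLI v6 and jq.
    set -euo pipefail

    APP="${1:?usage: $0 APP_NAME [MAX_AGE_HOURS]}"
    MAX_AGE_HOURS="${2:-24}"                    # eg. 3, 6, 12 or 24
    MAX_AGE_SECS=$((MAX_AGE_HOURS * 3600))

    GUID=$(cf app "$APP" --guid)
    STATS=$(cf curl "/v2/apps/${GUID}/stats")

    for i in $(echo "$STATS" | jq -r 'keys[]'); do
      uptime=$(echo "$STATS" | jq -r ".\"$i\".stats.uptime // 0")
      if [ "$uptime" -gt "$MAX_AGE_SECS" ]; then
        echo "instance $i up for ${uptime}s, recycling"
        cf restart-app-instance "$APP" "$i"
      fi
    done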
Daniel Jones
Hmm, here's an idea that I haven't thought through and so is probably rubbish...
How about an immutability enforcer? Recursively checksum the expanded
contents of a droplet, and kill-with-fire anything that doesn't match it.
It'd need to be optional for folks storing ephemeral data on their
ephemeral disk, and a non-invasive (ie no changes to CF components)
implementation would *depend* on `cf ssh` or a chained buildpack, but maybe
that's a nice compromise that could be quicker to develop than waiting for
mainline code changes to CF?
Regards,
Daniel Jones - CTO
+44 (0)79 8000 9153
@DanielJonesEB <https://twitter.com/DanielJonesEB>
*EngineerBetter* Ltd <http://www.engineerbetter.com> - UK Cloud Foundry
Specialists
On Thu, Dec 22, 2016 at 10:01 AM, David Illsley <davidillsley(a)gmail.com>
wrote:
> [...]
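For a feel of what the non-invasive `cf ssh` variant could look like, a
sketch that hashes the expanded droplet in each instance and compares it
to a baseline recorded at staging time. The /home/vcap/app location
matches standard CF app containers; the baseline value and app name are
inputs, and apps that legitimately write to their ephemeral disk would
need an exclude list.

    #!/usr/bin/env bash
    # Immutability-enforcer sketch: checksum the unpacked droplet inside
    # each running instance and recycle anything that has drifted from
    # the baseline hash captured after staging.
    set -euo pipefail

    APP="${1:?usage: $0 APP_NAME BASELINE_SHA}"
    BASELINE="${2:?usage: $0 APP_NAME BASELINE_SHA}"
    GUID=$(cf app "$APP" --guid)
    COUNT=$(cf curl "/v2/apps/${GUID}/stats" | jq 'length')

    # Stable content hash of /home/vcap/app, computed inside the container.
    HASH_CMD='find /home/vcap/app -type f -print0 | sort -z | xargs -0 sha1sum | sha1sum | cut -d" " -f1'

    for ((i = 0; i < COUNT; i++)); do
      actual=$(cf ssh "$APP" -i "$i" -c "$HASH_CMD" | tr -d '[:space:]')
      if [ "$actual" != "$BASELINE" ]; then
        echo "instance $i drifted from the droplet baseline, recycling"
        cf restart-app-instance "$APP" "$i"     # the kill-with-fire step
      fi
    done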
David Illsley <davidillsley@...>
I have no idea why the idea hasn't been implemented, but pondering it, it
seems like it's hard to do because of the cases you mention. Some people
need a policy that 'app teams won’t abuse it by creating app snowflakes',
and in some (most?) cases you need the flexibility to do debugging as you
mentioned.
I think it's possible to combine the SSH authorized events and the
instance uptime details from the API to build an audit capability - identify
instances which have been SSH'd to and not recycled within some time period
(eg 1 hour). You could either have some escalation process to get a human
to do something about it (in case there's a reason an hour wasn't enough),
or, more brutally, give the audit code the ability to restart the instance.
On Tue, Dec 20, 2016 at 12:48 PM, Daniel Jones
<daniel.jones(a)engineerbetter.com> wrote:
> [...]
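A sketch of that audit, for illustration: it pulls audit.app.ssh-authorized
events for the app from the v2 events API and flags instances whose uptime
shows they have not been recycled since the SSH. Assumes cf CLI v6, jq and
GNU date; reading the instance index out of the event metadata is an
assumption based on the "index: 0" description that `cf events` shows.

    #!/usr/bin/env bash
    # Audit sketch: flag instances that were SSH'd to and still haven't
    # been recycled after GRACE_SECS. If an instance's uptime exceeds the
    # age of the SSH event, it has been running since before the SSH.
    set -euo pipefail

    APP="${1:?usage: $0 APP_NAME}"
    GRACE_SECS=3600                             # eg 1 hour
    GUID=$(cf app "$APP" --guid)
    STATS=$(cf curl "/v2/apps/${GUID}/stats")
    NOW=$(date +%s)

    cf curl "/v2/events?q=type:audit.app.ssh-authorized&q=actee:${GUID}" \
      | jq -r '.resources[].entity | "\(.timestamp) \(.metadata.index)"' \
      | while read -r ts idx; do
          event_age=$(( NOW - $(date -d "$ts" +%s) ))
          uptime=$(echo "$STATS" | jq -r ".\"$idx\".stats.uptime // 0")
          if [ "$event_age" -gt "$GRACE_SECS" ] && [ "$uptime" -gt "$event_age" ]; then
            echo "instance $idx: SSH $((event_age / 60))m ago, never recycled"
            # escalate to a human here - or, more brutally:
            # cf restart-app-instance "$APP" "$idx"
          fi
        done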
Daniel Jones
Plus one!
An implementation whereby the recycling behaviour can be feature-flagged by
space or globally would be nice, so you could turn it off whilst debugging
in a space, and then re-enable it when you've finished debugging via a
series of short-lived SSH sessions.
Regards,
Daniel Jones - CTO
+44 (0)79 8000 9153
@DanielJonesEB <https://twitter.com/DanielJonesEB>
*EngineerBetter* Ltd <http://www.engineerbetter.com> - UK Cloud Foundry
Specialists
On Tue, Dec 20, 2016 at 8:06 AM, DHR <lists(a)dhrapson.com> wrote:
> [...]
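No such per-space flag exists in CF at the time of writing, but the effect
Daniel describes can be approximated in script form with an exclusion list
of spaces that the recycler skips. recycle-app.sh below is a placeholder
for whichever recycling sketch is in use, and pagination of the v2
listings is ignored for brevity.

    #!/usr/bin/env bash
    # Hypothetical per-space opt-out: skip recycling in any space named
    # in EXCLUDED_SPACES (eg. one currently being debugged).
    set -euo pipefail

    ORG="${1:?usage: $0 ORG_NAME}"
    EXCLUDED_SPACES="${EXCLUDED_SPACES:-}"      # eg. "dev-debug,perf-test"

    ORG_GUID=$(cf org "$ORG" --guid)

    cf curl "/v2/organizations/${ORG_GUID}/spaces" \
      | jq -r '.resources[] | "\(.metadata.guid) \(.entity.name)"' \
      | while read -r guid name; do
          if [[ ",${EXCLUDED_SPACES}," == *",${name},"* ]]; then
            echo "skipping space ${name} (recycling disabled)"
            continue
          fi
          cf target -o "$ORG" -s "$name" >/dev/null   # so app names resolve
          cf curl "/v2/apps?q=space_guid:${guid}" \
            | jq -r '.resources[].entity.name' \
            | while read -r app; do
                ./recycle-app.sh "$app"         # placeholder recycler
              done
        done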
DHR
Thanks Jon. The financial services clients I have worked with would also like the ability to turn on ‘cf ssh’ support in production, safe in the knowledge that app teams won’t abuse it by creating app snowflakes.
I see that the audit trail mentioned in the thread you posted has been implemented in ‘cf events’. Like this:

time                          event                      actor   description
2016-12-19T16:20:36.00+0000   audit.app.ssh-authorized   user    index: 0
2016-12-19T15:30:33.00+0000   audit.app.ssh-authorized   user    index: 0
2016-12-19T12:00:53.00+0000   audit.app.ssh-authorized   user    index: 0

That said: I still think the container recycle functionality, available as say a feature flag, would be really appreciated by the large enterprise community.
On 19 Dec 2016, at 18:25, Jon Price <jon.price(a)intel.com> wrote:
> [...]
Jon Price
This is something that has been on our wishlist as well but I haven't seen any discussion about it in quite some time. Here is one of the original discussions about it: https://lists.cloudfoundry.org/archives/list/cf-dev(a)lists.cloudfoundry.org/thread/GCFOOYRUT5ARBMUHDGINID46KFNORNYM/
It would go a long way with our security team if we could have some sort of recycling policy for containers in some of our more secure environments.
Jon Price
Intel Corporation
DHR
Hi,
Last year when ‘cf ssh’ functionality was being discussed, I’m pretty sure that the concept of automatically restarting containers following an SSH session was discussed.
It was to protect against creating app container snowflakes.
I’m fairly sure that protection hasn’t been introduced yet: I tested cf ssh-ing into a PCFDEV container today, wrote a file, and was able to log back in to the container later and see that it was still present.
Is this feature or any other app container snowflake protection still planned?
I couldn’t see anything in the Diego backlog (https://www.pivotaltracker.com/n/projects/1003146).
Thanks
Dave