Dear CF community, We are trying to find a way to selectively restart individual app instances, or a specific app, on an as-needed basis in response to alerts from our monitoring solution. One option we are considering is to deploy a self-healing app in CF that exposes REST endpoints our alert policies can call to perform those actions for us. This self-healing app would essentially have the capabilities of the CF CLI for stopping and starting services and instances, and it would be protected by UAA.
Before we go off and start developing this app, I wanted to check whether anyone in the CF community has thought about this approach before and has a solution in place, or any ideas to consider.
Thanks, Siva Balan
|
|
Not sure I totally get what you are asking, but `cf restart-app-instance` will restart an instance, so if you have an alert trigger a script, you could script the restart.
Or you could have the app itself detect when it gets into a bad state (presumably it can, if it's emitting the metrics that indicate this) and exit. When it exits, the platform will just restart it.
Dan
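A minimal sketch of the "alert triggers a script" idea above, assuming the `cf` CLI is on the PATH and already logged in and targeted at the right org and space. The function names and the example app name are hypothetical; only the `cf restart-app-instance` command comes from the thread.

```python
import subprocess


def restart_command(app_name: str, instance_index: int) -> list[str]:
    """Build the argv for `cf restart-app-instance APP_NAME INDEX`."""
    if instance_index < 0:
        raise ValueError("instance index must be non-negative")
    return ["cf", "restart-app-instance", app_name, str(instance_index)]


def restart_instance(app_name: str, instance_index: int) -> int:
    """Run the restart and return the cf CLI's exit code.

    Assumption: `cf` is installed, logged in, and targeted correctly.
    """
    completed = subprocess.run(restart_command(app_name, instance_index), check=False)
    return completed.returncode
```

An alert hook could then call something like `restart_instance("my-app", 2)` and treat a non-zero return value as a failed restart.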
|
|
Hi Daniel, Thanks for your response. I am aware of all the options you are suggesting. What we are looking for, though, is a process that restarts an app instance without human intervention, driven by an alert policy in our monitoring system. This monitoring system is outside of CF and does not have access to the CF CLI, but it can call REST endpoints.
For example: the monitoring system detects high CPU utilization on one of the app instances. It raises an alert, which triggers a policy that calls a REST endpoint of this self-healing app. Based on the parameters passed in the request, the self-healing app restarts the requested app instance.
This is required when the app does not know that it is in a bad state, but some metrics we are tracking indicate that the app instance needs to be restarted.
Hope that makes sense.
Thanks Siva
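The flow described above (alert policy calls an endpoint, the self-healing app restarts the requested instance) could be sketched roughly as follows. The JSON field names `app_name` and `instance_index`, the status codes, and the injected `restart` callable are all assumptions for illustration; the thread does not specify a request format. Authentication (the UAA protection mentioned earlier) is deliberately left out of this sketch.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RestartRequest:
    app_name: str
    instance_index: int


def parse_restart_request(body: bytes) -> RestartRequest:
    """Validate the JSON body sent by the alert policy (hypothetical schema)."""
    payload = json.loads(body)
    app_name = payload.get("app_name")
    index = payload.get("instance_index")
    if not isinstance(app_name, str) or not app_name:
        raise ValueError("app_name must be a non-empty string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(index, bool) or not isinstance(index, int) or index < 0:
        raise ValueError("instance_index must be a non-negative integer")
    return RestartRequest(app_name, index)


def handle_alert(body: bytes, restart: Callable[[str, int], None]) -> tuple[int, str]:
    """Return an (HTTP status, message) pair for one alert callback."""
    try:
        request = parse_restart_request(body)
    except (ValueError, AttributeError) as exc:
        # Malformed JSON, a non-object payload, or invalid fields.
        return 400, f"bad request: {exc}"
    restart(request.app_name, request.instance_index)
    return 202, f"restart requested for {request.app_name}/{request.instance_index}"
```

Keeping the restart action behind a callable means the same handler works whether the restart is done by shelling out to the CLI or by calling the platform API directly.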
On Fri, Jan 24, 2020 at 9:55 AM Daniel Mikusa <dmikusa@...> wrote:
Not sure I totally get what you are asking, but `cf restart-app-instance` will restart an instance, so if you have an alert trigger a script, you could script the restart.
Or you could just have the app itself know when it gets into a bad state, presumably it would if it's emitting the metrics to indicate this, and exit. When it exits the platform will just restart the app.
Dan
|
|
Hi Siva,
I'm not aware of a similar solution that already exists. A couple of thoughts:
- Could you use HTTP healthchecks, and have the endpoint return a non-200 status code if the app detects high CPU usage itself?
- Be mindful of how CPU usage is reported. Whilst current containerisation tech can limit how many CPU shares a process gets, it can't control the system calls that report how much CPU is available. Hence things like `top` will appear inaccurate, and you should ensure the CPU usage statistics come from the metrics that feed into the cpu-entitlement-plugin. If you want to double-check this, there's a blog post (https://www.cloudfoundry.org/blog/better-way-split-cake-cpu-entitlements/) and the folks in the #garden channel are awfully helpful.
- Having an endpoint that allows remote termination of an app sounds like a bit of a security risk, but I'm sure you'll manage that appropriately.
Regards, Daniel 'Deejay' Jones - CTO +44 (0)79 8000 9153
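The healthcheck suggestion above might look something like the sketch below: an endpoint that returns 200 normally and a non-200 status once CPU usage crosses a threshold. The `/health` path, the 0.9 threshold, the 503 status, and the `read_cpu` callable are assumptions. How CPU is actually measured is left to the caller, given the caveat above about in-container readings being unreliable.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

CPU_THRESHOLD = 0.9  # assumed: fraction of the instance's CPU entitlement


def health_response(cpu_fraction: float, threshold: float = CPU_THRESHOLD) -> tuple[int, bytes]:
    """Map a CPU reading to an HTTP status and body for the health check."""
    if cpu_fraction >= threshold:
        return 503, b"unhealthy: cpu above threshold\n"
    return 200, b"ok\n"


def make_handler(read_cpu: Callable[[], float]) -> type[BaseHTTPRequestHandler]:
    """Create a handler class that answers GET /health using read_cpu()."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/health":
                self.send_error(404)
                return
            status, body = health_response(read_cpu())
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthHandler


def serve(port: int, read_cpu: Callable[[], float]) -> None:
    """Serve the health endpoint until the process is stopped."""
    HTTPServer(("", port), make_handler(read_cpu)).serve_forever()
```

A single reading above the threshold fails the check in this sketch; a real endpoint would probably want the condition to hold for some time before reporting unhealthy, so that a short spike does not trigger a restart.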
|
|
Hi Daniel, Thanks for your response. I am aware of all the options you are suggesting. But what we are looking for is a process to restart an app instance without human intervention from an alert policy in our monitoring system. This monitoring system is outside of CF and does not have access to CF CLI. But it can access REST endpoints.
The cf CLI is just a glorified REST client. If you can access the Cloud Controller API for your foundation, you can do everything I mentioned without the cf CLI, using raw REST calls.
+1 to everything Daniel Jones said in his response.
Hope that helps!
Dan
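As a rough illustration of "raw REST" against the Cloud Controller, the sketch below builds (but does not send) a request to terminate one process instance, after which the platform should start a replacement. The endpoint path is my understanding of the v3 API's "terminate a process instance" call and should be checked against the API documentation for your foundation's version. The API URL, app GUID, and token are placeholders, and obtaining a token from UAA is not shown.

```python
import urllib.request


def terminate_instance_request(
    api_url: str,
    app_guid: str,
    instance_index: int,
    token: str,
    process_type: str = "web",
) -> urllib.request.Request:
    """Build a DELETE request for one instance of an app's process.

    Assumed endpoint (verify against your Cloud Controller API docs):
    DELETE /v3/apps/:guid/processes/:type/instances/:index
    """
    url = (
        f"{api_url.rstrip('/')}/v3/apps/{app_guid}"
        f"/processes/{process_type}/instances/{instance_index}"
    )
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"bearer {token}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send the call; separating construction from sending makes the URL and headers easy to inspect first.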
|
|

Troy Topnik
Ideally you'd want to trace the application misbehavior to a root cause in the application itself, but I think we've all been in the situation where "turn it off and on again" is an easier solution. :)
I wonder if this could be a feature request for App-AutoScaler? It already has access to the metric types required for the operation, but it would need to be able to take a policy action based on those metrics other than scaling up or down (e.g. "adjustment": "restart").
TT
--
Troy Topnik
Senior Product Manager,
SUSE Cloud Application Platform
troy.topnik@...
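To make the feature-request idea concrete, here is what such a policy might look like, written as a Python dict and printed as JSON. The field names follow the App Autoscaler policy format as I understand it, and the numbers are arbitrary. The `"restart"` value is the hypothetical extension proposed above; it is not something App Autoscaler is known to accept.

```python
import json

# Illustrative policy only; values are arbitrary.
policy = {
    "instance_min_count": 2,
    "instance_max_count": 4,
    "scaling_rules": [
        {
            "metric_type": "cpu",
            "operator": ">=",
            "threshold": 90,
            "breach_duration_secs": 300,
            "cool_down_secs": 300,
            # Hypothetical: this field normally holds a scaling step
            # such as "+1" or "-1", not a restart action.
            "adjustment": "restart",
        }
    ],
}

print(json.dumps(policy, indent=2))
```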
|
|
Thanks Daniel J and Daniel M for your inputs.
Troy - We are also thinking along those lines, to see if we can use the App Autoscaler for the restarts.
-Siva
|
|
Hjortshoj, Julian <Julian.Hjortshoj@...>
To me this seems a lot like a health check. Is there some reason you couldn't add a health check endpoint to your app instances (either directly or as a sidecar) and then let CF take care of restarting the app instances for you?
From: cf-dev@... <cf-dev@...> on behalf of Siva <mailsiva@...>
Sent: Monday, January 27, 2020 11:22 AM
To: Discussions about Cloud Foundry projects and the system overall. <cf-dev@...>
Subject: Re: [cf-dev] CF app that helps with self-healing
Thanks Daniel J and Daniel M for your inputs.
Troy - We are also thinking something along those lines to see if we can use the App Autoscaler for the restarts.
-Siva
|
|