Re: CF app that helps with self-healing


Daniel Jones
 

Hi Siva,

I'm not aware of a similar solution that already exists. A couple of thoughts:
  • Could you use HTTP healthchecks, and have the endpoint return a non-200 status code if the app detects high CPU usage itself?
  • Be mindful of how CPU usage is reported. Whilst current containerisation tech can limit how many CPU shares a process gets, it can't control the system calls that report how much CPU is available. Hence things like `top` will appear inaccurate, and you should ensure the CPU usage statistics come from the metrics that feed into the cpu-entitlement-plugin. If you want to double-check this, there's a blog post (https://www.cloudfoundry.org/blog/better-way-split-cake-cpu-entitlements/) and the folks in the #garden channel are awfully helpful.
  • Having an endpoint that allows remote termination of an app sounds like a bit of a security risk, but I'm sure you'll manage that appropriately.

Regards,
Daniel 'Deejay' Jones - CTO
+44 (0)79 8000 9153
EngineerBetter Ltd - More than cloud platform specialists


On Fri, 24 Jan 2020 at 22:27, Siva <mailsiva@...> wrote:
Hi Daniel,
Thanks for your response.
I am aware of all the options you are suggesting. But what we are looking for is a process to restart an app instance without human intervention from an alert policy in our monitoring system. This monitoring system is outside of CF and does not have access to CF CLI. But it can access REST endpoints.
For eg - The monitoring system will detect a high CPU utilization on one of the app instance. It will raise an alert which will trigger a policy that will call a REST endpoint of this self healing app. Based on the parameters passed in the request, the self-healing app will restart the requested app instance.
This is required when the app does not know that it is in a bad state but some metrics we are tracking are indicating that the app instance need to be restarted.
Hope that makes sense.

Thanks
Siva

On Fri, Jan 24, 2020 at 9:55 AM Daniel Mikusa <dmikusa@...> wrote:
Not sure I totally get what you are asking, but `cf restart-app-instance` will restart an instance, so if you have an alert trigger a script, you could script the restart.

Or you could just have the app itself know when it gets into a bad state, presumably it would if it's emitting the metrics to indicate this, and exit. When it exits the platform will just restart the app.

Dan


On Fri, Jan 24, 2020 at 12:30 PM Siva <mailsiva@...> wrote:
Dear CF community,
We are trying to find a way to selectively restart some instances of apps or to restart a specific app on an as needed basis based on some alerts that we receive from our monitoring solution. One option we are considering is to have a self-healing app deployed in CF which will have some REST endpoints exposed which we can call from our alert policies that will perform those actions for us. This self-healing app will essentially have the capabilities of CF CLI for stopping and starting services and instances. This app will also be protected by UAA.
Before we go off and start developing this app, I wanted to check if anyone in the CF community has thought about this approach before and have a solution in place or any ideas to consider.

Thanks,
Siva Balan



--

Join cf-dev@lists.cloudfoundry.org to automatically receive all group messages.