New CF Service Broker "chaos-galago" - a chaos-monkey for your Cloud Foundry


Sam Bryant
 

Hi all,

Fidelity International have recently developed and open-sourced a new Service Broker for Cloud Foundry called "chaos-galago". It was created with the aim of providing a marketplace service that causes chaos for the applications users bind to it. It works by randomly killing instances of the bound applications, based on a configured probability and frequency.
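To make the probability and frequency idea concrete, here is a minimal sketch in Python. It is not taken from the chaos-galago code base; the function name `plan_kills`, its parameters, and the "one roll per tick" model are assumptions made purely for illustration.

```python
import random


def plan_kills(instance_count, probability, ticks, rng=None):
    """Return a list of (tick, instance_index) kill decisions.

    Hypothetical model: once per tick (the "frequency"), roll a random
    number; with the given probability, pick one instance index to kill.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if rng is None:
        rng = random.Random()

    kills = []
    for tick in range(ticks):
        # random() returns a float in [0.0, 1.0), so probability 0 never
        # fires and probability 1 fires on every tick.
        if instance_count > 0 and rng.random() < probability:
            kills.append((tick, rng.randrange(instance_count)))
    return kills
```

For example, `plan_kills(3, 0.2, 60)` simulates 60 ticks against an app with three instances, where each tick has a 20% chance of selecting one instance to kill.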

We see a massive benefit in chaos-galago, as it should allow developers to test whether their CF applications are truly cloud-resilient. Do they cope with unexpected failures? Are they deployed with enough instances to maintain uptime?

More information, including how to deploy it, can be found in our GitHub repo: https://github.com/FidelityInternational/chaos-galago

Any input, feedback and questions are encouraged. We hope everyone finds it as useful a tool as we do.

Regards,
Sam


Zach Brown
 

Nice!

--

*Zach Brown* | Product Manager

650-954-0427 - mobile

zbrown(a)pivotal.io

<http://pivotal.io>


Sam Bryant
 

For anyone interested, we have now also added a smoke tests project for chaos-galago that can be used to monitor the service broker. It can be found at: https://github.com/FidelityInternational/chaos-galago-smoke-tests

Details are also in the README for chaos-galago.

Regards,
Sam


Cornelia Davis <cdavis@...>
 

wicked cool!

--
Cornelia Davis
(805) 452 8941


David Illsley <davidillsley@...>
 

Really cool. How nasty is the kill? Is the process killed and CF then cleans up on healthcheck failure, or is the app instance removed from the router before it is terminated?



Sam Bryant
 

Hi David,

The service broker uses an existing Cloud Foundry API endpoint: https://apidocs.cloudfoundry.org/231/apps/terminate_the_running_app_instance_at_the_given_index.html

It terminates the app instance. In my experience, CF always tidies up and restarts the app within a minute or so.
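As a rough illustration of what a call to that endpoint looks like, here is a sketch in Python that builds (but does not send) the request. The documented route is `DELETE /v2/apps/:guid/instances/:index`. The helper name `build_terminate_request`, the example API URL, and the way the token is supplied are assumptions for illustration, not the broker's actual code.

```python
import urllib.request


def build_terminate_request(api_url, app_guid, index, token):
    """Build a DELETE request for one app instance (hypothetical helper).

    api_url  -- base URL of the CF API, e.g. "https://api.example.com" (placeholder)
    app_guid -- GUID of the application
    index    -- zero-based index of the instance to terminate
    token    -- OAuth access token, assumed to be obtained elsewhere
    """
    if index < 0:
        raise ValueError("instance index must not be negative")

    url = f"{api_url.rstrip('/')}/v2/apps/{app_guid}/instances/{index}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"bearer {token}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. That step is left out here so the sketch can be read and tried without a live Cloud Foundry environment.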

If you have any other ideas or opinions on how application instances should be terminated, I am happy to investigate possible enhancements. We are also open to pull requests.

Regards,
Sam