Re: Testing behaviour of a production CF environment


Daniel Jones

Hi Graham,

Running acceptance tests in production is absolutely what I'd recommend -
in fact I drove that point home in my talk in Santa Clara last week (I can
forward on the link once the YouTube videos are up).

I've worked with customers who didn't use the official CATS, but instead
favoured writing their own in the BDD framework of their choice. We didn't
find them too onerous to develop and maintain, and an example test would be
(sketched in code after the list):

1. Push fixture app
2. Start app
3. Hit app, validate response
4. Hit URL on app to write to a given data service
5. Hit URL to read written value, validate
6. Stop app
7. Delete app
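
Purely as an illustration (not pulled from any real codebase), that flow in Go
might look like the sketch below. It shells out to the cf CLI and assumes a
hypothetical fixture app in ./fixture-app exposing /write and /read endpoints
backed by a bound data service; the app name, domain and routes are made up
for the example.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os/exec"
)

// cf shells out to the cf CLI and aborts the run on any failure.
func cf(args ...string) {
	if out, err := exec.Command("cf", args...).CombinedOutput(); err != nil {
		log.Fatalf("cf %v failed: %v\n%s", args, err, out)
	}
}

// get fetches a URL and returns the response body as a string.
func get(url string) string {
	resp, err := http.Get(url)
	if err != nil {
		log.Fatalf("GET %s failed: %v", url, err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return string(body)
}

func main() {
	app := "acceptance-fixture"                // hypothetical app name
	route := "https://" + app + ".example.com" // hypothetical route

	cf("push", app, "-p", "./fixture-app") // steps 1 & 2: cf push also starts the app

	if get(route) == "" { // step 3: hit the app, check we get a response
		log.Fatal("expected a non-empty response from the fixture app")
	}

	get(route + "/write?key=k&value=probe") // step 4: write through the bound data service

	if got := get(route + "/read?key=k"); got != "probe" { // step 5: read it back, validate
		log.Fatalf("read back %q, wanted %q", got, "probe")
	}

	cf("stop", app)         // step 6: stop the app
	cf("delete", app, "-f") // step 7: delete the app

	fmt.Println("acceptance check passed")
}

A real test would also time-bound the HTTP calls and clean up on failure,
for exactly the leftover-state reasons discussed further down this thread.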

This exercised some of the core user-facing behaviour, as well as that of
data services (search for Pivotal's apps like cf-redis-example-app
<https://github.com/pivotal-cf/cf-redis-example-app> which follow the same
pattern). We had additional tests that would log a given unique string
through an app, and then hit the log aggregation system to validate that it
had made its way through. The tests were small, so we had more granular
control over the frequency of each test, and got faster feedback through
parallelisation.
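
Again as an illustration only - the app name and the /log endpoint are
invented for the sketch - that kind of round trip could look something like
this, assuming recent logs can be pulled with cf logs --recent:

package main

import (
	"fmt"
	"log"
	"net/http"
	"os/exec"
	"strings"
	"time"
)

func main() {
	app := "acceptance-fixture" // hypothetical fixture app, as above
	// A unique marker we can trace through the log aggregation system.
	marker := fmt.Sprintf("log-probe-%d", time.Now().UnixNano())

	// Ask the app to write the marker to stdout so it flows into log aggregation.
	resp, err := http.Get("https://" + app + ".example.com/log?message=" + marker)
	if err != nil {
		log.Fatalf("failed to hit the fixture app: %v", err)
	}
	resp.Body.Close()

	// Give aggregation a moment; a real test would poll with a timeout instead.
	time.Sleep(5 * time.Second)

	out, err := exec.Command("cf", "logs", app, "--recent").CombinedOutput()
	if err != nil {
		log.Fatalf("cf logs failed: %v\n%s", err, out)
	}
	if !strings.Contains(string(out), marker) {
		log.Fatalf("marker %q never appeared in the recent logs", marker)
	}
	fmt.Println("log aggregation check passed")
}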

Running these sorts of tests against each Cloud Foundry instance on a CI
server with a wallboard view worked really well. Not only do you get volume
testing for free (I've filled a buildpack cache that way
<http://www.engineerbetter.com/update/2015/08/19/overflowing-buildpack_cache.html>),
you can publish the wallboard URL to PaaS customers and stakeholders alike.
Tying these tests up to alerting/paging systems is also more sensible than
paging people due to IaaS-level failures.

It sounds like you're doing the right thing, and I'd encourage you to
continue and expand your efforts in that area. I'm happy to discuss more if
this is an area of interest for you.

Regards,
Daniel Jones - CTO
+44 (0)79 8000 9153
@DanielJonesEB <https://twitter.com/DanielJonesEB>
*EngineerBetter* Ltd <http://www.engineerbetter.com> - UK Cloud Foundry
Specialists

On Wed, Jun 1, 2016 at 4:06 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

Hi Graham,

Your approach sounds good. What you are doing/plan to do in CI sounds
perfect, as well as your plan for production (namely, run what you have in
CI except cf-acceptance-tests). In the README for cf-acceptance-tests, we
state:

These tests are not intended for use against production systems, they are
intended for acceptance environments for teams developing Cloud Foundry
itself. While these tests attempt to clean up after themselves, there is no
guarantee that they will not mutate state of your system in an undesirable
way.

If you're going to run critical workloads in production, I'd recommend
having a staging environment where you roll out a CF upgrade before rolling
it out to production.

We are actually already tracking an issue related to buildpacks not being
cleaned up:

https://www.pivotaltracker.com/story/show/115199031

But as you'll be able to see, it's not the highest priority at the moment.

The README attempts to give some idea of whether test suites are unsafe to
run in certain contexts:


https://github.com/cloudfoundry/cf-acceptance-tests#explanation-of-test-suites

And the section on Test Execution explains how you can skip test suites,
tests matching a certain regex, etc:

https://github.com/cloudfoundry/cf-acceptance-tests#test-execution
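
(Concretely, since the suite runs under Ginkgo, the regex-based skipping is
typically done with Ginkgo's -skip and -focus flags; the exact invocation may
change over time, so treat that README section as authoritative.)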

Best,
Amit

On Tue, May 31, 2016 at 10:15 AM, Graham Bleach <
graham.bleach(a)digital.cabinet-office.gov.uk> wrote:

Hello,

What do you use to test the behaviour of your production environments?

We are currently not live and are running these in each environment:
- cf-smoke-tests to test core functionality is working
- cf-acceptance-tests to test behaviour in more detail
- our own custom acceptance tests against code we've written, behaviour
we've configured and care about not breaking
- external monitoring against some deployed apps

We need to stop running cf-acceptance-tests in production, because they
sometimes cause problems if they exit prematurely and, e.g., leave an
unexpected buildpack as the first buildpack in the list. So we could run
those tests only in our CI environment every time we change something.

However we'd like to identify behaviour changes that aren't caused by our
changes and don't occur in our CI environment. For example, we recently
uncovered a problem with an infrastructure product that we only noticed by
running smoke-tests in production - that error didn't happen in other
environments. We're worried about the coverage we'd lose by not running the
tests.

One option that seems appealing to us is to try to work out a way of
running just the "safe" acceptance tests. For our purposes, "safe" probably
means tests that don't need to run as admin and that could run in their own
org - the isolation features of CF probably protect us enough from
impacting people using CF.

But it doesn't seem obvious how to run such a subset of the acceptance
tests today, or how to do so in a way that's likely to remain stable in the
future, hence this question.

Graham
