Testing behaviour of a production CF environment
Graham Bleach
Hello,
What do you use to test the behaviour of your production environments? We are not yet live and are currently running these in each environment:

- cf-smoke-tests, to check that core functionality is working
- cf-acceptance-tests, to test behaviour in more detail
- our own custom acceptance tests against code we've written and behaviour we've configured and care about not breaking
- external monitoring against some deployed apps

We need to stop running cf-acceptance-tests in production, because they sometimes cause problems if they exit prematurely, e.g. leaving an unexpected buildpack as the first buildpack in the list. We could instead run those tests only in our CI environment every time we change something. However, we'd also like to identify behaviour changes that aren't caused by our changes and don't occur in our CI environment. For example, we recently uncovered a problem with an infrastructure product that we only noticed by running smoke-tests in production; that error didn't happen in other environments. So we're worried about the coverage we'd lose by not running the tests.

One option that seems appealing to us is to work out a way of running just the "safe" acceptance tests. For our purposes, "safe" probably means tests that don't need to run as admin and that could run in their own org, since the isolation features of CF probably protect us enough against impact on people using the platform. But it doesn't seem obvious how to run such a subset of the acceptance tests today, or how to do so in a way that's likely to remain stable in the future, hence this question.

Graham
Amit Kumar Gupta
Hi Graham,
Your approach sounds good. What you are doing and planning to do in CI sounds right, as does your plan for production (namely, run everything you have in CI except cf-acceptance-tests). In the README for cf-acceptance-tests, we state:

"These tests are not intended for use against production systems, they are intended for acceptance environments for teams developing Cloud Foundry."

If you're going to run critical workloads in production, I'd recommend having a staging environment where you roll out a CF upgrade before you roll it out to production.

We are actually already tracking an issue related to buildpacks not being cleaned up: https://www.pivotaltracker.com/story/show/115199031. As you'll be able to see, it's not the highest priority at the moment.

The README attempts to give some idea of which test suites are unsafe to run in certain contexts: https://github.com/cloudfoundry/cf-acceptance-tests#explanation-of-test-suites. And the section on Test Execution explains how you can skip test suites, skip tests matching a certain regex, etc.: https://github.com/cloudfoundry/cf-acceptance-tests#test-execution

Best,
Amit
Daniel Jones
Hi Graham,
Running acceptance tests in production is absolutely what I'd recommend - in fact I drove that point home in my talk in Santa Clara last week (I can forward on the link once the YouTube videos are up).

I've worked with customers who didn't use the official CATS, but instead favoured writing their own in the BDD framework of their choice. We didn't find them too onerous to develop and maintain. An example test would be:

1. Push fixture app
2. Start app
3. Hit app, validate response
4. Hit URL on app to write to a given data service
5. Hit URL to read written value, validate
6. Stop app
7. Delete app

This exercised some of the core user-facing behaviour, as well as that of data services (search for Pivotal's apps like cf-redis-example-app <https://github.com/pivotal-cf/cf-redis-example-app>, which follow the same pattern). We had additional tests that would log a given unique string through an app and then hit the log aggregation system to validate that it had made its way through. The tests were small, so we had more granular control over the frequency of each test and got faster feedback through parallelisation.

Running these sorts of tests against each Cloud Foundry instance on a CI server with a wallboard view worked really well. Not only do you get volume testing for free (I've filled a buildpack cache that way <http://www.engineerbetter.com/update/2015/08/19/overflowing-buildpack_cache.html>), you can also publish the wallboard URL to PaaS customers and stakeholders alike. Tying these tests up to alerting/paging systems is also more sensible than paging people due to IaaS-level failures.

It sounds like you're doing the right thing, and I'd encourage you to continue and expand your efforts in that area. I'm happy to discuss more if this is an area of interest for you.

Regards,
Daniel Jones - CTO
+44 (0)79 8000 9153
@DanielJonesEB <https://twitter.com/DanielJonesEB>
*EngineerBetter* Ltd <http://www.engineerbetter.com> - UK Cloud Foundry Specialists
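As a rough illustration (not Daniel's actual code), a test following the seven-step outline above might look something like this in Go with Ginkgo and Gomega, shelling out to the cf CLI. The app name, fixture path, and domain are placeholders, and the write/read-back steps against a data service are omitted for brevity:

```go
package acceptance_test

import (
	"fmt"
	"net/http"
	"os/exec"
	"testing"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

func TestAcceptance(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Acceptance Suite")
}

// run shells out to the cf CLI and fails the current test if the command errors.
func run(args ...string) {
	out, err := exec.Command("cf", args...).CombinedOutput()
	Expect(err).NotTo(HaveOccurred(), string(out))
}

var _ = Describe("core user-facing behaviour", func() {
	const appName = "smoke-fixture"                          // placeholder app name
	appURL := fmt.Sprintf("https://%s.example.com", appName) // placeholder app domain

	BeforeEach(func() {
		// Steps 1-2: push and start a small fixture app from a local directory.
		run("push", appName, "-p", "assets/fixture-app")
	})

	AfterEach(func() {
		// Steps 6-7: stop and delete the app (and its route), even if assertions failed.
		run("delete", appName, "-f", "-r")
	})

	It("serves HTTP traffic once pushed", func() {
		// Step 3: hit the app and validate the response, allowing time for routes to converge.
		Eventually(func() int {
			resp, err := http.Get(appURL)
			if err != nil {
				return 0
			}
			defer resp.Body.Close()
			return resp.StatusCode
		}, "2m", "5s").Should(Equal(http.StatusOK))
	})
})
```

Steps 4 and 5 would follow the same pattern: hit one endpoint on the fixture app to write a value to the bound data service, then hit another to read it back and assert on the response body.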
Graham Bleach
Hi Amit,
Thanks for your reply.

On 1 June 2016 at 04:06, Amit Gupta <agupta(a)pivotal.io> wrote:
> The README attempts to give some idea of whether test suites are unsafe to run in certain contexts

We've been looking through the test suites and can't see a straightforward way to run only the "safe" tests, i.e. those that won't affect normal users and don't require an admin user. For instance, the apps suite includes both tests we'd like to run, which exercise core user-facing behaviour, and the admin buildpack lifecycle test with the issue you linked to.

I don't think skipping based on regexes over test names works well, both because the regex would become long quite quickly and because cf-acceptance-tests is a moving target - each time we upgraded to a new release we'd need to review which tests were added and update our regexes.

I wondered if other people would be interested in having a way to run only the "non-admin" tests? If so, perhaps re-organising the suites to enable that would be a welcome change?

Regards,
Graham
Graham Bleach
On 1 June 2016 at 09:22, Daniel Jones <daniel.jones(a)engineerbetter.com> wrote:
> Running acceptance tests in production is absolutely what I'd recommend

Sounds very relevant, I'll look forward to the video.

> I've worked with customers who didn't use the official CATS, but instead favoured writing their own in the BDD framework of their choice.

We have added tests for things we've built / configured, borrowing a fair amount in style from CATS: https://github.com/alphagov/paas-cf/tree/master/tests/src/acceptance

In principle I think the conversations and decisions about which behaviour should be tested are valuable, as is having tests written in a language and framework that's understood by the team, so I can understand why people would do this. I don't think this works for us for things that are already tested in CATS, though, as it feels like duplication of effort, both to write and to maintain the tests. That's why I'm interested in the idea of moving tests around within CATS to enable people to run a subset of tests that are considered production-safe.

Graham
Amit Kumar Gupta
Hi Graham,
Something like that would be nice. However, technically every test needs admin, because in their Before hooks the suites use the admin user to create an org, quota, etc. It sounds like what you want is to run tests that don't require admin *in a way that might affect other users*. Even if we could split up the tests along those lines right now, it would be a hard thing to enforce going forward, since CATS is such a moving target and is touched by so many different teams.

In principle, I'd like CATS to be able to guarantee that it cleans up after itself, does things with minimal effect on other users, and is even safe to run in a production environment. For that, I'd want some automated way to ensure CATS adheres to this contract. Right now nothing like that exists, so we don't make the guarantee that it's safe to run against prod. That said, it would be nice to have that option in the future, so I'd like to keep CATS as self-contained as possible for now.

So for now, the best thing we can do is identify tests, setup, or patterns that would definitely be bad for a prod environment and fix them. Have you been able to identify a complete list of problematic tests? We can tackle them one by one and try to figure out which projects need issues/PRs opened against them. E.g. for the buildpacks issue, it might be desirable to limit the scope of a buildpack to a space or org, though it would take some thought to figure out how to make sense of buildpack priority in that case.

Best,
Amit
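For readers unfamiliar with the setup pattern Amit describes, the admin dependency typically looks roughly like the following suite-level hooks. This is an illustrative sketch, not CATS' actual helper code; the API endpoint, credentials, and org/space/quota/user names are all placeholders, and the cf helper is the same shell-out pattern as in the earlier sketch:

```go
package acceptance_test

import (
	"os/exec"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

// cf runs a cf CLI command and fails the suite if it errors.
func cf(args ...string) {
	out, err := exec.Command("cf", args...).CombinedOutput()
	Expect(err).NotTo(HaveOccurred(), string(out))
}

// Suite-level setup: even tests that later act as an unprivileged user first
// need an admin session to carve out an isolated quota, org, space, and user.
var _ = BeforeSuite(func() {
	cf("api", "https://api.example.com") // placeholder API endpoint
	cf("auth", "admin", "admin-password") // placeholder admin credentials
	cf("create-quota", "cats-quota", "-m", "10G", "-r", "10", "--allow-paid-service-plans")
	cf("create-org", "cats-org", "-q", "cats-quota")
	cf("create-space", "cats-space", "-o", "cats-org")
	cf("create-user", "cats-user", "cats-password")
	cf("set-space-role", "cats-user", "cats-org", "cats-space", "SpaceDeveloper")
})

// Suite-level teardown: delete everything created above so nothing leaks
// into the shared environment.
var _ = AfterSuite(func() {
	cf("delete-org", "cats-org", "-f")
	cf("delete-user", "cats-user", "-f")
	cf("delete-quota", "cats-quota", "-f")
})
```

The individual tests can then target only the throwaway org and space as the unprivileged user, which is what makes the org-level isolation Graham mentions possible; the admin credentials are still needed once, at suite setup and teardown.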