We run lots (TK: number) of tests every time an engineer at Even makes a pull request. At first, running all of those tests was slow and required a bunch of complicated support code. But since I joined in October 2018, we have completely revamped the way we run tests. This post tells the story of how we made testing faster and easier on our backend services. If you're interested in learning more, or want to help us improve testing for our React Native mobile apps, we're hiring.
Testing at Even is as simple as:
bazel test //...
But that is not where we started, and that simple command hides a lot of complexity.
Our backend code is largely made up of the following languages and components:
- Golang
- Python
- Postgres
- AWS
- Docker & docker-compose
This is a simplified list of the most complex parts of our system.
We have multiple services that use gRPC for communication. To mimic our production system, we use docker-compose to stitch together containers and support services like Postgres during local testing and development.
Our "unit tests" could be considered integration tests as most of them test code end-to-end (e.g. reading/writing to postgres). Furthermore, we rarely mock out any interaction with external services including AWS. When I started at Even, some of our tests relied on things like S3 buckets in our production account!
Developers (and our CI system) ran tests using a script that basically called docker-compose run <service>.test. That test service was a container defined in docker-compose.yml that existed for each component in our backend. Each service had to duplicate the same basic setup:
- Link in support services (databases, caches, blob stores, etc)
- Migrate the database
- Run go test <packages> or tox (for Python)
This seemed simple enough but came with a lot of problems and hacks:
- All services depended on the same single database. We had to carefully make sure the database names were different for each service being tested.
- Our Go tests were split into serial and parallel tests. Go supports marking individual tests as parallel, but go test still runs separate packages in parallel regardless. We had to write our own test runner to sequentially execute the packages containing serial tests.
- There was no way to run all tests in our repository. Most developers would only test the packages they were working on and would rely on CI to test everything else.
- Caching wasn't really an option, especially in CI.
In addition to the testing issues listed above, we had no way to guarantee consistent build tools between developers and CI.
bazel is a fantastic build tool for monorepos. This post isn't really about why we chose bazel or what bazel can do. We wanted reproducible builds and tighter control over our tooling, so bazel was a good fit.
The first step towards bazel test //... was to get bazel build //... working. Luckily, there is a great tool called gazelle that integrates with bazel to produce BUILD.bazel files for Go projects. We rely heavily on it to update BUILD.bazel files automatically.
Since we wanted tight control over tooling, bazel is actually a script in our repository that ensures that every developer and our CI system gets the exact same version of the "real" bazel. The script automatically installs the desired version and handles some other complex setup (more on that later).
bazel test //... almost worked out of the box, but we ended up hitting the serial/parallel issue mentioned above. bazel wants to run as many actions as possible in parallel. We could have split the test runs into a parallel pass followed by a serial pass, but that breaks bazel's cache because it changes the nature of the test run (whether we distinguished the runs with environment variables or flags).
In order to have bazel test everything in parallel, we needed to remove the shared resource of Postgres. Our serial tests were marked that way because they either a) truncated a table or b) relied on a pre-existing database state. But since we couldn't give every service a dedicated database container, we decided to use Postgres templates to isolate each test within the same database.
CREATE DATABASE foo WITH TEMPLATE bar is a Postgres command that creates a database named foo that is identical to the database bar. Our migration step runs all migrations on our core databases and then makes a copy into a database named <database>_template. Each package that interacts with the database automatically runs CREATE DATABASE source_my_package WITH TEMPLATE source_template and reconnects to the new database. This isolates the package to a copy of the database it needs so that it does not conflict with other tests.
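To make that concrete, here is a minimal sketch of what the per-package step might look like in Go. The helper name, connection strings, and naming scheme are assumptions for illustration rather than our exact implementation.

```go
// A minimal sketch of the per-package isolation step, assuming a lib/pq
// connection and a pre-migrated source_template database. The helper name,
// connection strings, and naming scheme are illustrative.
package testdb

import (
	"database/sql"
	"fmt"
	"testing"

	_ "github.com/lib/pq" // Postgres driver
)

// NewIsolatedDB copies the migrated template database into a fresh database
// for this test package and returns a connection to the copy.
func NewIsolatedDB(t *testing.T, pkg string) *sql.DB {
	t.Helper()

	// Connect to the shared Postgres instance.
	admin, err := sql.Open("postgres", "postgres://postgres@localhost:5432/postgres?sslmode=disable")
	if err != nil {
		t.Fatal(err)
	}
	defer admin.Close()

	// Copy the pre-migrated template so this package gets its own database.
	name := fmt.Sprintf("source_%s", pkg)
	if _, err := admin.Exec(fmt.Sprintf("CREATE DATABASE %s WITH TEMPLATE source_template", name)); err != nil {
		t.Fatal(err)
	}

	// Reconnect to the fresh copy; tests in this package can now truncate
	// tables or rely on seeded state without affecting other packages.
	db, err := sql.Open("postgres", fmt.Sprintf("postgres://postgres@localhost:5432/%s?sslmode=disable", name))
	if err != nil {
		t.Fatal(err)
	}
	return db
}
```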
Our tests interact with other services that need to talk to the database as well. Our gRPC layer forwards the new database name to remote services (running under docker) so the remote service will use the same copy.
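One way to picture that forwarding is as a pair of small helpers around gRPC metadata. This is a hedged sketch: the x-test-database header and the function names are assumptions, not our actual gRPC layer.

```go
// A sketch of how the database name could be forwarded over gRPC metadata.
// The x-test-database key and these helper names are assumptions for
// illustration.
package testgrpc

import (
	"context"

	"google.golang.org/grpc/metadata"
)

const dbHeader = "x-test-database"

// WithTestDatabase attaches the copied database name to outgoing requests.
func WithTestDatabase(ctx context.Context, dbName string) context.Context {
	return metadata.AppendToOutgoingContext(ctx, dbHeader, dbName)
}

// DatabaseFromContext lets a remote service (running under docker) pick the
// same database copy for an incoming request, falling back to its default.
func DatabaseFromContext(ctx context.Context, fallback string) string {
	if md, ok := metadata.FromIncomingContext(ctx); ok {
		if names := md.Get(dbHeader); len(names) > 0 {
			return names[0]
		}
	}
	return fallback
}
```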
In the old system, all developers needed to do was run a single command, run-tests foo, that took care of docker-compose, database migrations, and testing the targeted package. bazel is not a tool for managing long-running applications like docker. I mentioned earlier in the post that our bazel is a script that wraps the real bazel binary. This script does a lot of magic in order to ease the transition for developers to a new command:
- docker-compose setup.
- Database migrations.
- Setting the environment for bazel via --test_env arguments.
- Running gazelle.
- Convenience flags.
- Package matching similar to golang (e.g. ./... -> //...).
When a developer runs bazel test //..., they will first see a run of gazelle to automatically update BUILD.bazel files. Next, bazel will run a dedicated docker-compose service that ensures all dependencies are up and available. Then, bazel runs all migrations. Finally, bazel tests the desired packages.
We added flags to bazel to control all of the above behaviors in order to iterate much faster. We now have a command, bazel watch, which runs ibazel under the hood to automatically test the desired packages when changes are made. Using bazel watch is very similar to automatic build+test in an IDE.
I mentioned Python earlier as one of our development languages. bazel's Python support is not as advanced as its support for other languages, but we still wanted bazel test //... to work with our Python code. sh_test is a bazel rule that runs a shell script as a test target. We do not test our Python code directly with bazel. Instead, we continue to use the docker container for running tox. Our sh_test includes all dependencies (Dockerfile, docker-compose.yml, etc) and Python sources as the data argument. The script invokes docker-compose to run tox within the container. The nice part about this is that we have tightened up the dependency set so that these tests are easy to cache.
The last big component of our testing infrastructure is localstack. We use localstack to mock out AWS services required by tests. Localstack does a great job of duplicating the behavior of most AWS services. This allows us to write tests that utilize the same code path as production - we just point the test to our localstack instance.
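As an illustration, a test might build its AWS clients against the localstack endpoint instead of the real one. This sketch uses the aws-sdk-go v1 API; the endpoint, region, and dummy credentials are assumptions about a typical localstack setup, not our exact configuration.

```go
// A sketch of pointing the AWS SDK at localstack in tests. The endpoint,
// region, and credentials are assumptions about a typical localstack setup
// (recent versions expose everything on the 4566 edge port).
package teststorage

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// NewTestS3 returns an S3 client that talks to localstack instead of AWS,
// so tests exercise the same code path as production.
func NewTestS3() (*s3.S3, error) {
	sess, err := session.NewSession(&aws.Config{
		Region:           aws.String("us-east-1"),
		Endpoint:         aws.String("http://localhost:4566"), // localstack edge endpoint (assumed)
		Credentials:      credentials.NewStaticCredentials("test", "test", ""),
		S3ForcePathStyle: aws.Bool(true), // localstack serves buckets path-style
	})
	if err != nil {
		return nil, err
	}
	return s3.New(sess), nil
}
```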
At this point we are heavily invested in bazel. We use it for linting, docker build & push, code generation, and infrastructure. We have made incredible progress, but we have a lot more to do. We still have several docker images that exist outside of bazel, and our Python code is not properly integrated. Our client codebase has just gotten started with bazel and has an entirely unique set of challenges. We hope to apply the same set of improvements to the client codebase.