We run lots (TK: number) of tests every time an engineer at Even makes a pull request. At first, running all of those tests was slow and required a bunch of complicated support code. But since I joined in October 2018, we have completely revamped the way we run tests. This post tells the story of how we made testing faster and easier on our backend services. If you're interested in learning more, or want to help us improve testing for our React Native mobile apps, we're hiring.
Testing at Even is as simple as:
bazel test //...
But that is not where we started, and that simple command hides a lot of complexity.
Our backend code is largely made up of the following languages and components:
- Golang
- Python
- Postgres
- AWS
- Docker & docker-compose
This is a simplified list of the most complex parts of our system.
We have multiple services that use gRPC for communication. To mimic our production system, we use docker-compose to stitch together containers and support services like Postgres during local testing and development.
Our "unit tests" could be considered integration tests as most of them test code end-to-end (e.g. reading/writing to postgres). Furthermore, we rarely mock out any interaction with external services including AWS. When I started at Even, some of our tests relied on things like S3 buckets in our production account!
Developers (and our CI system) ran tests using a script that basically called docker-compose run <service>.test. That test service was a container defined in docker-compose.yml that existed for each component in our backend. Each service had to duplicate the same basic setup:
- Link in support services (databases, caches, blob stores, etc)
- Migrate the database
- Run go test <packages> or tox (for Python)
This seemed simple enough but came with a lot of problems and hacks:
- All services depended on the same single database. We had to carefully make sure the database names were different for each service being tested.
- Our Go tests were split into serial and parallel tests. Go supports marking individual tests as parallel, but go test still runs separate packages in parallel regardless. We had to write our own test runner to sequentially execute the packages containing serial tests.
- There was no way to run all tests in our repository. Most developers would only test the packages they were working on and would rely on CI to test everything else.
- Caching wasn't really an option, especially in CI.
In addition to the testing issues listed above, we had no way to guarantee consistent build tools between developers and CI.
bazel is a fantastic build tool for monorepos. This post isn't really about why we chose bazel or what bazel can do. We wanted reproducible builds and tighter control over our tooling, so bazel was a good fit.
The first step towards bazel test //... was to get bazel build //... working. Luckily, there is a great tool called gazelle that integrates with bazel to produce BUILD.bazel files for Go projects. We rely heavily on it to update BUILD.bazel files automatically.
Since we wanted tight control over tooling, bazel is actually a script in our repository that ensures that every developer and our CI system gets the exact same version of the "real" bazel. The script automatically installs the desired version and handles some other complex setup (more on that later).
bazel test //... almost worked out of the box, but we ended up hitting the serial/parallel issue mentioned above. bazel wants to run as many actions as possible in parallel. We could have split the test runs into a parallel pass followed by a serial pass, but that breaks bazel's cache because it changes the nature of the test run (whether we distinguished the runs with environment variables or flags).
In order to have bazel test everything in parallel, we needed to remove the shared resource of Postgres. Our serial tests were marked that way because they either a) truncated a table or b) relied on a pre-existing database state. But since we couldn't give every service a dedicated database container, we decided to use Postgres templates to isolate each test within the same database.
CREATE DATABASE foo WITH TEMPLATE bar is a Postgres command that creates a database named foo that is identical to the database bar. Our migration step runs all migrations on our core databases and then makes a copy into a database named <database>_template. Each package that interacts with the database automatically runs CREATE DATABASE source_my_package WITH TEMPLATE source_template and reconnects to the new database. This isolates the package to a copy of the database it needs so that it does not conflict with other tests.
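To make that concrete, here is a minimal sketch of what the per-package step might look like in Go. The helper name, connection strings, and naming scheme are assumptions for illustration rather than our exact implementation.

```go
// A minimal sketch of the per-package isolation step, assuming a lib/pq
// connection and a pre-migrated source_template database. The helper name,
// connection strings, and naming scheme are illustrative.
package testdb

import (
	"database/sql"
	"fmt"
	"testing"

	_ "github.com/lib/pq" // Postgres driver
)

// NewIsolatedDB copies the migrated template database into a fresh database
// for this test package and returns a connection to the copy.
func NewIsolatedDB(t *testing.T, pkg string) *sql.DB {
	t.Helper()

	// Connect to the shared Postgres instance.
	admin, err := sql.Open("postgres", "postgres://postgres@localhost:5432/postgres?sslmode=disable")
	if err != nil {
		t.Fatal(err)
	}
	defer admin.Close()

	// Copy the pre-migrated template so this package gets its own database.
	name := fmt.Sprintf("source_%s", pkg)
	if _, err := admin.Exec(fmt.Sprintf("CREATE DATABASE %s WITH TEMPLATE source_template", name)); err != nil {
		t.Fatal(err)
	}

	// Reconnect to the fresh copy; tests in this package can now truncate
	// tables or rely on seeded state without affecting other packages.
	db, err := sql.Open("postgres", fmt.Sprintf("postgres://postgres@localhost:5432/%s?sslmode=disable", name))
	if err != nil {
		t.Fatal(err)
	}
	return db
}
```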
Our tests interact with other services that need to talk to the database as well. Our gRPC layer forwards the new database name to remote services (running under docker) so the remote service will use the same copy.
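One way to picture that forwarding is as a pair of small helpers around gRPC metadata. This is a hedged sketch: the x-test-database header and the function names are assumptions, not our actual gRPC layer.

```go
// A sketch of how the database name could be forwarded over gRPC metadata.
// The x-test-database key and these helper names are assumptions for
// illustration.
package testgrpc

import (
	"context"

	"google.golang.org/grpc/metadata"
)

const dbHeader = "x-test-database"

// WithTestDatabase attaches the copied database name to outgoing requests.
func WithTestDatabase(ctx context.Context, dbName string) context.Context {
	return metadata.AppendToOutgoingContext(ctx, dbHeader, dbName)
}

// DatabaseFromContext lets a remote service (running under docker) pick the
// same database copy for an incoming request, falling back to its default.
func DatabaseFromContext(ctx context.Context, fallback string) string {
	if md, ok := metadata.FromIncomingContext(ctx); ok {
		if names := md.Get(dbHeader); len(names) > 0 {
			return names[0]
		}
	}
	return fallback
}
```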
In the old system, all developers needed to do was run a single command, run-tests foo, that took care of docker-compose, database migrations, and testing the targeted package. bazel is not a tool for managing long-running applications like docker. I mentioned earlier in the post that our bazel is a script that wraps the real bazel binary. This script does a lot of magic in order to ease the transition for developers to a new command:
- docker-compose setup.
- Database migrations.
- Setting the environment for bazel via --test_env arguments.
- Running gazelle.
- Convenience flags.
- Package matching similar to golang (e.g. ./... -> //...).
When a developer runs bazel test //..., they will first see a run of gazelle to automatically update BUILD.bazel files. Next, bazel will run a dedicated docker-compose service that ensures all dependencies are up and available. Then, bazel runs all migrations. Finally, bazel tests the desired packages.
We added flags to bazel to control all of the above behaviors in order to iterate much faster. We now have a command, bazel watch, which runs ibazel under the hood to automatically test the desired packages when changes are made. Using bazel watch is very similar to automatic build+test in an IDE.
I mentioned Python earlier as one of our development languages. bazel's Python support is not as advanced as its support for other languages, but we still wanted bazel test //... to work with our Python code. sh_test is a bazel rule that runs a shell script as a test target. We do not test our Python code directly with bazel. Instead, we continue to use the docker container for running tox. Our sh_test includes all dependencies (Dockerfile, docker-compose.yml, etc) and Python sources as the data argument. The script invokes docker-compose to run tox within the container. The nice part about this is that we have tightened up the dependency set so that these tests are easy to cache.
The last big component of our testing infrastructure is localstack. We use localstack to mock out AWS services required by tests. Localstack does a great job of duplicating the behavior of most AWS services. This allows us to write tests that utilize the same code path as production - we just point the test to our localstack instance.
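As an illustration, a test might build its AWS clients against the localstack endpoint instead of the real one. This sketch uses the aws-sdk-go v1 API; the endpoint, region, and dummy credentials are assumptions about a typical localstack setup, not our exact configuration.

```go
// A sketch of pointing the AWS SDK at localstack in tests. The endpoint,
// region, and credentials are assumptions about a typical localstack setup
// (recent versions expose everything on the 4566 edge port).
package teststorage

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// NewTestS3 returns an S3 client that talks to localstack instead of AWS,
// so tests exercise the same code path as production.
func NewTestS3() (*s3.S3, error) {
	sess, err := session.NewSession(&aws.Config{
		Region:           aws.String("us-east-1"),
		Endpoint:         aws.String("http://localhost:4566"), // localstack edge endpoint (assumed)
		Credentials:      credentials.NewStaticCredentials("test", "test", ""),
		S3ForcePathStyle: aws.Bool(true), // localstack serves buckets path-style
	})
	if err != nil {
		return nil, err
	}
	return s3.New(sess), nil
}
```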
At this point we are heavily invested in bazel. We use it for linting, docker build & push, code generation, and infrastructure. We have made incredible progress, but we have a lot more to do. We still have several docker images that exist outside of bazel, and our Python code is not properly integrated. Our client codebase has just gotten started with bazel and has an entirely unique set of challenges. We hope to apply the same set of improvements to the client codebase.