The single version rule is good for Google and bad for you

I had a bad brush with the single version rule at work, and I’d like to warn you so that you can avoid the pain it caused me.

The best thing about monorepos

The best thing about monorepos is that they require the author of any given change to make the entire repository’s tests pass before committing that change. This puts the onus of managing incompatible changes on the original author, which:

  1. Centralizes the responsibility of solving incompatibilities onto the person or team with the most knowledge of the change.
  2. (In the best case) exposes any incorrect assumptions they may have about their change early on in the process.

Largely as a result of the above property, monorepos obviate most of the overhead around dependency versioning. When dependencies are consumed at HEAD, it is no longer necessary to maintain separate artifact versioning, and the build and release pipelines for intermediate artifacts (e.g. base images, libraries) melt away.
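
To make that concrete, here is a minimal Bazel-style sketch (the target names are invented for illustration): the consumer depends on the producing team’s library target at HEAD, so there is no published artifact and no version pin to manage.

```starlark
# Consumer's BUILD file; all target names here are hypothetical.
load("@rules_python//python:defs.bzl", "py_binary")

py_binary(
    name = "billing_service",
    srcs = ["main.py"],
    deps = [
        # Consumed at HEAD: no published wheel, no version pin, and no
        # separate release pipeline for the platform team's library.
        "//platform/authlib",
    ],
)
```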

Practically speaking, monorepos also facilitate the Beyoncé rule ("if you liked it, then you shoulda put a test on it"), by making it easy for teams to build tests around the invariants currently provided by other teams.
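
For instance (again with invented names), a consuming team can check in a test against the invariant it relies on; a change to the provider that breaks the invariant then fails in the author’s branch rather than in the consumer’s deployment.

```starlark
# Consuming team's BUILD file; names are hypothetical.
load("@rules_python//python:defs.bzl", "py_test")

py_test(
    name = "authlib_token_expiry_invariant_test",
    srcs = ["authlib_token_expiry_invariant_test.py"],
    # Encodes an invariant this team relies on, so the authlib authors
    # must keep it green before landing their changes.
    deps = ["//platform/authlib"],
)
```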

Why are third party dependencies different from first party dependencies?

After a third party dependency is brought into a monorepo, the upstream maintainer of that software will (one hopes) continue to ship changes to their project. Some of those changes will require additional work to integrate into your monorepo.

Once the upstream maintainer releases a new version of their code, the individual build targets within a monorepo come into conflict with one another. There are the eager beavers who are champing at the bit to consume the new dependency version, or for whom the effort of upgrading is minor. And then there are the slowpokes, who want to take additional time to either a) fix known incompatibilities or b) validate the upstream change against their critical functionality. A compromise must be made between the two camps, and a good compromise makes everyone unhappy.

Doesn’t Google use the single-version rule?

Google is known to use the single-version rule, but it is questionable whether the rule as implemented at Google has much relevance to most software engineering workplaces. In particular, Google has three things that most of my readers do not:

  1. an ecosystem so large and encompassing that most tasks performed within it never have to touch a third party dependency.
  2. a VCS that is designed for truly staggering quantities of code, which makes it possible to vendor in a whole third party project in source form.
  3. teams of people whose sole job it is to keep dependencies up to date.

I imagine that if I were in Google’s shoes, the single-version rule would start to look pretty sensible.

Software projects versus whole firms

Most monorepos are intended to serve company-wide needs, and therefore have a company-wide audience. The problem with this is that companies are big places, and they have legal/business reasons for being that way. The legal/business reasons don’t always align with the engineering realities, and as a result you end up with many different needs attempting to live under the same roof. On the face of it, this is a good thing: a single platform team can achieve very high leverage by serving everyone in the company well. But if the single platform team isn’t serving everyone well, then the platform becomes more of a prison.

To illustrate, here are the builds that generally coexist in any given monorepo:

  • Code developed by: internal software developers, IT, and external contractors or vendors.
  • Services deployed to: the public Internet, Intranet-facing services, internal batch workloads, and incubating/hackathon projects.
  • Risks that are: directly impactful to the firm’s brand, or so non-critical that iteration speed matters far more.

A strict single version rule attempts to solve all of those use-cases simultaneously, which becomes impossible for any mid-size or larger organization.

Safe upgrades

In the strict form of a single version rule, all upgrades are “big bang”. A person proposing (e.g.) a Go SDK upgrade creates a branch that builds all Go binaries with the new SDK, gets it green, and merges their changes into master. Provided perfect in-repository test coverage, this is a safe operation. But perfect in-repository test coverage is unlikely to exist in most organizations. Most teams are at least somewhat reliant on building artifacts and deploying them into staging or loadtest environments in order to validate them. As a result, the process is more along the lines of:

  1. Make a feature branch
  2. Get it green
  3. Resolve any merge conflicts, repeating step 2 if necessary
  4. Announce (or mandate) that nobody should deploy, and land the change.
  5. Allow your build system to produce new deployable artifacts
  6. Invite a few low-criticality builds to deploy
  7. Revert the landed change
  8. Analyze results from the builds that got deployed
  9. Rinse and repeat with a larger audience each time.

This is a high-touch routine that clearly violates the “master should always be deployable” rule. Moreover, it represents a best-case scenario. In a different scenario, it becomes necessary to roll back (e.g.) a new Go SDK to an older version, because a critical service encountered a latent issue in production with the new version. To give an example, one of Service A’s monthly batches got 100x slower because of new GC behavior in the SDK. But in the meantime, a different critical service unknowingly became dependent on the new SDK’s behavior. In this case, the two services are in conflict with one another because their build system cannot express “Service A uses Golang 1.20 while Service B uses Golang 1.21.”
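
For illustration only, here is what that missing expressiveness might look like if the build graph could state it. The go_sdk attribute below is invented (it is not part of rules_go); it exists purely to sketch per-target toolchain selection.

```starlark
# Hypothetical BUILD file: the go_sdk attribute is invented for illustration.
go_binary(
    name = "service_a",
    srcs = ["service_a/main.go"],
    go_sdk = "//tools/go:sdk_1_20",  # holds back until the GC regression is fixed
)

go_binary(
    name = "service_b",
    srcs = ["service_b/main.go"],
    go_sdk = "//tools/go:sdk_1_21",  # already relies on 1.21 behavior
)
```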

One line patches

One useful property of a build system is its ability to build a forked version of some upstream dependency with a small change. Even if you prefer to avoid forking upstream projects as a rule, the exceptions to that rule can be the difference between “Oh, we can’t ship that P1 feature until the MyRocks project merges our one-line fix”, and “That feature went to prod yesterday.”
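
With Bazel, one common way to carry such a fork is to apply the patch when fetching the upstream release; http_archive accepts a patches attribute for exactly this. The URL, hash, and patch path below are placeholders.

```starlark
# WORKSPACE sketch; URL, sha256, and patch label are placeholders.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "myrocks",
    urls = ["https://example.com/myrocks-x.y.z.tar.gz"],  # placeholder URL
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",  # placeholder
    # Carry the one-line fix locally until upstream merges it.
    patches = ["//third-party/patches:myrocks-one-line-fix.patch"],
    patch_args = ["-p1"],
)
```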

The problem with this is that the MyRocks project has its own opinions about dependency versioning that might not line up with your team’s choices. Speaking from experience, the single version rule makes it more or less impossible to integrate any non-trivial open-source project into your team’s monorepo. Under a single version rule, your monorepo is not fully expressive: it cannot express every build that is valuable to the business.

Onboarding to a single-version rule

When a company attempts a migration to a tool like Bazel while simultaneously enforcing a single-version rule, the process of harmonizing conflicting requirements quickly becomes the chief blocker inhibiting use of the build tool at all. With a fully-expressive build ecosystem, it is possible to construct a “fast path” for onboarding that doesn’t require any changes to the software being built. In my experience, imposing a single-version rule heightens the risk of an org-wide migration to a new build tool stalling or failing entirely.

Security concerns

One oft-touted benefit of the single version rule is that it is possible to deploy a security upgrade to all users of a given third party dependency at once. But the flip side is far worse: it is possible for a single blocker to prevent anyone in the monorepo from upgrading. This is essentially a priority inversion in the monorepo: less security-critical projects can prevent upgrades by more security-critical projects.

As a practicing security engineer, my preference is to be able to identify specific projects, establish an SLA on security upgrades for their builds, and ship upgrades to those critical builds without potentially being blocked by any other workload in the repo. In other words, for each service I am responsible for, I want my repository and build tool to offer me a workflow that is no slower or less reliable than if that service lived by itself in its own repo.

But I think all of this skates past the truly scary risk for a security engineer, which arises from bootleg artifacts. Bootleg artifacts get built on someone’s laptop and uploaded to an artifacts store, because the fully sanctioned build pipeline wasn’t expressive enough to build the artifact “the right way”. From a security perspective, this presents the fun-to-worry-about tail risk that the build might be backdoored, either intentionally by the builder or through some compromise of their laptop. But it also presents the more quotidian risk that, when it comes time to rebuild a given load-bearing artifact, nobody on the team will be able to coax their laptop into exactly the right configuration that produced a working build last time.

Diamond dependencies

One issue with monorepo-optimized build systems like Bazel is that they don’t cope well with two different artifacts that provide the same symbols. You can totally ship an artifact called //third-party/python/requests-2.30 and another one called //third-party/python/requests-2.27, but Bazel won’t know how to deduplicate them when a single py_binary has //library1 depending on the older version and //library2 depending on the newer version. In most cases, the resultant build will receive an arbitrarily chosen library version. The solution to this problem is to follow the same practice that package maintainers do:

  • If you’re maintaining a library, require your dependencies without specifying a given version.
  • If you’re maintaining a binary (or a test), resolve your requirements and your libraries’ requirements to exact version numbers.

In practice, this means that intermediate rules (py_library in Bazel) should carry metadata about dependencies that are required but unfulfilled. When the library is realized at a terminal rule (e.g. py_binary or py_test), Bazel consults a long list of fully-pinned third party dependencies and installs the subset of those which are marked as required by the py_library targets in its dependency tree. rules_py now contains an implementation of exactly this, through the virtual_deps parameter on py_library and the resolutions parameter of py_binary.
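
Here is a rough sketch of how this reads with rules_py. The load path and resolution labels are placeholders; the exact shape of the resolutions mapping depends on how your third-party pins are generated.

```starlark
# BUILD sketch using rules_py; labels are placeholders.
load("@aspect_rules_py//py:defs.bzl", "py_binary", "py_library")

py_library(
    name = "library1",
    srcs = ["library1.py"],
    # Requires requests, without pinning it to any particular version.
    virtual_deps = ["requests"],
)

py_binary(
    name = "app",
    srcs = ["main.py"],
    deps = [":library1"],
    # The terminal rule chooses the concrete, fully-pinned artifact that
    # fulfills the virtual requirement.
    resolutions = {
        "requests": "//third-party/python/requests-2.30",  # placeholder label
    },
)
```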

This approach also facilitates running the same tests against different sets of pinned requirements. When py_binary A depends on py_library B, which is tested by py_test C, the tests in C can be exported and re-run using A’s chosen dependency resolutions. This verifies that the library will perform adequately when installed and used by A.
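
A hypothetical sketch of that export, assuming py_test accepts the same resolutions attribute as py_binary (names and paths are invented): B wraps its test in a macro so that A can instantiate it against A’s own pins.

```starlark
# b/tests.bzl (hypothetical): B exports its test as a macro so that
# downstream binaries can re-run it against their own resolutions.
load("@aspect_rules_py//py:defs.bzl", "py_test")

def b_tests(name, resolutions):
    py_test(
        name = name,
        srcs = ["//b:test_b.py"],  # assumes //b exports_files its test source
        main = "test_b.py",
        deps = ["//b"],
        resolutions = resolutions,
    )

# a/BUILD would then contain something like:
#   load("//b:tests.bzl", "b_tests")
#   b_tests(name = "b_tests_with_a_pins", resolutions = A_RESOLUTIONS)
```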

Hybrid options

My favorite hybrid option is to allow each team or org within a monorepo to choose the dependencies that are right for them. Whole sub-trees of the repo can enforce a single-version rule, because they have enough homogeneity (e.g., of dependencies, business requirements, security needs, correctness needs) to warrant it. The repository as a whole can also have a default set of versions that a build receives unless it decides to provide its own.
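
One hypothetical way to encode that default-plus-override arrangement (all names below are invented): a repo-wide pin set lives in a shared .bzl file, and a subtree that needs to diverge loads its own.

```starlark
# teams/payments/BUILD; all labels and names are hypothetical.
load("@aspect_rules_py//py:defs.bzl", "py_binary")
load("//teams/payments:pins.bzl", "PAYMENTS_RESOLUTIONS")

py_binary(
    name = "payments_service",
    srcs = ["payments_service.py"],
    # Diverges from the repo-wide default pins (//third-party/python:defaults.bzl)
    # because this subtree takes upgrades on its own schedule.
    resolutions = PAYMENTS_RESOLUTIONS,
)
```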

Automation

No matter what system is picked, it makes sense for a central team to take responsibility for sending out periodic automated PRs to regenerate every lockfile in the monorepo, on a one-PR-per-lockfile basis.

Isn’t a single version rule just simpler?

When evaluating simplicity, it is useful to consider whether complexity is being reduced, or shifted from team to team. Both approaches are legitimate. In general, we’d prefer to (1) reduce complexity wherever we can, and failing that (2) shift it away from application teams and towards platform teams.

Is having a single specification for any given third party dependency simpler? I will admit that yes, it is immensely satisfying to be able to upgrade every build in your ecosystem with a single-line code change. But in most cases, the code change is the easiest and cheapest part of the software development lifecycle. When we tally the costs involved in the software development lifecycle, the mechanical aspect of changing bytes in a text file hardly matters, and probably isn't even being performed by a human being. The much greater costs come from the overhead of testing, code review, incrementally deploying code, monitoring fleet-wide deployments, detecting bugs and debugging them, rolling back when deficiencies are found, and performing verification that all of the other steps are still humming along. And once those costs are considered, a monorepo-wide single version rule imposes more complexity than it resolves.
