Tested Semantic Versioning

This is v0.0.2 of a proposal document for Tested Semantic Versioning, referred to in this document as TestVer, or tested versioning. The author is Sam Atman, who holds the copyright. It is released under the terms of the CC BY-SA 4.0 license. Note that this requires that any changes be clearly identified as such.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Rationale

Semantic Versioning (SemVer) is a social solution to a technical problem. The problem is compatibility of progressive releases of a unit of software (here called a Package) with software which depends on it (here called an Application). These are roles, rather than distinct categories: it is commonplace for a package to be an application of another package, while serving as a package to a different application.

The solution which SemVer offers is to assign a social meaning to the version string of packages, where a version string such as m.n.p consists of natural numbers, which are only modified by incrementing them, and which convey whether the changes between one release and another are Major, Minor, or Patch. These meanings are specified, and amount to a social contract between the authors of packages (whom we call Maintainers) and the authors of applications which include those packages (whom we call Users). We call versions starting with 0 Development Versions; those starting with 1 or above are Release Versions.

The most important aspect of this agreement is the promise that, beginning with the first release version, only major releases will introduce incompatible changes to the API. However, the standard for what the API is, and what constitutes breaking it, is fundamentally social in SemVer. The standard specifies that there MUST be an API, and that it SHOULD be "complete and comprehensive", and "could be declared in the code itself or exist strictly in documentation".

TestVer is offered as a backward-compatible restriction of SemVer, which specifies a more rigorous standard for a) what the API consists of and b) what constitutes an incompatible change to it.

Prerequisites

TestVer standards apply at an Ecosystem level, and may be adopted on an Ecosystem or a Package basis. The term ecosystem as used here is deliberately somewhat underdetermined. It refers to a community of practice, with common tooling, and interoperability between packages. An ecosystem may encompass more than one language, and a language may have more than one ecosystem which uses it.

The prerequisite for packages in an ecosystem to adopt tested versioning is an agreed-upon testing system. This test system MUST be universally available, and SHOULD be either the ecosystem's de jure or de facto standard. If the test system doesn't support the operations described in this document, it MUST be extended to do so, or another test system adopted or designed which does. These adoptions or extensions may be specific to packages using TestVer, but SHOULD be merged into the base distribution of the test system if the ecosystem in whole adopts tested versioning.

To get the greatest benefits from TestVer, the operations defined by this standard SHOULD be integrated into CI, package distribution, and dependency management. These integrations MAY take the form of extensions to the base software used for these purposes. If an ecosystem officially adopts TestVer, then these integrations SHOULD be built into the base software. They MAY be made optional or configurable, for backward compatibility, or if other circumstances make it difficult or infeasible to make the operations mandatory. In any case, the result MUST be consistent, that is, all tooling must arrive at a common answer on operations here defined.

The critical characteristics of such a test system are as follows. By definition, the test system must be able to test the behavior of packages, using lines of plain-text source code (which SHOULD be in the same programming language as the package under test). It MUST be able to represent individual tests which either pass or fail, and it MUST be possible to group tests into larger units, which we call Testsets. It SHOULD be possible to group testsets into larger testsets, and it SHOULD be legal for a testset to contain no tests.

It MUST be possible for these tests to live in a distinct file, and there MUST be a convention for which file is the Main Test File. Which file this is MAY be specific to TestVer extensions, or MAY be the file which is conventional for the main test suite in that ecosystem, if this is possible while adhering to the additional requirements of the standard. The word 'distinct' does not preclude the main test file being integrated with source code; equivalently, it is not a requirement that the main test file contain only tests.

Additionally, the test system MUST be able to give names to these individual tests and testsets. These names SHOULD be ordinary strings in the language's native type, which MAY be restricted to the equivalent of an identifier, but they MUST be able to carry human-language semantic meaning. This means they MUST NOT be numerals, UUIDs, or any other opaque format. The names of tests and testsets MUST be available programmatically to the test system, or to its TestVer extensions: this means they cannot be comments, unless those comments are surfaced into the test system itself in some fashion. A named test MAY execute as several tests, for example, it may test a line of code within a for loop, so long as the test system can group all successes and failures of that test under its name.
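
To make the shape of these requirements concrete, here is a minimal sketch of named tests grouped into nestable testsets, with every name available programmatically. It is written in Python for convenience; the types, names, and structure are illustrative only, and not part of the standard.

    # Minimal sketch: named tests which pass or fail, grouped into named,
    # nestable testsets, with every test name reachable programmatically.
    from dataclasses import dataclass, field
    from typing import Callable, List, Union


    @dataclass
    class Test:
        name: str                   # human-readable, semantically meaningful
        run: Callable[[], bool]     # returns True on pass, False on fail


    @dataclass
    class Testset:
        name: str
        members: List[Union["Testset", Test]] = field(default_factory=list)

        def all_test_names(self) -> List[str]:
            """Surface every test name, as the standard requires."""
            names: List[str] = []
            for m in self.members:
                names += m.all_test_names() if isinstance(m, Testset) else [m.name]
            return names


    # Testsets may nest, and a testset may legally contain no tests at all.
    api = Testset("API")
    suite = Testset("main", [api, Testset("internal", [
        Test("parser accepts empty input v1.0", lambda: True),
    ])])
    print(suite.all_test_names())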

It is preferable that the only restriction on names be that they encode valid characters in the language's native encoding of strings. This SHOULD be some variety of Unicode, ideally UTF-8, but the system may use whatever encoding imposes the least implementation burden for that ecosystem. To offer one reason this should be preferred, packages may wish to adopt a naming convention which includes version strings. These are made up of numerals separated by the . character, which is a pattern largely disallowed in identifiers. The names MUST be able to represent the specific strings defined in the rest of this document, which imposes the additional restriction that they MUST allow unaccented Latin letters, of the set found in ASCII encoding. It is not necessary that the encoding be ASCII-compatible.

The previous requirements are common, but not universal, among existing testing systems; the following is not: a test system suitable for TestVer MUST be able to determine whether two versions of a test are the same test. Some guidance on what this means, and how it may be implemented, is included in a later section.

Finally, such a test system MUST be able to provide certain metadata about a test, such as whether it is expected to fail, is subject to deprecation in later versions, or is expected to fail currently but will pass in some future version. The complete set of needed behaviors is described in later parts of this document.

Packages which opt in to tested versioning MUST include a way of running this test system against the main test file. Instructions for how to do this MUST be included in the documentation, preferably in the README file; this can be as simple as a badge or a link. This requirement is also fulfilled if an ecosystem has officially adopted TestVer, such that the core documentation for that ecosystem includes these instructions. In such a case, individual packages need not repeat this information.

Maintainers within a given ecosystem are encouraged to converge on a single standard for TestVer packages in that ecosystem, at whatever pace is comfortable. An ecosystem which officially adopts TestVer MUST, as a prerequisite of official adoption, have a singular system for these operations; that system must itself use TestVer, and have a major version number greater than 0.

The obligations and standards which that major version imposes are the primary topic of the remainder of this document.

Tested Versioning

As a backward-compatible variation on SemVer, with additional restrictions and standards, TestVer assigns the same meaning to version strings. The 11 requirements for versioning are adopted without changes.

This standard suggests no change at all to how versions are conventionally represented, for packages within an ecosystem which already uses SemVer, officially or otherwise. In particular, where the common convention of prepending "v" to a version string is followed, packages using tested versioning should follow it as well.

Major Version 0

These are versions of a package which are still in flux, and haven't committed to a public API. The only additional requirement TestVer imposes is that such a package MUST have a main test file, and it MUST be possible to execute it. This main test file MUST serve to identify the package as using tested versioning, but how this is accomplished is implementation-specific. Suggestions include a unique relative file path and name, and/or a specific string or header comment easily found within the file; whatever approach is convenient to the ecosystem's existing standards should be chosen. A test system SHOULD allow this file to be inert, that is, to contain no tests or testsets at all. An empty testset is considered inert for these purposes, and is one option for meeting the identification requirement. The main test file MAY contain tests, and testsets, including the special testsets defined below.
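
As one illustration of how a 0.x package might satisfy these requirements, an inert main test file could be as small as the following. The file path, marker constant, and layout are all hypothetical; an ecosystem or its tooling would fix its own convention.

    # tests/testver_main.py -- hypothetical path; the identifying marker here
    # is a module-level constant, but a header comment would serve as well.
    TESTVER = "0.1"        # identifies the package as using tested versioning

    # An inert main test file: an empty testset satisfies the identification
    # requirement, and a 0.x package need not define any tests at all.
    TESTSETS = {
        "API": [],         # optional at 0.x, and subject to no formal restrictions
    }

    if __name__ == "__main__":
        # It must be possible to execute the main test file, even when inert.
        print("TestVer", TESTVER, "- no tests defined yet")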

The standard which is used to identify the main test file MUST be documented by tooling. It is not necessary for individual packages to document this, although they may want to, especially if the use of tested versioning is uncommon in their ecosystem.

If and when a 0.x package defines any of the special testsets covered in the next section, then those testsets are subject to certain standards and restrictions. How these resemble, and differ from, the standards and restrictions applicable to release versions, is discussed below.

Major Version ≥ 1

Incrementing the major version number to a positive value indicates a package with a stable public API, in SemVer as in its tested variant. Bringing a degree of rigour and automation to the meaning of this promise is the purpose of the TestVer standard.

Beginning with version 1.0.0, a TestVer package MUST define a testset called "API". This name MUST be spelled in capital letters, exactly as depicted. This testset MUST contain at least one named test, and this test MUST pass.

Any test which ever appears in the API testset MUST NOT be removed from the test suite, its name MUST NOT be changed, and it MUST NOT be modified; more detail on what constitutes modification will be provided below. The name of each such test must be globally unique, that is, no other test in any testset of the system may have, nor may ever have, the same name as any test which is ever in the API. As a way to ensure this while retaining flexibility, the test name may include the version in which the test is introduced. The term "the system" in the preceding refers to the natural unit of software for a given ecosystem, such as a repository; it does not refer to the ecosystem taken as a whole.
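
A sketch of what a 1.0.0 API testset might look like under these rules, with the release version embedded in each name to keep names globally unique. The package, function, and naming convention are invented for illustration.

    # Hypothetical API testset at 1.0.0: at least one named, passing test,
    # each name carrying the version in which the test was introduced.
    def make_widget(kind: str) -> dict:
        return {"kind": kind, "gears": 3}


    API = {
        # name -> test; the name is never changed and never reused
        "make_widget 'basic' has three gears v1.0.0":
            lambda: make_widget("basic")["gears"] == 3,
    }

    assert all(test() for test in API.values()), "API contract broken"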

For the duration of a major release cycle, these API tests MUST continue to pass. More tests MAY be added to the API testset during a major release cycle, and minor versions, being those which introduce "functionality in a backward compatible manner", SHOULD include API tests, which SHOULD be introduced through the candidacy process described below. Patch releases MAY introduce new API tests. During a major release cycle, tests in the API testset MUST NOT be moved out of it.

The test system MUST detect any violation of this agreement, and MUST clearly report it as a failure of the versioning contract. CI systems which are integrated into TestVer SHOULD refuse to tag any minor or patch release which doesn't pass this standard; exceptions MAY be made for version strings carrying certain build metadata, as it may be useful to have a tag for referring to a build or commit of a package which doesn't pass the tests. Package and dependency management systems with integrated tested versioning SHOULD NOT prepare versions for release which fail this standard. Individual packages opting in to TestVer MUST NOT release minor and patch versions which fail to meet this standard.
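
A sketch of the kind of check a CI integration might run before tagging a minor or patch release: compare the previous release's roster of API test names against the current suite, and report removals, renames, or failures as contract violations. The inputs and the function itself are hypothetical, not part of any existing tool.

    # Hypothetical contract check: the previous release's API test names are
    # assumed to have been recorded somewhere (a manifest, a tag, a lockfile).
    def versioning_contract_violations(previous_api_names, current_api_names,
                                       current_results):
        """Return human-readable violations of the TestVer contract."""
        violations = []
        for name in previous_api_names:
            if name not in current_api_names:
                violations.append(f"API test removed or renamed: {name!r}")
            elif not current_results.get(name, False):
                violations.append(f"API test failing: {name!r}")
        return violations


    previous = {"make_widget 'basic' has three gears v1.0.0"}
    current = previous | {"make_widget 'deluxe' has five gears v1.1.0"}
    results = {name: True for name in current}
    assert versioning_contract_violations(previous, current, results) == []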

The requirement that API tests must pass applies only to release versions of the major.minor.patch style; any "+" or build release may have an arbitrary number of failing tests. The rule that the test itself (as distinct from its result) cannot change, nor its name, applies to build releases as well; tooling MAY check this but is not required to.

Development versions in the 0.x phase MAY have an API testset as well, but the contents of this testset are subject to no formal restrictions. It is good practice to only add tests to the API testset which are expected to be stable on 1.0, but a package MAY modify such tests in any way, including changing the name, the test itself, moving the test to another testset, or removing it entirely. API tests which survive from one minor release of a development version to another SHOULD NOT have their names or contents changed, but may be moved elsewhere. Significant changes to a 0.x API testset SHOULD be clearly conveyed to users, through a changelog or NEWS file.

An important note: maintainers should strive to uphold the spirit of semantic versioning, not just the technical standard specified by the various special test suites defined in this document. Releasing a release version of a package implies public API stability, and the testing requirements in this standard supplement, but do not replace, the obligation not to break applications using that public API during a major release cycle. Note that TestVer, by design, places no requirements on how complete or comprehensive the API test suite is. A test can define behavior as stable, but the absence of a test cannot conclusively determine that a change in package behavior doesn't amount to breaking the public API.

Other Testsets

The API testset is the only one which is mandatory for a major release, and it may be the case that a 1.0.0 has only that testset among the ones here described.

We begin with an important precept: a package MAY have other testsets in addition to the ones defined in this document, and this is encouraged. Testsets without one of the specific names defined in this standard have no meaning according to TestVer. The tested versioning system SHOULD be capable of running them, but MUST NOT consider them in terms of compliance with the standard. Tools such as CI and package registries MAY require all tests to be green in order to tag or bundle a release, but such failures MUST be clearly distinguished from a break in the TestVer contract.

It is only tests in the API testset which are subject to the requirements described in the previous section. There are a few supplementary testsets which flesh out the semantics of TestVer, which we now describe.

Note that, while the standard requires the testset "API" to be styled precisely with those three characters, no such requirement exists for the remaining names. An implementation SHOULD use them as provided, unless there is some good reason to, for example, use an underscore or hyphen instead of a space. Whether or not the testset or test names are case sensitive may follow the dominant standard of the ecosystem, but this SHOULD be consistent between test and testset names, and if the ecosystem's standard differs from that of literal string comparison, this choice MUST be documented.

The requirements to which API tests are subject are strict, and it may be considered good practice for such tests to be introduced before being added to the API testset. A testset called "API Candidate" contains such tests. The intention should be that these tests are moved to the API testset within a small number of releases. One to three patch releases, or the next minor release, are good values of 'small'. It's also fine for API Candidate tests to be introduced in a build or release candidate "+" version, and incorporated into the API test suite when that release is made final. It is not a requirement that API tests ever appear in the "API Candidate" testset.

The requirements for API Candidate tests are twofold: they MUST have globally unique names, and if the test changes, the name MUST also change. Tests may change in two ways: either the test is modified, or the expected value of the test as pass or fail changes. Either is grounds for a change to the name. Note that the latter refers to the expected value; any test of any sort may of course fail during development. If the version is included in the test name, this is easily done by incrementing the version, but note that API Candidate tests which don't change SHOULD NOT change their names without a good reason, and the version number of the package being incremented is not a good reason.

API Candidate tests which are moved to the API testset during an upgrade SHOULD NOT change in any way in the process. If it is necessary to make changes to an API Candidate test, it is good practice to keep it in the API Candidate testset for at least one release before moving it. API Candidate tests SHOULD NOT be deleted entirely from the suite, especially if they survive several upgrades, but SHOULD instead be moved to the "Obsolete" testset.

It is sometimes useful to introduce new features to a major or minor version which are public, but experimental. A testset marked "Experimental" MAY contain tests which explicitly describe behavior which might be modified or removed without a change of major version number. Experimental tests also have the unique name requirement, and the obligation to change that name in some fashion should the test itself change. There is a social distinction between the Experimental testset and tests in sets which aren't defined by this document: users are encouraged to try out features covered by the Experimental testset, and offer feedback as to whether it's useful, but shouldn't count on the features sticking around. By contrast, the default status of a testset is that it covers internal behavior which may be changed at leisure, and users should be discouraged from using program constructs only covered in those testsets, or relying on behaviors so described.

It's fine to move tests from Experimental to API without first passing through API Candidate, although the latter is sometimes appropriate. Whether reverted experiments should be moved to Obsolete, or simply be deleted entirely, is situation-dependent. It should be the normal fate of API Candidate tests to end up in API, while no such rule of thumb can, or should, be applied to Experimental.

Deprecating and Retiring API Tests

One of the explicit goals of TestVer is to encourage more major releases, complete with breaking changes. Ecosystems using SemVer have observed a reluctance to ship major version upgrades, because of the stress this places on users and application code, a tendency observed by the author of SemVer himself. The worst sort of failure here is maintainers reasoning themselves into shipping a breaking change under a minor version, by telling themselves that the breakage is minor, or isn't part of the API. But it's also suboptimal for package code to stay indefinitely stuck with poor architecture or behavior, due to the all-bets-are-off nature of a major version change, where what breaks and what doesn't can only be conveyed in natural language.

TestVer mitigates the first of these problems to a significant degree. When integrated at the ecosystem level, user code gets a guarantee that API-tested behavior will not change without a version bump, and any package which opts in to TestVer provides these benefits on a piecemeal basis. While the public API and the surface area of the API tests aren't identical concepts, this state of affairs is a strict improvement over reliance on convention and documentation to convey the behaviors considered stable.

Beyond this, the way tests are handled by a TestVer-compliant system offers a way to perform breaking changes with minimal disruption.

Tests MAY be moved out of the API testset ONLY on a major version upgrade, and even then, they MUST NOT be changed, or deleted from the test suite. The only valid place to move an API test is the testset "Obsolete", which MAY be internally grouped by the version of the API to which those tests last applied. By definition, such tests are now failing, and as such, there's no reason to run them, but their continuing existence serves as documentation, and is an important part of the user integration described below. A test system MUST provide metadata which indicates that these tests are not expected to pass; it MAY run them if desired, and it MAY alert the user running the tests if such a test unexpectedly begins to pass again.

Note that an obsolete test might throw an error if actually run, and this may pose a greater problem in static languages where an obsolete test may not even compile. A TestVer test system must be prepared to handle these eventualities in an implementation-defined way. The requirements are that Obsolete MUST contain all retired API tests, which MUST have their original names, and it MUST be possible to verify that the tests are unmodified from their state when added to the API testset. We address some specifics of this in the section on modification below.

A tested versioning system MAY allow tests in the Obsolete category to be moved to a separate tombstone file, but this MUST be linked to the main test file using a normal mechanism for source code inclusion. Obsolete tests which were part of the API testset MUST NOT be removed entirely, ever. Obsolete API Candidate tests SHOULD be retained for a reasonable number of releases, and Experimental tests MAY be moved to Obsolete, and MAY be removed entirely at any time. Since the testing software and other integrations MUST be able to verify that the obligation not to remove tests sourced from API has been met, the simplest policy is the Hotel California one: tests check in, but they can never leave.

The test system MUST be able to mark a test with metadata indicating that the test, and the behavior it describes, is deprecated and due to be removed in an upcoming release, with "deprecated". It also MUST be able to mark a test which is currently failing as one which is expected to pass in a future release, with "future". Applying a deprecation to a test does not qualify as modification, thus it may be done to API tests. Indeed, this is the recommended practice for releases leading up to a major version upgrade. This metadata may take any form, so long as it is available programmatically, and the form it takes is clearly documented. The names "deprecated" and "future" are normative, not standard.
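
One possible shape for this metadata, kept deliberately simple: a mapping from test name to flags, available programmatically and held alongside, but distinct from, the test bodies themselves. The test names anticipate the worked example which follows; everything here is illustrative.

    # Hypothetical metadata store: a plain mapping from test name to flags.
    TEST_METADATA = {
        # An API test whose behavior is due to be retired at the next major version.
        'WidgetFactory "foo" returns Gizmo v1.6': {"deprecated": True},
        # A test describing behavior which does not pass yet, but will in 2.0.
        'WidgetFactory "foo" returns Turbo Encabulator v2.0': {"future": True},
    }


    def expected_to_pass(name: str) -> bool:
        """A test marked 'future' is not expected to pass in the current release."""
        return not TEST_METADATA.get(name, {}).get("future", False)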

A smooth upgrade might look like this: the WidgetFactory, as defined by its API test, returns a Gizmo when provided with the string "foo". For good and sufficient reason, the maintainers decide to upgrade this return value to a Turbo Encabulator, which is more chronodynamic, having a streamlined and 15% more cromulent user interface.

This breaking change is heralded in several ways: a WidgetFactoryNext is added to Experimental, the test "WidgetFactory "foo" returns Gizmo v1.6" is deprecated, and a failing test "WidgetFactory "foo" returns Turbo Encabulator v2.0" is added to API Candidate, tagged with "future". When 2.0 drops, WidgetFactoryNext is retired and its test deleted, "WidgetFactory "foo" returns Gizmo v1.6" is moved to Obsolete, complete with deprecation, and "WidgetFactory "foo" returns Turbo Encabulator v2.0" is moved to API.
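
Sketched as testset rosters, the same upgrade looks like this. Only the names are shown; the layout is illustrative, not normative, and the Experimental test name is invented.

    # Before the 2.0 release:
    BEFORE_2_0 = {
        "API":           ['WidgetFactory "foo" returns Gizmo v1.6'],               # deprecated
        "API Candidate": ['WidgetFactory "foo" returns Turbo Encabulator v2.0'],   # future
        "Experimental":  ["WidgetFactoryNext returns Turbo Encabulator v1.6"],
        "Obsolete":      [],
    }

    # After the 2.0 release:
    AFTER_2_0 = {
        "API":           ['WidgetFactory "foo" returns Turbo Encabulator v2.0'],
        "API Candidate": [],
        "Experimental":  [],                                        # experiment retired, test deleted
        "Obsolete":      ['WidgetFactory "foo" returns Gizmo v1.6'],  # kept forever, not run by default
    }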

This raises questions of how users and their application code might cope with all these changes, which is the subject of the next section.

Dependency Testsets

The special testsets described above are for software in its role as a package. That is, they're a way for maintainers to assume an enforceable promise to users of the software that certain things will only change in well-defined ways, and under specific circumstances. The last special testset is for software in its role as an application, that is, as a consumer of package code. It is perfectly normal for one software project to play both of these roles.

The special testset "Dependencies" consists of listed package dependencies, and the names of tests which specify behavior which the application code depends upon. The named tests can be in any testset reachable by the test system, not just the special testsets defined in this standard, with the understanding that the guarantees involved range from ironclad to "relying on this is actively discouraged and we might change the test but not the name, so watch out".

While it may be polite for package code to consistently follow the unique names convention, TestVer imposes no restrictions at all on any aspect of a package outside of the special testsets it describes, a decision we briefly justify in the concluding remarks.

This allows deprecations to surface early, as well as any changes to API Candidate or Experimental tests, those being the tests subject to the unique names constraint. Fully integrated into an ecosystem, this allows package managers to detect possible incompatibilities in an upgrade, apply the upgrade on a provisional basis, and run the application's test suite to check for regressions. If an application depends only on a small part of a package's API, and a major version change introduces no changes to that part, a major version upgrade should be no different from a minor or patch upgrade. Conversely, application code which relies on experimental API may need modifying on a patch release, if that behavior is deprecated or removed.

This is the full standard for tested versioning: the special testsets "API", "API Candidate", "Experimental", "Obsolete", and "Dependencies", along with the behaviors specified for tests in each of these categories to follow.

It remains to clarify what it means to modify a test, and offer some closing remarks on social factors which may arise given the adoption of tested versioning.

Modification

The most important stricture which tested versioning imposes is its ban on modifying API tests, including removing them from the test suite. All that is allowed is changing the metadata, and moving API tests to Obsolete, the latter only during a major update.

In practice, there is some leeway in what constitutes modification, depending on how detection of such changes is implemented. We start by reiterating that the unique name of the test MUST NOT be changed. Sometimes this will be unsatisfactory, as when a typo, or just a bad description, is embedded in an otherwise acceptable test. An implementation MAY borrow a page from Unicode and allow aliases for a given test to be defined, but the original name must remain unique, unmodified, and valid to use as a reference to the test. It may be better to just accept the occasional cosmetic flaw in exchange for a simpler implementation with a single source of truth.

Besides this, it is equally important that the meaning of a test never change, although as we're about to see, it isn't practical to completely guarantee this programmatically. An implementation MUST define the method used to determine if a test has changed, and MUST NOT change that method without a major version upgrade to the test package, and even then it SHOULD NOT without a compelling or even dire need to do so. This is the core of a tested versioning system and great pains should be taken to get it right the first time.

The suggested choice is to allow only whitespace changes to the test itself. It's common for a test to have one or several setup variables, where the names could be changed without changing the behavior the test describes. But there's no compelling need for those names to change, either, and it's somewhere between difficult and impossible to detect in all cases when a variable name does or does not matter in substance.

But hashing the literal string of the test, whitespace and all, can cause brittle and spurious errors which are difficult to convey to the user, and lead to absurdities like a test being checked in with a trailing space, which everyone else's editor will try to remove until the end of time, breaking the test in the process. We suggest two approaches to determining difference: a Merkle hash of the test's abstract syntax tree, or a hash of a string of the test after whitespace is normalized. The former should be preferred if the language exposes the AST as a matter of course. These suggestions are not exclusive of other approaches which allow modification of a test to be detected programmatically.
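
A minimal sketch of the second approach, hashing the test source after collapsing whitespace, using only the Python standard library. For a whitespace-significant language like Python itself, the AST approach mentioned above would be the safer choice; nothing here is mandated by the standard.

    import hashlib


    def test_fingerprint(source: str) -> str:
        """Hash the test source with runs of whitespace collapsed to a single
        space and the edges trimmed, so formatting churn is not 'modification'."""
        normalized = " ".join(source.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


    original    = "assert make_widget('basic')['gears'] == 3"
    reformatted = "assert make_widget('basic')['gears']   ==  3  \n"
    changed     = "assert make_widget('basic')['gears'] == 4"

    assert test_fingerprint(original) == test_fingerprint(reformatted)
    assert test_fingerprint(original) != test_fingerprint(changed)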

Testing software SHOULD maintain a separate source of truth for these hashes, to ensure that versions of the tests are compared to each other, and not to an inline hash which may or may not correspond to previous versions of the test. Therefore the tests SHOULD NOT include the hash as metadata next to the test itself.
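
As a sketch of what that separate source of truth might look like: a small manifest file mapping each API test name to the fingerprint recorded when the test entered the testset, never overwritten afterwards. The file name and layout are invented; the fingerprint is the whitespace-normalized hash from the sketch above.

    import hashlib
    import json
    from pathlib import Path

    MANIFEST = Path("testver-manifest.json")    # hypothetical file name


    def fingerprint(source: str) -> str:
        # Same whitespace-normalized hash as in the previous sketch.
        return hashlib.sha256(" ".join(source.split()).encode("utf-8")).hexdigest()


    def record(name: str, source: str) -> None:
        """Record a test's fingerprint the first time it enters the API testset."""
        manifest = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
        manifest.setdefault(name, fingerprint(source))   # existing entries are never replaced
        MANIFEST.write_text(json.dumps(manifest, indent=2, sort_keys=True))


    def unmodified(name: str, source: str) -> bool:
        """Check the current test body against the recorded fingerprint."""
        manifest = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
        return manifest.get(name) == fingerprint(source)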

Whatever method of verification is chosen MUST be applicable to tests moved to Obsolete, which in some cases may no longer compile. This can usually be met by embedding the text of an Obsolete test as a string, which is then parsed, normalized, and hashed, or what have you. Obsolete tests SHOULD only be stringified in the event that they prevent the main test file from being compiled and run.

The greater challenge to the spirit of tested versioning is setup code. Some behaviors can be completely specified within the test itself, but others require setup, sometimes quite a lot of it, with mocks and test data and so on. It would be brittle and onerous to insist that this code be set in stone next to the behavior which it demonstrates, and setup might also call on internal details which are explicitly not intended to be fixed by the public API, and shouldn't be dragged into it by accident of necessity.

It's a fact that discounting setup code from the test restrictions opens the door to a certain sort of malicious compliance, where the behavior could be drastically changed but the setup modified to disguise this by returning the same result. While this is true, nothing in TestVer prevents a package upgrade from ransomware-locking a user hard drive, either. If the system is not flexible, gradual, and beneficial enough to justify any extra labor, it won't be adopted.

The importance of determining exactly what constitutes modification is entirely that an algorithm can only implement an exact standard, and programmatic confirmation that the standard is being adhered to is a major point of using tested versioning.

Semantic Versioning of TestVer

If TestVer is to be adopted, it will be a gradual process. Yet the standard sets requirements which few if any testing suites are capable of meeting out of the box.

To facilitate adoption, anyone can switch to TestVer at any time, simply by pledging to observe the restrictions on SemVer spelled out in this standard to the best of their ability. The minimum requirement is to have tests marked API, which aren't modified during a major release cycle, and which are never removed from the test suite.

Such a package may identify itself as a TestVer 0.1 package.

Examples of the ways in which the standard may be relaxed: unique names may be comments, test suites might have no understanding of TestVer, including the ability to check that tests haven't changed, obsolete tests may be made into comments if they can no longer be compiled, metadata may be in the form of comments, the standard for identifying the main test file may be ad-hoc, and so on.
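
In that relaxed spirit, a TestVer 0.1 package with no tooling support at all might convey testset membership and metadata with nothing more than comments, kept honest by hand. Everything in this sketch is invented for illustration.

    # TestVer 0.1: no tooling assumed, conventions maintained by hand.

    def make_widget(kind: str) -> dict:
        return {"kind": kind, "gears": 3}

    # TESTSET: API
    def test_make_widget_basic_has_three_gears_v1_0_0():
        # introduced in 1.0.0; never modified, never renamed, never removed
        assert make_widget("basic")["gears"] == 3

    # TESTSET: Obsolete
    # def test_make_widget_legacy_v0_9_0():    # retired at 1.0.0, kept as a comment
    #     assert make_widget_legacy().gears == 3

    if __name__ == "__main__":
        test_make_widget_basic_has_three_gears_v1_0_0()
        print("API tests pass")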

As work progresses in a given ecosystem on full support of the standard, this self-identification should be updated to use the version number of the affiliated tooling. If there's ambiguity as to which toolchain this refers to, it should be disambiguated.

When the tooling has reached 1.0, this may be changed simply to TestVer. It SHOULD NOT be identified as TestVer 1.x long term, though this may be appropriate short-term as a way of advertising that the standard is now fully implemented.

This usage of a version code for TestVer could create some ambiguity with the standard itself, which is also versioned. If it is necessary to refer to a version of the standard, this should be called by its full name, Tested Semantic Versioning x.y.z. This may be abbreviated TestedSemVer, but please follow the standard practice in technical writing of using the full term first, and only after that, an abbreviation of it.

Appendix: Adoption and Social Factors

This section is a discussion of the standard, and not a part of the standard itself.

Our lede claims that SemVer is a social solution to a technical problem. Tested SemVer, then, is a technical solution to a social problem. The first of these is unsatisfactory, the last is an oxymoron.

Early feedback on the concept of tested versioning has shown a number of concerns: that it would place an undue burden on maintainers, that it would make architectures brittle and hard to modify, that the test suite would become the tail that wags the dog, that maintainers would use an API testset as an excuse to sneak in breaking changes which weren't covered by the tests, that some sorts of software are inherently hard to test thoroughly, or that SemVer / CalVer / something else is working just fine, thank you, and they would resent being forced to switch.

These are legitimate concerns. It is notable that the SemVer standard is admirably short, which this document is not. My hope and belief is that adopting tested versioning will in fact alleviate the underlying problems with SemVer, which prompt the concern that adding more requirements to the mix would exacerbate those same problems.

Tested versioning has the primary purpose of giving maintainers and users common ground. A key example of this is the introduction of the API Candidate and Experimental testsets. SemVer leaves the concept of API up to documentation. It is a common practice to introduce and later regress functionality within a major version, by documenting it as experimental. Does SemVer allow this? Probably, but the standard is actually silent on this subject. Tested versioning formalizes this practice: not only is this permitted, but it is explicitly supported. User code relying on a feature tested in Experimental may document this in the Dependencies testset, and receive prompt notification of a change of status in return.

This standard is an extension of Semantic Versioning, not a replacement for it. Suppose a maintainer were to release a 1.0 version of a package containing only the API test expect(true). This satisfies the formal requirements of TestVer, but it doesn't convey the moral right to make arbitrary modifications to the public interface of the package without marking a breaking change.

By the same premise, TestVer does not impose an obligation that every nook and cranny of the public API be comprehensively fixed in place and documented using the API testset. Any responsible package claiming to be a stable release will have some tests, surely, and some of those will document behaviors which can't rightly change without that change being a breaking one. Those tests should be assigned a permanent name and placed in the API suite. That, and retiring any obsolete tests of that nature to the Obsolete testset on the next major version, are the complete additional requirements of tested versioning.

Young ecosystems might adopt TestVer wholesale, seeing the advantage which it brings, and do all of these practices as a matter of course. For mature ecosystems which already practice SemVer, adoption can and should be a gradual process. We strongly recommend such ecosystems adopt the standard on an individual and supererogatory basis, rather than try to impose it from the top down, or center out. Given time to prove its worth, and a consensus around adoption, TestVer can be formally adopted. But only then.

The key to the whole project is the Dependencies testset. This affordance allows application code to easily document the features from package code upon which the application relies, whether or not those features are API, or stable in any way. One of the motives in designing the standard is the observation that users inevitably make use of 'private' features of package code, and no matter how discouraged this might be, if those features change then that user code breaks.

Another observation is that no amount of documentation can adequately settle the question of what features and behaviors are considered to be stabilized within a major version. Tested versioning can't do this either, for a few reasons: it does not and cannot impose requirements beyond the formal on the API test set, it does not and will not recognize the test suite as the moral equivalent of the API itself, and it respects the simple fact that even the most comprehensive tests only test what authors thought to test, and there is always the unexpected above and beyond that set of features.

Maintainers may be wary of the risk that they might paint themselves into a corner with API tests, and understandably so. They are encouraged to use a light hand in populating that testset, at least at first. An API test can merely document that a function exists, or that a returned value has a specific field, for example. There is no requirement, and should be no expectation, that v1.0.0 cover all of the ground which the API testset for that major version will ever have.

For the user's part, there is a simple practice available, should their code rely on some feature of a package which is not documented in a test. Write the test! Open a PR with that test in a not-special testset (giving maintainers the flexibility to merge it, then change names and other desiderata as they see fit), and propose it for the API suite. It may or may not be included in any of the special testsets, but I've never known a maintainer to refuse a passing test outright, and any of these outcomes gives a good signal about what user code should expect for the future of at least that major version.

A fully functional TestVer ecosystem should allow package code to upgrade to a new major version just as smoothly as a patch, provided that the tests in the application's Dependencies testset are still present in the package's API. This should be a great help to distribution packagers in particular, who spend much of their time deciding when and how to upgrade library code which applications depend upon, and for whom a major release with its all-bets-are-off nature can be a serious headache.

Some kinds of code, particularly the interactive kind, make it difficult to rigorously fix the public API through tests alone. This is fine! Such packages can coexist with the rest of a TestVer ecosystem, because again, a test documenting the existence, parameter count and types, and other basic features of an API is in fact a test, and the rest of the API may be documented in natural language in the accustomed fashion, precisely as semantically-versioned code of this nature is documented right now.

Approached with the right spirit, tested versioning solves many specific pinch points with SemVer. A package may reach a point where some functions it provides are widely used, with a stable interface, while other parts may still be growing and subject to churn. In TestVer, the stable parts are added to API (if they aren't already), unstable parts get documented as such in Experimental, and what's in between goes in API Candidate.

SemVer is often interpreted (though the standard doesn't require this) as meaning that anything which is public in languages which have such a concept must be stable on major release. TestVer is clearer: while the API may cover more ground than the API testset, that which is explicitly added to Experimental and API Candidate is conclusively not the stable API. Software in this intermediate state should be encouraged to offer stability guarantees on what's ready, and refine the rest when it can, and this is exactly what TestVer does.

It will also serve to mitigate the sturm und drang around subsequent major version releases. A package with fifty API tests, which discovers a bug requiring one of those tests to change, is obliged by SemVer to make a major release. But maintainers are understandably reluctant to do this, because major releases under SemVer are stressful for everyone.

In TestVer, for any user code which only relies on the other forty-nine tests, a major release is just another day. They can check this in the release notes, and verify it in their own test suite, at the cost of merely including a string identifying the Dependencies test as important. CI can be set up to do all this automatically.

This is a draft proposal, and I welcome any suggestions for refinement, clarification, or improvement. Let's make things a little better, one test at a time.
