% Swift and the Price of ABI Stability

For those who don't follow Swift's development, ABI stability has been one of its most ambitious projects, and it finally shipped in Swift 5. The result is something I find endlessly fascinating, because I think Swift has pushed the notion of ABI stability farther than any language with minimal compromise.

This article is broken up into two sections: background and details. Feel free to skip to the details if you're very comfortable with the problems inherent to producing a robust dynamically linked system interface.

If you aren't comfortable with the basic concepts of type layouts, ABIs, and calling conventions, I recommend reading the article I wrote on the basic concepts of type layout and ABI as they pertain to Rust.

Background

Swift TLDR

I know a lot of people don't really follow Swift, and it can be hard to understand what they've really accomplished without some context of what the language is like, so here's a TL;DR of the language's shape:

- Exists to replace Objective-C on Apple's platforms, oriented at application development
  - natively interoperates with Objective-C
  - has actual classes and inheritance
- At a distance, very similar to Rust (but "higher-level")
  - interfaces, generics, closures, enums with payloads, unsafe escape hatch
  - no lifetimes; Automatic Reference Counting (ARC) used for complex cases
  - simple function-scoped mutable borrows (inout)
  - Ahead-Of-Time (AOT) compiled
- An emphasis on "value semantics"
  - structs/primitives ("values") are "mutable xor shared", stored inline
  - collections implement value semantics by being Copy-On-Write (using ARC)
  - classes are mutably shared and boxed (using ARC), undermining value semantics (can even cause data races)
- An emphasis on things Just Working
  - language may freely allocate to make things Work
  - generic code may be polymorphically compiled
  - fields may secretly be getter-setter pairs
  - ARC and COW can result in unpredictable performance
  - tons of overloading and syntactic sugar

Don't worry about fully understanding all of these, we'll dig into the really important ones and their implications as we go on.

What Is ABI Stability and Dynamic Linking

When the Swift developers talk about "ABI Stability" they have exactly one thing in mind: they want native system APIs for macOS and iOS to be written in Swift, and for you to dynamically link to them. This includes dynamically linking to a single system-wide copy of the Swift Standard Library.

Ok so what's dynamic linking? For our purposes it's a system where you can compile an application against some abstract description of an interface without providing an actual implementation of it. This produces an application that on its own will not work properly, as part of its implementation is missing.

To run properly, it must tell the system's dynamic linker about all of the interfaces it needs implementations for, which we call dynamic libraries (dylibs). Assuming everything goes right, those implementations get hooked up to the application and everything Just Works.

Dynamic linking is very important to system APIs because it's what allows the system's implementation to be updated without also rebuilding all the applications that run on it. The applications don't care about what implementation they get, as long as it conforms to the interface they were built against.

Since Swift is AOT compiled, the application and the dylib both have to make a bunch of assumptions about how to communicate with the other side long before they're linked together. These assumptions are what we call an ABI (Application Binary Interface), and since it needs to be consistent over a long period of time, that ABI better be stable.

So dynamic linking is our goal, and ABI stability is just a means to that end.

For our purposes, an ABI can be regarded as 3 things:

If you can define these details and never break them, you have a stable ABI, and dynamic linking can be performed. (Ignoring trivial cases where both the dylib and application were built together and ABI stability is irrelevant.)

Now to be clear, ABI stability isn't technically a property of a programming language. It's really a property of a system and its toolchain. To understand this, let's look at history's greatest champion of ABI stability and dynamic linking: C.

All the major OSes make use of C for their dynamically linked system APIs. From this we can conclude that C "has" a stable ABI. But here's the catch: if you compile some C code for dynamic linking on Ubuntu, that compiled artifact won't work on macOS or Windows. Heck, even if you compile it for 64-bit Windows it won't work on 32-bit Windows!

Why? Because ABI is something defined by the platform. It's not even something that necessarily needs to be documented. The platform vendor can just require you to use a particular compiler toolchain that happens to implement their stable ABI.

(As it turns out, this is actually the reality of Swift's Stabilized Apple ABIs. They're not actually properly documented; Xcode just implements them and the devs will do their best not to break them. They're not opposed to documenting them, it's just a lot of work and shipping was understandably higher-priority.)

But if that's the case, why don't platform vendors provide stable ABIs for lots of other languages? Well it turns out that the language isn't completely irrelevant here. Although ABI isn't "part" of C itself, it is relatively friendly to the concept. Many other languages aren't.

To understand why C is friendly to ABI stability, let's look at its much less friendly big brother, C++.

Templated C++ functions cannot have their implementations dynamically linked. If I provide you with a system header that provides the following declaration, you simply can't use it:

template <typename T>
bool process(T value);

This is because it has no symbol. C++ templates are monomorphically compiled, which is a fancy way of saying that the way to use them is to copy-paste the implementation with all the templates replaced with a particular value.

So if I want to call process<int>(0), I need to have the implementation available to copy-paste it with int replacing T. Needing to have the implementation available at compile-time completely undermines the concept of dynamic linking.

Now perhaps the platform could make a promise that it has precompiled several monomorphic instances, so say symbols for process<int> and process<bool> are available. You could make that work, but then the function wouldn't really be meaningfully templated anymore, as only those two explicitly blessed substitutions would be valid.

There would be little difference from simply providing a header containing:

bool process(int value);
bool process(bool value);

Now a header could just include the template's implementation, but what that would really be guaranteeing is that that particular implementation will always be valid. Future versions of the header could introduce new implementations, but a robust system would have to assume applications could be using either, or perhaps even both at the same time.

This is no different from a C macro or inline function, but I think it's fair to say that templates are a little more important in C++.

For comparison, most platforms provide a dynamically linked version of the C standard library, and everyone uses it. On the other hand, C++'s standard library isn't very useful to dynamically link to; it's literally called the Standard Template Library!

In spite of this issue (and many others), C++ can be dynamically linked and used in an ABI-stable way! It's just that it ends up looking a lot more like a C interface due to the limitations.

Idiomatic Rust is similarly hostile to dynamic linking (it also uses monomorphization), and so an ABI-stable Rust would also end up only really supporting C-like interfaces. Rust has largely just embraced that fact, focusing its attention on other concerns.

Swift's Stable ABI

I have now made some seemingly contradictory claims:

- Swift has similar features to Rust
- Rust's features make it hostile to dynamic linking
- Swift is great at dynamic linking

The secret lies in where the two languages diverge: dynamism. Rust is a very static and explicit language, reflecting the sensibilities of its developers and early adopters. Swift's developers preferred a much more dynamic and implicit design, and so that's what they made.

As it turns out, hiding implementation details and doing more work at runtime is really friendly to dynamic linking. Who'd've thought dynamic linking was dynamic?

But what's really interesting about Swift is the ways it's not dynamic.

It's actually fairly trivial to dynamically link a system where all the implementation details are hidden behind uniformity and dynamism. In the extreme case, we could make a system where everything is an opaque pointer and there's only one function that just sends things strings containing commands. Such a system would have a very simple ABI!

And indeed, in the 90's there was a big push in this direction with Microsoft embracing COM and Apple embracing Objective-C as ways to build system interfaces with simple and robust ABIs.

But Swift didn't do this. Swift tries its hardest to generate code comparable to what you would expect from Rust or C++, and how it accomplishes that is what makes its ABI so interesting.

It's worth noting that the Swift devs disagree with the Rust and C++ codegen orthodoxy in one major way: they care much more about binary sizes. More specifically, they care a lot more about making efficient usage of the cpu's instruction cache, because they believe it's better for system-wide power usage. Apple championing this concern makes a lot of sense, given their suite of battery-powered devices.

It's harder for third party developers to care about this, as they will naturally only control some small part of the software running on a device, and typical benchmarking strategies don't really capture "this change made your application run faster but is making some background services less responsive and hurting battery life". Hence C++ and Rust inevitably pushing towards "more code, more fast".

This is all to say that some things which seem like compromises made for ABI stability's sake are genuinely just regarded as desirable.

I never got any great concrete numbers on this concern from the Swift or Foundation folks, would definitely love to see some! Waves at the Apple employees reading this.

Resilience and Library Evolution

The Swift developers cover this topic fairly well in their documentation. I'll just be giving a simplified version, focusing on the basic motivation.

Resilience is the core concept behind Swift's dynamic linking story. It means that things default to having ABIs that don't break when the implementation changes in an API-preserving way (nothing can save ABI if, say, a function's arguments are swapped). This allows developers to create idiomatic-feeling libraries that can still easily evolve their implementations.

This is in contrast to C, which only makes it possible to create a stable ABI with proper vigilance and foresight. This is because C requires you to commit to many of the ABI details of your interface upfront, even if you're uncertain about them. If you don't want to commit to those details, you'll have to change the shape of your API to hide them.

When compiled as a dylib, Swift defaults to implicitly hiding as many details as possible, requiring you to opt into guarantees by adding annotations. Crucially, these annotations don't affect the shape of an API, they're "only" for optimizing the ABI, at the cost of resilience.

Additionally, it's possible for a library to add ABI annotations later without breaking its old ABI. Yet at the same time, applications compiled against new annotations are able to use that information to run faster! An application which takes advantage of those annotations does however become incompatible with older versions of the library.

This is all very abstract, let's look at a simple library evolution example.

Let's say we draft up a simple FileMetadata interface in C:

// version 1
typedef struct {
    int64_t size;
} FileMetadata;
 
bool get_file_metadata(char* path, FileMetadata* output);

Which would be called as:

FileMetadata metadata;
if (!get_file_metadata("/my/sweet/file.txt", &metadata)) {
    printf("error!");
    return;
}
printf("file size %lld", (long long)metadata.size);

Now let's say we realize that this function should also provide info on when it was last modified:

// version 2
typedef struct {
    int64_t last_modified_time; // 64 bits of time will SURELY be fine...
    int64_t size;
} FileMetadata;

bool get_file_metadata(char* path, FileMetadata* output);

Oops, we've messed up our ABI! Our hypothetical caller is stack allocating a FileMetadata, so they're assuming it has a particular size and alignment. Additionally, they're directly accessing the size field, which they assume is at a particular offset in the struct.

Both of those assumptions were violated by our change. This didn't necessarily have to happen. There's a few common approaches we could have taken to allow for this change. For instance we could have:

- Reserved space in our struct for future use
- Made FileMetadata an opaque type, requiring function calls to get the fields
- Given FileMetadata a pointer to its "version 2" data (opaque in "version 1")

Unfortunately, all of these require us to have the foresight to do them while also changing the way users make use of our API. In some sense, the API becomes less "idiomatic" to accommodate future changes. Additionally, we will forever be burdened with this complexity even if we eventually determine that the API is complete enough to guarantee its details.

Swift doesn't require you to make this compromise.

The following two designs are totally ABI compatible while remaining perfectly idiomatic to use:

// version 1
public struct FileMetadata {
    public var size: Int64
}

public func getFileMetadata(_ path: String) -> FileMetadata?

// version 2
public struct FileMetadata {
    public var lastModifiedTime: Int64 // just add this field, that's it
    public var size: Int64
}

public func getFileMetadata(_ path: String) -> FileMetadata?

Details

Once again, feel free to check out Swift's documentation of the annotations that are used to manage ABI resilience.

Resilient Type Layout

By default, a type that is defined by a dylib has a resilient layout. This means that the size, alignment, stride, and niches of that type aren't statically known to the application. To get that information, it must ask the dylib for that type's value witness table at runtime.

What's really interesting about resilient layout is that it's only something that the application is forced to deal with, and only in a very limited way. Inside the boundaries of the dylib where all of its own implementation details are statically known
