@PoignardAzur
Last active April 8, 2022 13:26
Opaque constants in Rust

This is a summary of a discussion we previously had on the rust-lang Zulip about a proposal regarding const generics in the Rust programming language.

The problem

With advanced const generics, library writers might write code like this:

trait Trait<const N: usize> {
    type Assoc;

    fn get_zero() -> Self::Assoc;
}

impl<const N: usize> Trait<N> for [u8; N] where N <= 4 {
    type Assoc = u32;

    fn get_zero() -> u32 { 0 }
}

impl<const N: usize> Trait<N> for [u8; N] where N > 4 {
    type Assoc = u64;

    fn get_zero() -> u64 { 0 }
}

In turn, users of the above crate might write code like:

const BACKGROUND_COLOR: (u8, u8, u8) = (200, 250, 240);

let x: u64 = <[u8; BACKGROUND_COLOR.0 as usize] as Trait<{ BACKGROUND_COLOR.0 as usize }>>::get_zero();

While this code would compile as-is, it would depend on the value of BACKGROUND_COLOR not changing.

This is unfortunate, because it means the value of BACKGROUND_COLOR is now part of the stability guarantee. Changing the value of BACKGROUND_COLOR should in theory require a semver bump (more on that later).

Concerningly, code could even depend on private internals of a crate's type:

// crate_a
pub struct MyType(u8, u16, u8);

// crate_b
let x: u32 = <[u8; size_of::<crate_a::MyType>()] as Trait<{ size_of::<crate_a::MyType>() }>>::get_zero();

In that example, crate_b's code would stop compiling if crate_a's author added a field to MyType, or even if neither crate changed but a new compiler version changed how the fields are laid out.

Isn't this already the case?

Both of these problems already exist in some form. Indeed, the following code compiles on stable Rust:

struct MyArray<const N: usize>([u8; N]);

trait MyTrait {}

impl MyTrait for MyArray<0> {}
impl MyTrait for MyArray<1> {}
impl MyTrait for MyArray<2> {}

fn my_trait(arg: Option<impl MyTrait>) {}

const A: Option<MyArray<{std::mem::size_of::<u8>()}>> = None;

const MY_GUI_COLOR: (u8, u8, u8) = (1, 2, 3);
const B: Option<MyArray<{MY_GUI_COLOR.0 as usize}>> = None;

fn main() {
    my_trait(A);
    my_trait(B);
}

And yet that code can break if we change the value of MY_GUI_COLOR.

In practice, this is an under-documented part of stability guarantees, which themselves tend to be on the informal side.

The semver compatibility guidelines give us a muddy picture:

  • Changing public fields is explicitly described as a major change. Changing private fields (in a struct which already has private fields) isn't mentioned, and is therefore implied not to be a breaking change that requires a major or minor semver bump.
  • Changing the value of constants or the code of constant functions isn't mentioned at all.

In practice, stability of const values isn't a big problem right now.

Cases like the above are explicit; code that depends on a value staying the same is often located in the same crate as that value.

When const values are known to be especially unstable and subject to change, libraries will explicitly document that fact, like in the UNICODE_VERSION case. That documentation is essentially a substitute for stability rules.

But there's an argument to be made that advanced const generics could make these breakages more frequent and more indirect. As const generic features are unlocked, generic types will create constant values based on const arguments and const functions, in such a way that there might not be a clear "chain of custody" between an upstream change and a downstream breakage.
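A small version of that indirection can already be written on stable Rust. The sketch below is only an illustration: the upstream module, buffer_len, Buffer and Frame are hypothetical names, and the upstream crate is modelled as a module so the example is self-contained.

```rust
// Hypothetical upstream crate, modelled as a module.
mod upstream {
    /// The doubling rule is an internal detail the author may want to change.
    pub const fn buffer_len(n: usize) -> usize {
        n * 2
    }

    /// A generic type whose size is driven by a const argument.
    pub struct Buffer<const N: usize> {
        pub bytes: [u8; N],
    }
}

// Downstream code: the length is computed by the upstream const fn...
type Frame = upstream::Buffer<{ upstream::buffer_len(4) }>;

fn make_frame() -> Frame {
    // ...but the literal 8 silently encodes `buffer_len(4) == 8`.
    upstream::Buffer { bytes: [0u8; 8] }
}

fn main() {
    let frame = make_frame();
    println!("frame holds {} bytes", frame.bytes.len());
}
```

If buffer_len were changed to return n * 2 + 1, make_frame would stop compiling with a type mismatch, even though no public signature in the upstream module changed.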

And in any case, the current situation isn't ideal. Stability guarantees should be better documented, and enforced by machine rules instead of doc comments.

A possible solution: Opaque consts

The language should have an annotation to mark that a constant's value isn't part of the stability guarantee, and therefore shouldn't be relied on.

This annotation would have no effect within a crate, but would prevent downstream crates from unifying the constant with its value, among other things.

For instance:

// crate_a
#[opaque]
const A: usize = 42;
let array: [i32; 42] = [0; A]; // no problem

// crate_b
let array: [i32; 42] = [0; A]; // error!
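For comparison, the closest approximation in today's Rust is probably to keep the constant private and export only a non-const accessor. This is a rough sketch, not the proposed feature: the accessor a and the helper sum_of_zeros are hypothetical names, and crate_a is modelled as a module.

```rust
// Hypothetical upstream crate, modelled as a module.
mod crate_a {
    // The value stays private; only a runtime accessor is exported.
    const A: usize = 42;

    /// Not a `const fn`, so callers cannot use the result in array lengths
    /// or const generic arguments.
    pub fn a() -> usize {
        A
    }

    /// Inside the crate, the constant can still appear in types.
    pub fn sum_of_zeros() -> i32 {
        let array: [i32; A] = [0; A];
        array.iter().sum()
    }
}

fn main() {
    // Downstream code can read the value at runtime...
    let n = crate_a::a();
    let v = vec![0i32; n];
    println!("{} elements, sum {}", v.len(), crate_a::sum_of_zeros());

    // ...but `let array: [i32; crate_a::a()] = [0; crate_a::a()];` would be
    // rejected, because non-const functions cannot be called in constants.
}
```

This workaround is much coarser than #[opaque]: downstream crates lose all compile-time use of the value, whereas an opaque constant could still be used in const contexts as long as nothing depends on its concrete value.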

Opaqueness would be infectious:

#[opaque]
const X: i32 = 0;

#[opaque]
const Y: i32 = X; // OK

const Z: i32 = X; // ERROR!

Const functions could be declared as opaque as well, so that their output isn't a stability guarantee:

#[opaque]
const fn count_unicode_grapheme_clusters(text: &str) -> usize {
    // ...
}

Arguably, some developers may even want opaqueness to be the default for functions. It's unclear how common opaque const functions would be. On one hand, library authors don't want to bump semver whenever they change internal code.

On the other hand, there are functions where it's absolutely clear the output is never expected to change, like Vec::len or Iterator::find, even if their internal algorithm changes.

Opaqueness of size_of

When the concept was discussed on Zulip, the idea that size_of should be opaque was controversial:

Josh Triplett: I do actually want to be able to write code like let a: [u8; std::mem::size_of::<Foo>()] = [1, 2, 3, 4]. And I would expect it to stop compiling if Foo changed size.

Yet there's a strong argument to be made that the above code should raise a deny-by-default lint, because it can be a breaking change if the internals of the Foo type change.

To be specific, there are multiple scenarios where Josh's code could break:

  • (Acceptable) Foo is defined in Josh's crate, and Josh added a field.
  • (Bad) Foo is imported from an upstream crate. The developer added a private field, and Josh just updated his dependencies (maybe without even realizing it).
  • (Bad) Foo didn't change, but Josh updated his compiler and the way Foo is laid out changed, so its size is now different.

Scenario 1 isn't a problem (the user changed the fields, and can expect the size to change).

Scenario 2 can be guarded against: the compiler could say that size_of returns an opaque value if the type has private fields, or is marked as #[non_exhaustive].

Scenario 3 is harder to guard against: the compiler would have to treat size_of as opaque if the type has private fields, OR is non-exhaustive, OR has a representation that is subject to change with a compiler update. In effect, repr(C) structs would have a non-opaque size, and everything else would be considered unstable.

That last constraint is a bit more controversial. Do we consider that there's an assumption in the ecosystem that Rust types could change layout at any point?

Again, there's a strong argument that, yes, such an assumption is reasonable.

The language reserves the right to reorder fields differently at any point. New niches might be added to enums, which fundamentally changes their size.
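The difference between the two representations can be checked directly. In the sketch below, DefaultLayout and CLayout are hypothetical types with the same fields as MyType above. Only the repr(C) size is asserted, because that is the only one the language specifies.

```rust
use std::mem::size_of;

// Default (Rust) representation: field order and padding are unspecified.
#[allow(dead_code)]
struct DefaultLayout(u8, u16, u8);

// C representation: fields are laid out in declaration order.
#[allow(dead_code)]
#[repr(C)]
struct CLayout(u8, u16, u8);

fn main() {
    // u8, 1 padding byte, u16, u8, 1 trailing padding byte = 6 bytes.
    assert_eq!(size_of::<CLayout>(), 6);

    // The default layout only guarantees enough room for the fields;
    // the exact size is up to the compiler and may change between versions.
    assert!(size_of::<DefaultLayout>() >= 4);

    println!(
        "default: {} bytes, repr(C): {} bytes",
        size_of::<DefaultLayout>(),
        size_of::<CLayout>()
    );
}
```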

Safe transmute semantics are still being figured out, but the current draft seems to assume that transmute is only safe for #[repr(C)] types, which goes in the same direction.

There might still be cases where users want their code to stop compiling if a type's size changes, as a safeguard. In those cases, some kind of static assertion might be preferable, or even an assert in a unit test:

#[test]
fn check_size() {
    assert_eq!(size_of::<SomeType>(), 16);
}

In that case, the user would be "manually" opting in to have their code depend on an invariant which isn't guaranteed by the compiler. They would thus have a clear responsibility for any breakage that results.
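The static-assertion variant can be sketched on stable Rust with an anonymous constant. SomeType here is a hypothetical type, given repr(C) only so that the example's size is predictable.

```rust
use std::mem::size_of;

// Hypothetical type; `repr(C)` keeps its layout predictable for the example.
#[allow(dead_code)]
#[repr(C)]
struct SomeType {
    a: u64,
    b: u64,
}

// Evaluated at compile time: the build fails if the size ever differs.
const _: () = assert!(size_of::<SomeType>() == 16);

fn main() {
    println!("SomeType is {} bytes", size_of::<SomeType>());
}
```

Unlike the unit test, this check runs on every build, so a size change is caught even when the test suite isn't run.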

Language transition

Changing the semantics of size_of would be a language-level breaking change.

One way to avoid breakage would be to have opaqueness-related errors trigger a lint at first, and forbid that lint in a future edition.

This would have another benefit: it would give users an escape hatch at first, if they want to write type expressions that they know are correct but that the compiler considers opaque.

Documentation

The semver documentation should be updated to describe the new semantics.

In fact, generally speaking, stability guarantees could be better documented. Library stability guarantees are a feature of Rust that "everyone knows about", but that don't really have a single canonical reference as far as I know.

A stability reference would give a formal description of which changes a library author can make while safely assuming that downstream users' crates won't break.

While in principle every change can break someone's workflow, in practice having formal rules would be helpful.

Conclusion

The general principle that changing constant values shouldn't automatically be considered breaking is fairly intuitive, but it has many knock-on implications.

An #[opaque] annotation would be a good first step toward applying this principle.
