Operational semantics of patterns in Rust (working model)

Patterns have a funky place in the efforts to specify/formalize the semantics of Rust code: as far as I know these efforts (like MiniRust) center on formalizing the meaning of MIR, the main intermediate representation of the Rust compiler. But patterns don't exist in MIR; they've already been translated into discriminant reads and switches.

Hence there's currently very little in the way of describing the precise semantics of patterns. In my capacity as the contributor most involved with everything pattern-related, I'd like to share my working model.

This is not exactly what's implemented today, instead that's what I'd like to become the normative specification. See at the end for differences.

Setting

Patterns can be used in the following syntactic locations:

match expressions;
if let/while let/let else expressions;
let expressions and for loops;
function parameters (fn foo([x, y]: [u32; 2]) { ... }).

I will be talking about patterns after some desugarings have taken place:

every one of the abovementioned syntactic forms has been desugared to a match;
match ergonomics are desugared;
constants have been turned into corresponding patterns;
or-patterns are moved outwards, causing the corresponding match arm to be duplicated as needed; e.g. (x|x, y|y) if foo(x, y) => ... results in 4 copies of (x, y) if foo(x, y) => ....
- The order of the resulting arms is given by preorder traversal (aka "left-to-right order") of the original pattern. See also this test.
never patterns (!) are expanded into more precise patterns until each ! pattern applies to an empty enum or the ! type;
deref!(<pat>) patterns have been turned into ref x if let <pat> = *x.deref(), ref mut x if let <pat> = *x.deref_mut() or box <pat> as appropriate;
the guards of guard patterns have been moved outwards and && together (left-to-right) to become a normal match guard.

Here are all the patterns we have in Rust (after the abovementioned desugarings):

the wildcard (_) pattern;
binding patterns (x, ref x, ref mut x, <binding> @ <pat>);
literal patterns (boolean, integer, float or string literals);
ADT (struct, enum and union) patterns (Some(<pat>), Struct { x: <pat>, .. });
reference patterns (&<pat>/&mut <pat>);
range patterns (0..42);
slice patterns ([<pat>, .., <pat>]);
or patterns (<pat> | <pat>);
(unstable) never patterns;
(unstable) box patterns.

Semantically, patterns operate on places. This is why e.g. let (ref mut x, _) = tuple; makes sense: this is equivalent to let x = &mut tuple.0 and would not make sense if this was operating on the value read from tuple.

Important terminology: in match <expr> { ... }, <expr> is called the "scrutinee"/"scrutinee expression". As just stated, it is treated as a place i.e. will undergo value-to-place coercion (aka "store in a temporary") if needed.

Working model

A pattern has two important roles: it specifies a "condition" which determines the values that match the pattern, and "bindings" which are places inside the scrutinee that will be bound to variables if the pattern matches.

The "condition" is made of a list of "tests" on subplaces that must succeed for the condition to be fulfilled. The possible tests are:

equality on a built-in type (booleans, integers, floats, strings);
comparison on a built-in type (integers, floats);
discriminant equality for enums;
length comparison for slices ([T], not &[T]).

I should be pretty obvious which tests/bindings a given pattern gives rise to, with one caveat: enums with exactly one variant are treated like structs and thus don't cause a discriminant test.

We order the tests and bindings of a given pattern left-to-right, i.e. in preorder traversal. Moreover, if a binding is inside another, the inner one is derived from the outer one. E.g. Some(mut z @ (Enum::Variant { x: 1, ref mut y, .. }, 2)) on place p gives 4 tests and 2 bindings:

discr(p) == Some;
discr((p as Some).0.0) == Variant;
((p as Some).0.0 as Variant).x == 1;
(p as Some).0.1 == 2;
let mut z = (p as Some).0;
let y = &mut (z.0 as Variant).y.

The semantics of a match is then:

in top-to-bottom order, consider each match arm;
in the order explained above, run each test of this arm;
if any test fails, stop testing and move on to the next arm;
if all the tests of the arm succeed:
- TODO guard binding shenanigans;
- execute the match guard;
- if the guard fails, move on to the next arm;
- bind the variables in the pattern in the order explained above;
- execute the arm body;
if none of the arms matched, we have UB.

It is moreover UB to change the outcome of a test from inside a match guard. Specifically: given any test coming from any of the patterns in the match, if the corresponding place is valid for the current scrutinee value, then the outcome of this test must be the same before and after every execution of a match guard. We may even strenghten this to "any write to a primitive value/discriminant/slice length inspected by any test in the current match is UB while the match is executing". This is tricky and may require new semantic MIR concepts like shallow borrows.

Details and edge cases

Discriminant equality actually works in two steps: a discriminant read (which returns an integer value) followed by an integer equality test.
A never pattern (!) gives a discriminant test that always returns false. In other words it causes read of the underlying ZST place (ZST because by our desugaring it only applies to the ! type and empty enums).

Differences with the actually implemented behavior

We reuse test outcomes in clever ways in match lowering; I haven't fully convinced myself that this is compatible with the ordering described above;
Tests inside or-patterns are actually tested last to reduce duplication; unsure if there's a way to avoid that;
We don't nest bindings like that; instead we just derive them all from the scrutinee place which tends to cause borrow-checking errors. Moreover we bind bindings right-to-left to permit x @ NonCopy(y @ Copy);
The catch-all slice pattern [..] does not currently emit a length test;
Enums that have > 1 variants but which have only 1 nonempty variant don't emit a discriminant test for the nonempty variant (but they do for the other variants I think??);
We actually reuse the by-ref bindings created before guard execution instead of rebinding them;
UB from changing the outcome of a test is not observable in final MIR (hence cannot be caught by Miri);
Deref patterns are not desugared this way, instead they are executed in-line with the other patterns, which changes when/if deref is called and when nested tests are executed.

Nadrieril/pattern-opsem.md

Setting

Working model

Details and edge cases

Differences with the actually implemented behavior