Last active September 29, 2023 22:43
Pattern Matching, 2023-07-10

Pattern Matching Rewrite


This proposal has several parts:

  1. A new syntax construct, the "matcher pattern", which is an elaboration on (similar to but distinct from) destructuring patterns.

    Matcher patterns allow testing the structure of an object in various ways, and applying those tests recursively to parts of the structure, to any depth (as destructuring does).

    Matcher syntax intentionally resembles destructuring syntax but goes well beyond the abilities and intention of simple destructuring.

  2. A new binary boolean operator, is, which lets you test values against matchers. If the matcher establishes bindings, this also pulls those bindings out into the operator's scope.

  3. A new syntax construct, the match() expression, which lets you test a value against multiple patterns and resolve to a value based on which one passed.

Matcher Patterns

Destructuring matchers:

  • array matchers:

    • [<matcher>, <matcher>] exactly two items, matching the patterns
    • [<matcher>, <matcher>, ...] two items matching the patterns, more allowed
    • [<matcher>, <matcher>, ...let <ident>] two items matching the patterns, with the remainder collected into an array bound to <ident>. (const or var can be used as well; see "binding matchers". Only binding matchers are allowed in that position.)
  • object matchers:

    • {<ident>, <ident>} has the ident keys (anywhere on its prototype chain, not just own keys), and binds each value to the same-named ident. Can have other keys. (That is, {a} is identical to {a: let a}.)
    • {<ident>: <matcher>, <ident>: <matcher>} has the ident keys, with values matching the patterns. Can have other keys.
    • {<ident>: <matcher>, ...let <ident2>} has the ident key, whose value matches the pattern. The remaining own keys are collected into an object bound to <ident2>.
  • binding matchers:

    • let <ident>/const <ident>/var <ident>. Binds the matchable to the ident. (That is, [let a, let b] doesn't test the items in the array, just exposes them as a and b bindings.)
    • (To bind a matchable and also apply further matchers to it, chain them with the and keyword: let a and [b, c].)
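The destructuring matchers above can be modeled at runtime roughly as follows. This is a minimal sketch under stated assumptions, not the proposal's semantics:

- Each matcher is a plain function that returns a bindings object on success, or null on failure.
- The helper names arrayMatcher, objectMatcher, and bind are hypothetical.
- Array matchers are assumed to test only real arrays (Array.isArray); the text above does not say whether the iteration protocol is used.

```javascript
// Hypothetical model: a matcher is a function from a value to a bindings object, or null on failure.
const bind = (name) => (value) => ({ [name]: value }); // models `let <name>`

// [m1, m2]           -> arrayMatcher([m1, m2])
// [m1, m2, ...]      -> arrayMatcher([m1, m2], { rest: true })
// [m1, m2, ...let r] -> arrayMatcher([m1, m2], { rest: true, restName: "r" })
function arrayMatcher(items, { rest = false, restName = null } = {}) {
  return (value) => {
    if (!Array.isArray(value)) return null; // assumption: only real arrays are matched
    const lengthOk = rest ? value.length >= items.length : value.length === items.length;
    if (!lengthOk) return null;
    const bindings = {};
    for (let i = 0; i < items.length; i++) {
      const b = items[i](value[i]);
      if (b === null) return null;
      Object.assign(bindings, b);
    }
    if (restName !== null) bindings[restName] = value.slice(items.length);
    return bindings;
  };
}

// {k: m}           -> objectMatcher({ k: m })  (k may live anywhere on the prototype chain)
// {k: m, ...let r} -> objectMatcher({ k: m }, { restName: "r" })
// {a} is shorthand for {a: let a}, i.e. objectMatcher({ a: bind("a") })
function objectMatcher(entries, { restName = null } = {}) {
  return (value) => {
    if (value === null || value === undefined) return null;
    const obj = Object(value); // box primitives, as destructuring does
    const bindings = {};
    for (const [key, m] of Object.entries(entries)) {
      if (!(key in obj)) return null; // `in` walks the prototype chain
      const b = m(obj[key]);
      if (b === null) return null;
      Object.assign(bindings, b);
    }
    if (restName !== null) {
      // Collect the remaining own enumerable string keys.
      const restObj = {};
      for (const k of Object.keys(obj)) {
        if (!Object.hasOwn(entries, k)) restObj[k] = obj[k];
      }
      bindings[restName] = restObj;
    }
    return bindings;
  };
}
```

For example, arrayMatcher([bind("a"), bind("b")]) plays the role of [let a, let b]. It accepts [1, 2] and rejects [1, 2, 3].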

Value-testing matchers:

  • literal matchers:

    • 1,
    • "foo",
    • etc. All the primitives, plus (untagged only?) template literals.
    • Unary plus and minus are also allowed.
    • -0 and +0 test for the correctly signed zero; 0 just uses === equality, so it matches either zero.
    • NaN tests for NaN properly.
  • variable matchers

    • <plain-or-dotted-ident> evaluates the name.

      If the name has a custom matcher (see below), it passes the matchable to the custom matcher function and matches if that succeeds. Otherwise, it just matches based on equality. (Uses === semantics, except that NaN is matched properly.)

    • <plain-or-dotted-ident>(<matcher-list>) evaluates the ident, grabs its Symbol.matcher property, then invokes it on the matchable. (Throws if it doesn't have a Symbol.matcher property, or if that property isn't a function.) If that succeeds, it further matches the result against the matcher list, as if it were an array matcher.

       (TODO: Option.Some(foo) example goes here.)
  • regex matchers:

    • /foo/ matches if the regex matches. Named capture groups establish let bindings.
    • /foo/(<matcher-list>) works like a custom matcher: if the regex matches, the match result (the regex match object) is further destructured by the matcher list.
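The value-testing matchers can be approximated in today's JavaScript as below. This is a rough sketch, and several pieces are assumptions:

- Symbol.matcher does not exist yet, so a local matcherSymbol stands in for it.
- The custom-matcher protocol is assumed to return an array of sub-values on success and null on failure; the text above does not pin this down.
- Regex matchers are assumed never to match non-strings.
- Option.Some here is a hypothetical illustration, not the missing example from the text.

```javascript
// Hypothetical stand-in: Symbol.matcher does not exist in JavaScript today.
const matcherSymbol = Symbol("matcher");
const bind = (name) => (value) => ({ [name]: value });

// Literal matcher. With signedZero, -0 and +0 check the sign via Object.is;
// other literals use ===, except that NaN matches NaN.
function literal(expected, { signedZero = false } = {}) {
  return (value) => {
    let ok;
    if (signedZero) ok = Object.is(value, expected);
    else if (Number.isNaN(expected)) ok = Number.isNaN(value);
    else ok = value === expected;
    return ok ? {} : null;
  };
}

// Matches a list of matchers against an array of values, requiring an exact length.
function matchList(matchers, values) {
  if (!Array.isArray(values) || values.length !== matchers.length) return null;
  const bindings = {};
  for (let i = 0; i < matchers.length; i++) {
    const b = matchers[i](values[i]);
    if (b === null) return null;
    Object.assign(bindings, b);
  }
  return bindings;
}

// Variable matcher: use the custom matcher if present, else === equality (NaN-aware).
function variable(target) {
  return (value) => {
    const fn = target == null ? undefined : target[matcherSymbol];
    if (typeof fn === "function") return fn.call(target, value) === null ? null : {};
    const equal = value === target || (Number.isNaN(target) && Number.isNaN(value));
    return equal ? {} : null;
  };
}

// Extractor matcher Target(<matcher-list>): throws if Target has no matcher function.
function extractor(target, argMatchers) {
  return (value) => {
    const fn = target == null ? undefined : target[matcherSymbol];
    if (typeof fn !== "function") throw new TypeError("extractor target has no matcher function");
    const result = fn.call(target, value);
    return result === null ? null : matchList(argMatchers, result);
  };
}

// Regex matcher: named groups become bindings; an optional matcher list destructures the match object.
function regex(re, argMatchers) {
  return (value) => {
    if (typeof value !== "string") return null; // assumption: non-strings never match
    const m = re.exec(value);
    if (m === null) return null;
    const named = { ...(m.groups ?? {}) };
    if (argMatchers === undefined) return named;
    const listed = matchList(argMatchers, [...m]);
    return listed === null ? null : { ...named, ...listed };
  };
}

// Hypothetical Option.Some: its matcher yields [value] for { tag: "Some", value } objects.
const Option = {
  Some: {
    [matcherSymbol](subject) {
      const isSome = subject !== null && typeof subject === "object" && subject.tag === "Some";
      return isSome ? [subject.value] : null;
    },
  },
};
```

With these helpers, extractor(Option.Some, [bind("foo")]) stands in for Option.Some(let foo). Avoid passing a regex with the g or y flag to this sketch, because exec would then update lastIndex between matches.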

Boolean matcher logic:

  • <matcher> and <matcher>: Tests the matchable against both matchers (in order), and succeeds only if both succeed. Accumulates bindings from both. If the first fails, it short-circuits.
  • <matcher> or <matcher>: Tests the matchable against both matchers (in order), and succeeds if either succeeds. Accumulates binding names from both, but takes values only from the first successful matcher (bindings that only the other matcher establishes become undefined). If the first succeeds, it short-circuits.
  • not <matcher>: Tests the matchable against the matcher, succeeds only if the matcher fails. No bindings.
  • Matchers can be parenthesized, and must be when you mix keywords: there is no precedence relationship between and, or, and not, so mixing them at the same level is a syntax error.
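The and/or/not rules, especially how or fills in bindings, can be sketched as combinators. In this model a matcher carries the list of names it can bind, so that or knows which names to set to undefined. The shape { names, test } and the helpers bind and pred are assumptions made for the illustration. Nesting function calls makes grouping explicit, which is what the required parentheses achieve in the proposed syntax.

```javascript
// Hypothetical model: a matcher is { names, test }, where test(value) returns a bindings
// object on success or null on failure, and `names` lists the bindings it can establish.
const bind = (name) => ({ names: [name], test: (v) => ({ [name]: v }) });
const pred = (f) => ({ names: [], test: (v) => (f(v) ? {} : null) });

// <a> and <b>: both must succeed; bindings come from both; b is skipped if a fails.
function and(a, b) {
  return {
    names: [...a.names, ...b.names],
    test(v) {
      const first = a.test(v);
      if (first === null) return null;
      const second = b.test(v);
      return second === null ? null : { ...first, ...second };
    },
  };
}

// <a> or <b>: the first success wins. Every name from either side is present,
// and names the winning matcher did not set are undefined. b is skipped if a succeeds.
function or(a, b) {
  const names = [...new Set([...a.names, ...b.names])];
  return {
    names,
    test(v) {
      const hit = a.test(v) ?? b.test(v);
      if (hit === null) return null;
      const blanks = Object.fromEntries(names.map((n) => [n, undefined]));
      return { ...blanks, ...hit };
    },
  };
}

// not <m>: succeeds only if m fails, and never exposes bindings.
const not = (m) => ({ names: [], test: (v) => (m.test(v) === null ? {} : null) });
```

For instance, or(and(bind("n"), pred(isNumber)), bind("s")) models (let n and <number test>) or let s. Matching "x" yields s = "x" with n present but undefined.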

Using Matchers

  • New match(){} expression:

     match(<val-expr>) {
     	when <matcher>: <result-expr>;
     	default: <result-expr>;
     }

    Finds the first "arm" whose matcher passes for the value, and evaluates to that arm's result. The matcher can produce bindings that are visible within the matcher and within the result; they don't escape the arm they're established in. (Are var matchers allowed or disallowed?)

    The default arm always matches. If no arm matches, the match() expression throws.

  • New is operator

     <val-expr> is <matcher>

    Evaluates to true if the value passes the matcher, and false otherwise. If the matcher has binding patterns, they behave normally within the matcher; see below for their behavior outside of it.

    Written out by hand with match(), this would be:

     let passes = match(<val-expr>) {
     	when <matcher>: true;
     	default: false;
     };
  • When is is used and the matcher establishes bindings:

    • In if(), the bindings are lifted to a scope immediately outside the if() block, which also encompasses the following else. (Likely, we define a scope analogous to the one for(of) uses.) Lexical bindings remain in the TDZ if the matcher doesn't match; var bindings simply aren't assigned a value.

      (Bindings will often not be useful in the else, but will be in cases like if(!(x is <matcher>)){...}else{...}, where the matcher successfully matches but the if fails.)

    • In while() and do{}while(), same behavior. (In do{}while(), lexical bindings are TDZ on the first iteration.)

    • In for-of, the bindings exist in the current outer for scope, same as any other bindings established in the for head.

      (TODO: write an example of for-of usage; I'm not clear how it's supposed to work.)
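The match() and is semantics above can be emulated roughly as follows. This is a hypothetical runtime model, not the proposal's implementation:

- Matchers are functions that return a bindings object or null.
- Each arm's result is a callback that receives that arm's bindings, so the bindings never leak out of the arm.
- The names when, DEFAULT, match, is, pair, and describe are all illustrative.

The describe function approximates how if (x is [let a, let b]) lifts bindings to a scope enclosing both branches. Plain JavaScript cannot put a and b into the TDZ on failure, so this sketch leaves them undefined instead.

```javascript
// Models [let a, let b]: an array of exactly two items, bound to a and b.
const pair = (v) => (Array.isArray(v) && v.length === 2 ? { a: v[0], b: v[1] } : null);
const DEFAULT = () => ({}); // the default arm always matches
const when = (matcher, result) => ({ matcher, result }); // result receives the arm's bindings

function match(value, arms) {
  for (const { matcher, result } of arms) {
    const bindings = matcher(value);
    if (bindings !== null) return result(bindings); // bindings stay local to this arm
  }
  throw new TypeError("match: no arm matched");
}

// `value is matcher` behaves like match(value) { when matcher: true; default: false; }
const is = (value, matcher) =>
  match(value, [when(matcher, () => true), when(DEFAULT, () => false)]);

// Rough desugaring of: if (x is [let a, let b]) { ... } else { ... }
function describe(x) {
  let a, b; // lifted: visible in both the if branch and the else branch
  const hit = pair(x); // the matcher runs as part of the condition
  if (hit !== null) ({ a, b } = hit);
  if (hit !== null) {
    return `pair of ${a} and ${b}`;
  } else {
    return `not a pair (a is ${a})`;
  }
}
```

For example, match([1, 2], [when(pair, ({ a, b }) => a + b), when(DEFAULT, () => 0)]) evaluates to 3.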

(We've lost matchers in plain let/etc statements, which I guess also means we lose matchers in function arglists. Unfortunate.)

nmn commented Sep 19, 2023

I wrote a long, almost identical proposal in an issue on the original repo. Now that I've seen this, I would like to propose two small modifications and a few questions:

Let's use ... in Object matchers too?

[<matcher>, <matcher>] matches arrays with exactly two elements. [<matcher>, <matcher>, ...] can be used to match any array with at least two elements.

However, {<ident>: <?matcher>, <ident>: <?matcher>} matches any object with at least those two keys. I understand things get a bit complicated with prototypes, but I feel like this matcher should not match for objects that have additional "own" keys.

e.g. {name: 'John Doe', age: 30} should not match the matcher { name }, since it has an additional own, enumerable key. { name, ... } should be allowed to match objects that may contain extra keys.

This change makes the whole system more consistent IMO.

Syntax for matching instances of classes

We should support a matcher like Person { name } that works exactly like the object matcher { name }, but also checks that the value being matched is an instance of Person.

All proposed changes

Here's what I would add to the proposal above:

  • object matchers:

    • {<ident>, <ident>} has the ident keys (in its proto chain, not just own keys), and binds the value to that ident. Cannot have other own keys. (aka {a} is identical to {a: let a})
    • {<ident>, <ident>, ...} has the ident keys (in its proto chain, not just own keys), and binds the value to that ident. Can have other own keys.
    • {<ident>: <matcher>, <ident>: <matcher>} has the ident keys, with values matching the patterns. Cannot have other own keys.
      • This should also work with "getter function" keys
    • {<ident>: <matcher>, <ident>: <matcher>, ...} has the ident keys, with values matching the patterns. Can have other own keys.
    • {<ident>: <matcher>, ...let <ident2>} has the ident key, with values matching the pattern. Remaining own keys collected into an object bound to <ident2>.
  • class matchers:

    • <ident> <object-matcher> matches <object-matcher> and is an instance of <ident>.
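A sketch of the semantics suggested in this comment, which differ from the original gist:

- Object matchers reject extra own enumerable keys unless the pattern ends in "...".
- A class matcher adds an instance check.

The helper names and the open option are hypothetical. The instanceof check follows the comment's wording only; the replies below argue against instanceof semantics.

```javascript
const bind = (name) => (value) => ({ [name]: value }); // models `let <name>`

// Suggested object matcher: keys may live on the prototype chain, but unless the pattern
// ends in `...` (open: true), the value may not have other own enumerable keys.
function objectMatcher(entries, { open = false } = {}) {
  return (value) => {
    if (value === null || typeof value !== "object") return null;
    if (!open && Object.keys(value).some((k) => !Object.hasOwn(entries, k))) return null;
    const bindings = {};
    for (const [key, m] of Object.entries(entries)) {
      if (!(key in value)) return null;
      const b = m(value[key]);
      if (b === null) return null;
      Object.assign(bindings, b);
    }
    return bindings;
  };
}

// Suggested class matcher `Person { name }`: the object matcher plus an instanceof check.
function classMatcher(Cls, objMatcher) {
  return (value) => (value instanceof Cls ? objMatcher(value) : null);
}
```

Under this model, { name } rejects {name: 'John Doe', age: 30}, while { name, ... } accepts it.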

ljharb commented Sep 19, 2023

Patterns are meant to mimic destructuring as much as possible; it doesn't make sense to me to ever care if an object doesn't have extra keys, especially since you can foo: not x or similar to ban a specific key.

instanceof semantics are terrible and should never be further cemented into the language; the current plan is to make class syntax create a default matcher that approximates the semantics of ensuring a private field exists on the receiver.
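A default class matcher that "ensures a private field exists on the receiver" could look roughly like the private-field brand check below (the `#x in obj` syntax, ES2022). This is only a guess at the approximation the comment describes. matcherSymbol is a hypothetical stand-in for Symbol.matcher, and the #brand field is an illustrative name. The example also shows the difference from instanceof: an object that merely inherits Person.prototype passes instanceof but fails the brand check.

```javascript
// Hypothetical stand-in: Symbol.matcher does not exist in JavaScript today.
const matcherSymbol = Symbol("matcher");

class Person {
  #brand; // installed on every object that Person's constructor initializes
  constructor(name) {
    this.name = name;
  }
  // Sketch of a class "default matcher": succeed only if the subject has Person's private field.
  static [matcherSymbol](subject) {
    return Object(subject) === subject && #brand in subject;
  }
}

// Inherits Person.prototype without running the constructor, so it has no #brand.
const lookalike = Object.create(Person.prototype);
```

Here lookalike instanceof Person is true, but Person[matcherSymbol](lookalike) is false.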

nmn commented Sep 19, 2023

Patterns are meant to mimic destructuring as much as possible; it doesn't make sense to me to ever care if an object doesn't have extra keys.

I don't disagree and this was my proposal in the long issue I wrote. My concern is that I think destructuring should be consistent across Arrays and objects.

Another solution is to change Array matchers to allow extra elements by default:

  • array matchers:

    • [<matcher>, <matcher>, void] exactly two items, matching the patterns
    • [<matcher>, <matcher>] two items matching the patterns, more allowed
    • [<matcher>, <matcher>, ...let <ident>] two items matching the patterns, with remainder collected into a list bound to <ident>. (Can use const or var as well; see "binding matchers". Only binding matchers allowed in that position; not anything else.)
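Under this alternative, array matchers are open by default, and a trailing void asserts that a slot is absent. The sketch below models that reading. VOID is a hypothetical marker for the void slot and arrayMatcher is an illustrative helper; the ...let rest form is omitted for brevity.

```javascript
const bind = (name) => (value) => ({ [name]: value }); // models `let <name>`
const VOID = Symbol("void"); // hypothetical marker for a `void` slot

// [m1, m2]       -> arrayMatcher([m1, m2])        (at least two items)
// [m1, m2, void] -> arrayMatcher([m1, m2, VOID])  (exactly two items)
function arrayMatcher(items) {
  return (value) => {
    if (!Array.isArray(value)) return null;
    const bindings = {};
    for (let i = 0; i < items.length; i++) {
      if (items[i] === VOID) {
        if (i < value.length) return null; // a void slot must not exist
        continue;
      }
      if (i >= value.length) return null; // a required slot is missing
      const b = items[i](value[i]);
      if (b === null) return null;
      Object.assign(bindings, b);
    }
    return bindings; // extra trailing items are allowed unless a void slot forbids them
  };
}
```

In this model, [let a, let b] accepts [1, 2, 3], while [let a, let b, void] accepts only two-item arrays.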

instanceof semantics are terrible and should never be further cemented into the language

I'm not sure I agree. Would you elaborate on your reasons for essentially deprecating instanceof?

the current plan is to make class syntax create a default matcher that approximates the semantics of ensuring a private field exists on the receiver.

Even if this is the plan, I believe the Person { name } syntax should be adopted (if viable) instead of the Person(let name) form I have seen above. How it works behind the scenes is less important: if it feels like instanceof, it won't really matter what it actually does to make things work.

nmn commented Sep 19, 2023

Some other questions about syntax:

  1. Is there no way to reuse parts of the switch-statement syntax? Instead of match {when}, could we use match { case <matcher> }? Would doing something like this force it to become a statement?

  2. I like using void to suggest the absence of something!

  3. Would { let: let let } be a valid matcher?

In if(), the bindings are lifted to a scope immediately outside the if() block,

Why are we not scoping the bindings to within the if() {...} block? You can't currently create new bindings within an if condition, so there's no prior art for how such a variable should be scoped. Is there a syntactic limitation?

{a} becoming {a: let a} still makes it awkward to just test for property existence.

Let's not make { a } become {a: let a} then? Let { a } simply check for the existence of a key a and require the use of {a: let a} when a binding is needed? I don't think our goal should be the most terse syntax possible. We should try to avoid confusion. Let's not have any object key punning in object matchers at all:

  • { a } checks if the key a exists. That's it.
  • { a: a } checks if the key a exists and is equal to the value of the variable a
  • { a: let a } checks if the key a exists and captures its value in a new variable a
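The three no-punning forms proposed here can be modeled as distinct property patterns, one per bullet. The helper names has, equals, and bindAs are hypothetical. Equality is assumed to use === as in the original gist, and the gist's NaN special case is ignored for brevity.

```javascript
// { a }        -> has("a")          : the key must exist (anywhere on the prototype chain)
// { a: a }     -> equals("a", a)    : the key must exist and equal the variable's value
// { a: let a } -> bindAs("a", "a")  : the key must exist; its value is bound
const has = (key) => ({ kind: "exists", key });
const equals = (key, expected) => ({ kind: "equals", key, expected });
const bindAs = (key, name) => ({ kind: "bind", key, name });

function objectMatcher(props) {
  return (value) => {
    if (value === null || value === undefined) return null;
    const obj = Object(value);
    const bindings = {};
    for (const p of props) {
      if (!(p.key in obj)) return null; // every form requires the key
      if (p.kind === "equals" && obj[p.key] !== p.expected) return null;
      if (p.kind === "bind") bindings[p.name] = obj[p.key];
    }
    return bindings;
  };
}
```

With const a = 1, objectMatcher([equals("a", a)]) matches {a: 1} but not {a: 5}. objectMatcher([has("a")]) matches both without binding anything.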

ljharb commented Sep 20, 2023

instanceof can be easily faked via Symbol.hasInstance, and it doesn't provide accurate results for cross-realm builtins.
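The first half of this point can be demonstrated directly. instanceof defers to the right-hand operand's Symbol.hasInstance, and a class can be given an own Symbol.hasInstance that claims any object. The cross-realm half is not shown here, since it needs a second realm (an iframe, or Node's vm module). For arrays, Array.isArray is the realm-independent check.

```javascript
class Person {
  constructor(name) {
    this.name = name;
  }
}

const impostor = { name: "Eve" };
const before = impostor instanceof Person; // false: Person.prototype is not on impostor's chain

// instanceof consults Person[Symbol.hasInstance]; defining an own property overrides
// the default inherited from Function.prototype.
Object.defineProperty(Person, Symbol.hasInstance, { value: () => true });
const after = impostor instanceof Person; // true, although impostor itself is unchanged
```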

The proposal explicitly and intentionally avoids reusing any part of switch syntax, to increase googleability, and so that switch can finally be put to rest.

nmn commented Sep 20, 2023

so that switch can finally be put to rest.

It can never be put to rest, since JS is append-only. And I suggested match-case instead of switch-case to avoid creating two new keywords and create just one. case as a word makes just as much sense as when, so unless there's a technical reason for implementation, I think we should try to minimize the number of new things we introduce.

instanceof can be easily faked via Symbol.hasInstance ... cross-realm builtins

Fair enough, let's not use instanceof semantics, but the Person { name } syntax can still work regardless.

ljharb commented Sep 20, 2023

with has been put to rest, even though it will never be removed from the language. switch will be too, since it's horrifically terrible.

Have you read the "priorities" in the readme?

