Skip to content

Instantly share code, notes, and snippets.

@rtfeldman
Forked from j-maas/README.md
Created March 8, 2022 12:02
Show Gist options
  • Save rtfeldman/bfcab08dbe26eaefc7be9f2b1fd18ef9 to your computer and use it in GitHub Desktop.
Save rtfeldman/bfcab08dbe26eaefc7be9f2b1fd18ef9 to your computer and use it in GitHub Desktop.
Roc: Open and closed tag unions

Roc: Open and closed tag unions

Motivation

The goal of open and closed tag unions is to allow both exhaustiveness checking and composition of tags. The main motivating pieces of code are:

getSettings = \filename ->
    rawData <- Task.await (File.readUtf8 filename)
    settings <-
        settingsYaml
            |> Yaml.decode settingsDecoder
            |> Task.fromResult
            |> Task.mapFail InvalidFormat
            
# Inferred type if File.readUft8 can return the errors [FileNotFound, Corrupted]:
# getSettings : Str -> [FileNotFound, Corrupted, InvalidFormat]*
# Note that Roc automatically infers that in addition to the File.readUtf8 errors
# we could return an InvalidFormat.
   
   
StoplightColor : [Red, Yellow, Green]
   
whatToDo : StoplightColor -> Str
whatToDo = \color ->
    when color is
        Red -> "Stop!"
        Green -> "Go"
        
# In this case we forgot to handle the case where the color is Yellow
# and Roc will inform us about that.

Problems with the current mechanism

As evidenced in the discussions on Zulip (1, 2) there are some cases where open and closed tags behave in a way that people struggle to understand.

The main issue seems to be that Roc infers e.g. myValue = [One, Two] to be an open tag union ([One, Two]*). This goes against the mental model that we can see that myValue can only be One or Two.

It is also possible to "break" when clauses by specifying closed unions (example by Richard):

stoplightColor =
    when something is
        Positive ->
            closedRed : [ Red ]
            closedRed = Red

            closedRed

        Negative -> Yellow
        Equal -> Green

Even though one could expect Roc to combine each possible tag into [Red, Yellow, Green], specifying a closed tag union as one of the branches ([Red]) will provoke a type mismatch. Under the hood, each branch is treated as an open union (e.g. [Yellow] for the second branch) which can combine into the desired tag union. Overriding one branch to be a closed union prevents that branch from being combined with others.

The popular mental model

As I have tried to indicate with the motivating piecies of code, the popular mental model seems to follow these rules:

  1. When producing tags, I can easily combine tags into bigger tag unions.
  2. When using a tag union, I need to know all possible tags.

This clashes with the current mechanism which is focused on whether a tag union is extensible, regardless of whether I am producing it or consuming it.

Curiously, the current mechanism already infers the types in accordance to this mental model in most cases. This is probably why it is so difficult to discuss this effectively.

  • Using a closed tag union.
  • Using an open tag union.
  • Producing an open tag union.
  • Producing a closed tag union.

Real trouble only arises when we produce a closed tag union (see the previous [Red] example), because then we cannot combine it with other tags anymore.

Are there cases where we want to prevent adding more tags to a specific tag union? I don't think so. We definitely want to have control over what tags we consume, but we do not need to prevent composition to achieve that. We can always ask for closed tag unions in all the functions parameter (or variable) type annotations, to ensure that we only need to handle specific tags.

This train of thought leads to a slight modification to the mechanism:

  • Closed tag unions in branches can be combined.

Testing the new mechanic

If we allow "closed" tag unions to combine in branches, this prevents the problematic case from previously:

stoplightColor =
    when car is
        Positive ->
            closedRed : [ Red ]
            closedRed = Red

            closedRed

        Negative -> Yellow  # This is actually now [Yellow].
        Equal -> Green  # This is actually now [Red].

# Since we allow composing these tag unions, the result is [Red, Yellow, Green] (without `*`),
# which communicates that we can only produce one of these tags.

But when using a tag union, we still need to indicate whether we have listed all tags exhaustively:

whatToDo = \color ->
    when color is
        Red -> "Stop!"
		Yellow -> "Hurry up!"
        Green -> "Go"
		
# Here we know all possible tags, so we infer:
# whatToDo : [Red, Green] -> Str

simpleRule = \color ->
    when color is
        Green -> "Go"
        _ -> "Don't go!"
		
# Here we can handle arbitrary tags, so we infer:
# simpleRule : [Green]a -> Str

Using the variable also allows correctly communicating the type of Richard's other example in response to my initial, half-baked suggestion of hiding the *:

blah = \fooOrBar ->
    when fooOrBar is
        Foo "" -> Bar
        Foo _ -> Foo "something"
        Bar -> Baz
        other -> other)
		
# blah : [Foo Str, Bar]a -> [Bar, Foo Str, Baz]a
# We know about certain tags, but the others are simply passed through,
# as indicated by `a`.

This notation feels much more natural to me, because the type variable represents some additional tags that Roc can't explicitely list.

Even for the analog of the problematic [Red] example, we might be able to provide guidance with a lint:

stoplightColor =
    when car is
        Positive ->
            closedRed : [ Red ]a
            closedRed = Red

            closedRed

        Negative -> Yellow  # This is actually now [Yellow].
        Equal -> Green  # This is actually now [Red].

# Since we can infer for the first branch that it is [Red],
# we can inform the user that they don't really not need the type variable `a`
# since they have already listed all the possible tags.

This actually seems to allow omitting the * from the language. It is useful when we would need to use a lot of different type variables. From my understanding test : [One]* -> [Two]* -> [Three]* desugars to test : [One]a -> [Two]b -> [Three]c. Since all values would now be inferred to be closed, we would need far fewer type variables. Only when we need to indicate that we will branch on a value do we need to indicate that using a type variable.

So overall, it appears that it might be possible to simplify Roc and omit * by inferring values as closed union tags and allowing composition of closed union tags. If I have overlooked a use case or made a mistake, please tell me in the "open and closed tag unions" thread on Zulip.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment