jimmyfrasche/xr.md

Created August 18, 2017 01:31

Star (2) You must be signed in to star a gist
Fork (0) You must be signed in to fork a gist

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/jimmyfrasche/ba2b709cdc390585ba8c43c989797325.js"></script>
Save jimmyfrasche/ba2b709cdc390585ba8c43c989797325 to your computer and use it in GitHub Desktop.

Sum types experience report

Raw

Interfaces only allow you to model based on method set and they are, by design, open. Any type that satisfies the interface, satisfies the interface. They share some similarities to sum types, but, as far as they do, they are, essentially, infinite sums. While this often desired, there are times when you need to limit the options to a closed set of types.

There's no direct way in Go to say "these types, even though they share no methods in common". You have to use an interface{}, which says nothing—even though you know exactly what you want to say. You have to handle an invalid case at runtime.

There's no direct way to specify only the types in this package. Using an interface with an unexported "tag" method gets you part of the way. There's still nil and embedding and, at every point, those need to be dealt with—or, all too often, ignored.

These cases can lead to trivial errors that could be caught by the compiler, but instead need to be handled by defensive coding and extensive testing. Defensive coding and extensive testing will always be needed, of course, but it would be better if that effort could be focused on more pertinent, higher level concerns instead of having to buttress against an inability of the type system to understand "I can only handle these specific types". The larger a project gets the more opportunity is for the invariants to be violated and it is too easy to violate them.

Without a way to define a closed set of types, we end up doing the type checker's job whenever the need arises. Linting can help somewhat. I do not believe it can be done with 100% accuracy except in simple cases and there are enough variations within the "closed set of types" space to require multiple different kinds of linters.

The unnecessary boxing can increase allocations and be an optimization fence, leading to inefficient data structures and algorithms. With a way to model a closed set of types, the compiler has more information to work with because it has fewer possibilities to consider.

The lack wastes programmer time and machine spacetime.

These issues do not come up at every turn—unless you're writing a compiler/interpreter where they are pretty much the only turn. But they do still come up, and, when they do, there are no great options in the language.

When dealing with an FFI, there are often times when you need to specify "only these types" and that can be outside of your control. Similar situations can arise with serialization: json.Token being an example.

One of the more vexing issues I run into, outside scenarios like the above, is needing to have multiple channels in a select when it would be cleaner to have one or two with sum types representing the set of allowable messages and their respective disjoint values. That would better model the structure of the communication and I could only return one or two channels from the code that started the server instead of a handful.

Being able to say "every type like this" is extremely useful in general, but sometimes you need to be able to say "only these types".

mewmew commented Aug 21, 2017 •

edited

Loading

For a specific use case, while developing a Go library for interacting with LLVM IR (https://github.com/llir/llvm) several work-arounds had to be used since the notion of only these types was lacking in Go.

For instance, LLVM IR instructions were modelled using an ir.Instruction interface, which while functional (the instructions could easily be modelled using an every type like this representation) comes with a few drawbacks.

Firstly, it requires that the different set of instructions implementing the interface be documented, rather than inferred from code. Extract from http://godoc.org/github.com/llir/llvm/ir#Instruction

type Instruction interface {
    fmt.Stringer
    // GetParent returns the parent basic block of the instruction.
    GetParent() *BasicBlock
    // SetParent sets the parent basic block of the instruction.
    SetParent(parent *BasicBlock)
}
// Instruction may have one of the following underlying types. 
//
// Binary instructions
// *ir.InstAdd    (https://godoc.org/github.com/llir/llvm/ir#InstAdd)
// *ir.InstFAdd   (https://godoc.org/github.com/llir/llvm/ir#InstFAdd)
// *ir.InstSub    (https://godoc.org/github.com/llir/llvm/ir#InstSub)
...
// Bitwise instructions
// *ir.InstShl    (https://godoc.org/github.com/llir/llvm/ir#InstShl)
// *ir.InstLShr   (https://godoc.org/github.com/llir/llvm/ir#InstLShr)
...
// etc, ... In total 45 instructions were documented in this way.

Note, a common use case is to use type switches to perform type assertions on values implementing ir.Instruction:

func f(inst ir.Instruction)
   switch inst := inst.(type) {
   case *ir.InstAdd:
   case *ir.InstFAdd:
   // ...
   }
}

For this reason, it becomes paramount to keep this documentation of instructions implementing ir.Instruction up to date, to be usable by users. When the documentation is not kept in sync with code, users may miss adding a case to handle a specific type of instructions (e.g. *ir.InstTrunc); thus resulting in incorrect interpretations of the LLVM IR input or run-time panics.

This brings us to another other issue cased by using interfaces, when wanting to represent only these types. Often, dummy methods have to be added to interface definitions, to mimick sum types; as was the case in the representation of LLVM IR constants

type Constant interface {
    value.Value
    // Immutable ensures that only constants can be assigned to the
    // constant.Constant interface.
    Immutable()
}

The Immutable method is a dummy method used to distinguish a set of types as constants, without having any method distinct from the value.Value interface to separate constant values form non-constant values. Needless to say, treating a constant, as a non-constant would be incorrect and the API should this prevent such use cases.

The problem with the Immutable dummy method, is that - since it is exported - it makes it impossible to successfully lint a source file to ensure that each case in a type switch have been implemented; i.e. the linter method described by Jimmy Frasché falls short here as it fails to identify all types implementing the constant.Constant interface, and thus requires users to rely on (potentially incorrect, or not-up to date) documentation to handle each type.

This clarify that this is not a hypothetical issue, but one that do arise, the type switches in the implementation of an LLVM IR traversal walk function have often missed handling types which do implement these work-around set type representations (e.g. llir/llvm@a1472af); which have caused run-time panics to occur only when the LLVM IR input assembly manage to trigger that specific code path.

Note, an issue I haven't even touched upon yet, is when a library is updated to include new types which implement the work-around representation of set types, as those type additions are likely to go unnoticed by users of the library until they run into run-time panics or incorrect interpretation of input by not handling all instructions. This too, is an issue which is not hypothetical, but rather likely to happen. In the case of the LLVM IR library for Go, when the LLVM project adds new instructions to the LLVM IR language, those instructions will be modelled in the LLVM IR library, and users of the library are very likely to miss handling those added instructions in some part of their code; as the compiler does not help to enforce the only these types representation, and that linters falls short because it is impossible to catch all sum type implementations, when using exported dummy methods. And, to keep the package boundaries, it may not always be possible to unexport those dummy methods.

To sum up, not being able to represent only these types leads to work-arounds which introduce issues (such as run-time panics) that at times cannot be caught by linters, and which put extended pressure on the documentation to always be kept up to date, and forces users to do grunt work which the compiler should be able to relieves them from, given a type system which understands the notion of "I can only handle these specific types".

Author

jimmyfrasche commented Aug 25, 2017

I also wrote a related semi-experience report about how default in a type switch interacts with sum types simulated by interfaces.

awalterschulze commented Sep 19, 2017

Feel free to delete this comment.

If we are going to talk about sum types shouldn't we also mention that every function returns a value AND an error, instead of returning a value OR an error, which is what the user is trying to do?

So currently we have:

func ParseInt(s string, base int, bitSize int) (i int64, err error) {
   ...
  if len(s) == 0 {
  	return 0, syntaxError(fnParseInt, s)
  }
  ...
}

Instead of

func ParseInt(s string, base int, bitSize int) (i int64 | err error) {
   ...
  if len(s) == 0 {
  	return err syntaxError(fnParseInt, s)
  }
  ...
}

or some other version of sumtypes.

I wish someone would run an experience report on this.

Author

jimmyfrasche commented Sep 19, 2017

@awalterschulze using sum types for errors has come up in golang/go#21161 (comment) and golang/go#19412 (comment) (that I know of/remember off the top of my heads) and further discussion could go in either issue

awalterschulze commented Oct 15, 2017

@jimmyfrasche I did not get a notification on your last comment, so I am sorry for this late reply.
I am glad to hear that it has come up before, but I still think its important to add it to the experience reports for Go 2 considerations, so I have added my own experience report https://gist.github.com/awalterschulze/ceb5d695e6e116234dff5079c3a4f7e3
which means you are totally free to delete this discussion.
Thank you for pushing sum types.

Author

jimmyfrasche commented Oct 16, 2017

Gists don't do notifications. It's very annoying. I'll leave the discussion because there are some good links.

awalterschulze commented Nov 8, 2017

Yes very annoying :)

jimmyfrasche/xr.md

mewmew commented Aug 21, 2017 • edited Loading

jimmyfrasche commented Aug 25, 2017

awalterschulze commented Sep 19, 2017

jimmyfrasche commented Sep 19, 2017

awalterschulze commented Oct 15, 2017

jimmyfrasche commented Oct 16, 2017

awalterschulze commented Nov 8, 2017

mewmew commented Aug 21, 2017 •

edited

Loading