Coalton: Why isn't the interop easier, and why might Coalton need to be an entire language in itself?
Several blog posts have been written about Coalton, about how it can be useful and what it brings to the table. However, it has never been clear to me why Coalton is the right way to solve the problems it solves. Isn't a simpler solution possible, one that avoids building a full embedded language and burdening users with the cognitive overhead of thinking about interop between normal Lisp and Coalton?
I have been thinking about this for a while as one of my pastimes, and below I'll summarize the best reasons I have found for why Coalton might be the way it is. Perhaps I couldn't see them because I didn't know the ML type system; but then, this article is directed at exactly those readers who do not know the ML type system.
First, let’s summarize the problems with ANSI Common Lisp.
While compile-time type checks may not catch all errors, and are definitely not a replacement for a test suite, they are nonetheless helpful for quick feedback and for keeping the debug-rewrite loop short. ANSI Common Lisp provides no guarantees here; SBCL tries hard to provide such checks, but there are cases where it fails.
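As a concrete illustration (the function name here is made up), SBCL's type inference will typically flag an obvious mismatch like the following at compile time, while other conforming implementations are free to stay silent:

```lisp
;; A hypothetical example: SBCL typically emits a compile-time warning
;; here, because the declared type of X conflicts with what 1+ expects.
(defun add-one-to-string (x)
  (declare (type string x))
  (1+ x))
;; Other ANSI-conforming implementations may accept this silently and
;; only signal an error at run time.
```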
In recent years, cl-form-types has also appeared; it uses the CLTL2 environment API to achieve portable type inference. But it has to be put to use through macros and compiler macros.
The Common Lisp Object System (CLOS) is fairly well respected. However, perhaps to maintain compatibility with the Lisps existing at the time, the Common Lisp standardization process could not make all functions generic.
Moreover, owing to the lack of compile-time type inference, and perhaps also to their semantics, generic functions must necessarily be dispatched dynamically. Again, solutions have been proposed for this, albeit none of them portable beyond two or three implementations: fast-generic-functions, specialization-store, polymorphic-functions.
The above problems are quite well-known and many of us might have come across them. But, as someone without a background in ML-like languages, finding the below problems required a fair bit of digging and thinking to see things for myself.
Let us start with history. In 1989, Wadler and Blott published How to make ad-hoc polymorphism less ad hoc (it seems to be an easy read, but we will discuss the idea below). In 1994, ANSI Common Lisp came into existence. And while Common Lisp existed as early as 1984, the CLTL1 published by Guy Steele does not suggest the existence of CLOS or generic functions.
Nonetheless, ANSI CL does make some effort to integrate types and classes:
The purpose of specifying that many of the standard type specifiers have a corresponding class is to enable users to write methods that discriminate on these types. Method selection requires that a class precedence list can be determined for each class.
The hierarchical relationships among the type specifiers are mirrored by relationships among the classes corresponding to those types.
However, it seems that tying classes to types in this manner was a bad decision* in two related respects:
- Especially in the context of multiple inheritance, subtyping is not the same as subclassing.
- If generic functions are the equivalent of interfaces, then, as the Wadler and Blott paper suggests, they should be independent of the classes of the arguments.
*I don’t intend to criticize the committee; it did what it did, and perhaps the best it could have done given finite time and resources. The decision looks bad only in retrospect, because now we know better.
For a long time, I had believed that subtyping is the same as subclassing. And this notion doesn’t seem uncommon, given that people regularly request extensible sequences and custom hash tables, among other things.
What defines a sequence? A functional notion suggests this intuition: if an object s provides a way to map a given natural number to another object, then s seems good enough to qualify as a sequence, no matter what its actual class! So it seems a “type” can be defined in terms of the functions, methods, or whatever-callables it requires, almost like an interface, rather than in terms of anything else.
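To sketch this intuition in plain Common Lisp (all names here are hypothetical, chosen for illustration), such an interface could be just a generic function, with any class free to participate:

```lisp
;; A hypothetical "sequence" protocol: anything that can map a natural
;; number to an element qualifies, regardless of its class.
(defgeneric element-at (seq index)
  (:documentation "Return the element of SEQ at position INDEX."))

(defmethod element-at ((seq vector) index)
  (aref seq index))

(defmethod element-at ((seq list) index)
  (nth index seq))

;; Code written against the protocol works for any participating object:
(defun first-element (seq)
  (element-at seq 0))
```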
Another example is described here:
Consider the data structure deque, a double-ended queue. A deque supports insertion and deletion at both ends, so it has four functions: insert-front, delete-front, insert-rear and delete-rear. If we use just insert-rear and delete-front we get a normal queue. On the other hand, if we use just insert-front and delete-front, we get a stack. In other words, we can implement queues and stacks in terms of deques, so as datatypes, Stack and Queue inherit from Deque. On the other hand, neither Stack nor Queue is a subtype of Deque, since they do not support all the functions provided by Deque. In fact, in this case, Deque is a subtype of both Stack and Queue!

In general, therefore, subtyping and inheritance are orthogonal concepts. Since inheritance involves reuse of implementations, we could have an inheritance relationship between classes that are incomparable in the subtype relationship. In a more accurate sense, this is what holds between Stack, Queue and Deque. The type Stack supports functions push and pop that are implemented in terms of the Deque functions insert-front and delete-front. Similarly, Queue supports functions insert and delete that are implemented in terms of the Deque functions insert-rear and delete-front. The function names are different, so none of these types is comparable in the subtype relation, but Stack and Queue do inherit from Deque.
Subclassing, a.k.a. inheritance, refers to implementation reuse, whereas subtyping refers to interfaces. Indeed, I’m not the first one to note this. Lisp Interface Library seems to be built primarily to separate these two notions of implementations and interfaces. Also see this article by Kent Pitman.
As I understand it, the ML family of languages (and also Coalton) calls these interfaces “typeclasses”.
There might also be some confusion about the terms:
- ML types are equivalent to the classes in the above discussion.
- ML typeclasses are equivalent to the types in the above discussion, and may also be called interfaces.
Once the difference between types/classes/implementations and typeclasses/types/interfaces is clear, it becomes possible to see that generic functions should depend on the interface details of their arguments and not on their implementation details.
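In Coalton this separation is explicit: a typeclass declares the interface, and instances supply the implementation per type. The sketch below follows the define-class/define-instance syntax from Coalton's documentation; Coalton's standard library already provides a real Eq class, so the names here are made up purely for illustration:

```lisp
(coalton-toplevel
  ;; The interface: any type :a that can provide my== participates.
  (define-class (MyEq :a)
    (my== (:a -> :a -> Boolean)))

  ;; An implementation: the type Color...
  (define-type Color Red Green Blue)

  ;; ...and its instance of the interface.
  (define-instance (MyEq Color)
    (define (my== a b)
      (match (Tuple a b)
        ((Tuple (Red) (Red)) True)
        ((Tuple (Green) (Green)) True)
        ((Tuple (Blue) (Blue)) True)
        (_ False)))))
```

Note that nothing ties the instance to Color's representation; any other type can join the MyEq interface without sharing any implementation with Color.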
However, these insights were not widely appreciated when ANSI Common Lisp came into existence in 1994, and so they haven’t been incorporated into the language.
Parameterizing functions over the types of their arguments requires an equivalence class for those arguments. Given an integer 5 or a float 5.0, ML-based languages can state quite clearly that their types are int and float respectively. The compilers have sensible defaults for inferring the types; and in some cases the language also provides users the option to override them.

Common Lisp has a hard time indicating whether 5 is of type integer or fixnum or (signed-byte 32) or (unsigned-byte 16), and so on and so forth. There are far too many possibilities. And even though the CLHS might have some instructions about what should be done, the overspecified types pose a problem for parameterizing over them. Really, does a (simple-array single-float (2 3)) have the same type as a (simple-array single-float (4 3))? What about (simple-array single-float (12))? Or (simple-array double-float)? How about (array single-float (2 3))? It’s not clear which of these should be considered equivalent! Indeed, the choice depends on the particular use at hand, but compromising on sensible defaults and fewer choices seems like a good option (?).
PS: I have come across the term “Principal Type”, but after a careful reading, the above problem seems slightly different from principal types: the latter talks about types being instances of the principal type, while the former is concerned with values being instances of the… canonical… type.
Coalton is an efficient, statically typed functional programming language that supercharges Common Lisp.
No, that’s not the point!
Coalton is an efficient, statically typed functional programming language that provides clean canonical types, and also a principled distinction between interface and implementation, and thus supercharges Common Lisp.
That is the point!
Even then, there are reasons I have been reluctant to adopt Coalton:
- I value portability across implementations. This ensures not only that the code can run on different versions of the same compiler (assuming incompatible changes arise only from the CLHS’s ambiguities and not from bugs), but also that it might be possible to keep using Coalton, and the libraries that depend on it, even if you find yourself writing code for Android at some point in the future. Back when I was just starting to be pulled into the whirlpool of type systems and polymorphism, Coalton seemed to support only SBCL; thanks to the contributors, things have improved more recently, and CCL also seems to be supported as of 2023.
- I wish to be able to call Coalton functions without monomorphizing them. Perhaps this is a core reason why I haven’t been able to abandon polymorphic-functions. Calling Coalton functions without monomorphization can make them very similar to Common Lisp’s own generic functions in terms of calling conventions and interactivity! Then again, I don’t know what deeper issues this might lead to.
- Error messages should be understandable by people without a background in ML-like languages. Using Coalton in your project means that if a user of your project runs into type errors, they will be greeted by an error message they can decipher only if they already have a background in ML-like languages. That makes the barrier to entry to your own project even higher! Let’s not forget that Common Lisp and Emacs themselves have a high enough barrier to entry; now add an ML-like language to the mix 👀.
Conclusion: It’s hard (if possible at all!) to solve all of the problems without creating a separate language.
We have Common Lisp with its excellent compiler SBCL, which performs compile-time type checks as well as many optimizations using declarations. Until now, some of us thought that all was rosy: with some effort it should be possible to put that machinery to “good use”, and lo and behold, we would have a properly generic Common Lisp with type inference, without an embedded language like Coalton. It is only after running into the above non-obvious problems that solutions which fall short of a complete language have begun to feel like dead ends.
Hopefully, fewer people will now run into the dead ends we ran into, and will instead adopt and contribute to Coalton.
I don’t have any specialized background in programming languages and I’m learning on the go. So feel free to point out any misconceptions or ambiguities I might be misleading the reader into!