Then mathematical neatness became a goal and led to pruning some features from the core of the language.
— John McCarthy, History of Lisp
If you prefer programming languages with a tidy and tiny core, you're in for a treat. This article drills down to the singleton primitive at the heart of Raku's metamodel ("model of a model") of a model of computation ("how units of computations, memories, and communications are organized")1.
It began with u/faiface's reddit post/thread "I'm impressed with Raku"2. One of their sentences in particular stood out for me:
I still generally prefer languages with a small, orthogonal core
I took this to mean they thought Raku did not have a small, orthogonal core. But it does. I wrote a personal response to them to comment on that, and that comment morphed into this article.3
I will start by touching on Raku as you first encounter it when you start writing a Raku program, with its standard bells and whistles in place, and then systematically drill down to Raku's tiny core. Seven sections in all:
- 7. What is Raku's CORE? Raku's CORE is a symbol table containing functions like print and operators like +. We'll start with this outer CORE, but it is not the tiny core we seek.

- 6. What language is Raku's CORE written in? Raku isn't a single language. Instead it's a mutable "braid" of mutually embedded languages (each being a "sub-language" or "sibling language" if you will). But there's no fixed language in Raku; the only thing that's constant is a single semantic model shared by all these languages.

- 5. What "single semantic model" does Raku target? I touch on some answers, ending with one that leads us to drill down through the lower layers of Raku, and down through Rakudo, the Raku compiler, which is entirely written in code that's compatible with the single semantic model.

- 4. What is the HLL language that implements the single semantic model? Rakudo is built atop a subset of Raku named nqp. nqp is a language for writing compilers. The nqp compiler, which is itself written in nqp, targets NQP, an abstract VM that targets various concrete backends. NQP is also written in nqp, so to get to the inner core we must drill down another level, focusing on a concrete NQP backend.

- 3. What is the low level backend that implements the single semantic model "on the metal"? MoarVM, short for "Metamodel on a runtime VM", is Rakudo's concrete backend for use in production settings. It implements Raku's single semantic model directly "on the metal", as it were, via a tiny singleton data structure called KnowHOW.

- 2. What's KnowHOW? KnowHOW is a self-describing primitive that knows how to be and do just one thing: to be the sole KnowHOW from which both itself and all else can be bootstrapped, starting with implementation of the single semantic model.

- 1. Have we truly arrived at Raku's core? Well, the ultimate core could be said to be the hardware it runs on, or even the physical universe. But suffice to say, KnowHOW is the tiny core this article drills down to.
As with any PL (programming language) and its implementations, there are cores within cores within cores until you hit electrons. Let's start by identifying Raku's equivalent of a standard library. We can safely presume the core we seek is more primitive than the standard library.4
Consider this four line Raku program:
print 42 + 99;          # 141
print &print.file;      # ...src/core.c/io_operators.rakumod
print &infix:<+>.file;  # ...src/core.c/Numeric.rakumod
print ?CORE::<&print>;  # True
The first line demonstrates that print can be called without any explicit import, and + works without any import too. The second and third lines call the .file method on the symbols &print and &infix:<+> to reveal the source code corresponding to the print function call and the + operator.5 The final line shows that the &print symbol is stored in a symbol table named CORE (as is &infix:<+>).6
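To underline the point that CORE is just a symbol table, here's a minimal sketch (the core-print name is purely illustrative): a symbol can be fetched from CORE by name and called like any other value.

my &core-print = CORE::<&print>;   # look up the Sub object stored under the key &print
core-print("Hello from CORE\n");   # behaves exactly like a bare print call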
Almost all of Raku's features are technically "user-defined", including all of its CORE. These features are not the core we seek. We seek something more primitive. To move toward that we next look at the language in which the CORE code is written.
Informally speaking, Raku's CORE is written in a simpler "core" language. But here's where things start to get a bit trickier to explain.

This simpler "core" language is actually a braid of even simpler sub-languages aka "slangs". Slangs are DSLs (Domain Specific Languages) that work together to collectively comprise a GPL (General Purpose Language). The GPL can be rich yet it is built up from relatively small parts.7 (Where by "small" I mean relative to the "core" language they are each a piece of, and very small compared to the surface CORE that's built atop the "core" language.)
The "standard" braid (that ships with a Raku compiler) currently includes a half dozen slangs (a smaller GPL part of the overall GPL, plus DSLs for strings, regexes, embedded documentation, and so on) that mutually embed each other.
User code can replace, alter, add or remove individual syntax rules or semantics of any slang, and thus the "core" language. Devs can go further, adding new slangs of their own.
Or even removing slangs. As a thought experiment, consider what happens if user code removes all of the slangs in the braid. That wouldn't be a very useful thing to do in practice -- after the last slang/syntax is removed, any program with any further code after that point (even just whitespace or comments) would fail to compile. But the thought experiment leads to an interesting question: where's the core in a Ship of Theseus that can completely vanish en voyage?
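Before drilling further, here's a tiny, safe taste of the "user code can add to the braid" claim above (the term's name is purely illustrative; footnote 6 shows the same idea with an operator):

sub term:<the-answer> { 42 }   # adds a new term to the MAIN slang's grammar
say the-answer;                # 42 -- usable immediately after the definition is parsed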
Larry Wall, Raku's lead designer, wrote in his 2001 Apocalypse #1:
Raku will support multiple syntaxes that map onto a single semantic model.
These syntaxes are entirely arbitrary and mutable. Thus Raku's real inner core has nothing to do with syntax. Instead, it's something to do with Raku's "single semantic model".
As part of writing this article I asked "What is the “semantic model” introduced in Apocalypse #1?" on Stack Overflow.8 The answer by jnthn, the lead dev of the Rakudo compiler toolchain, covered a range of options, all of which are interesting.
This option sounded enticing:
We could see RakuAST as an alternative syntax for Raku expressed as an object graph. Given it will also be the representation the compiler frontend uses for Raku code, we can also see it as a kind of syntax-independent gateway to the Raku semantic model.
This is why jnthn has written "RakuAST will be found at the very heart of Rakudo". But the truth is, it will still only be a "gateway" to what we seek (albeit a "syntax-independent" one).
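To give a feel for what "an alternative syntax for Raku expressed as an object graph" might look like, here's a hedged sketch. RakuAST is still being developed as I write this; the pragma and class names below reflect how it is exposed in recent Rakudo builds and may change.

use experimental :rakuast;        # assumes a Rakudo new enough to expose RakuAST

# Build the expression 42 + 99 as an object graph rather than as text...
my $ast = RakuAST::ApplyInfix.new(
    left  => RakuAST::IntLiteral.new(42),
    infix => RakuAST::Infix.new('+'),
    right => RakuAST::IntLiteral.new(99),
);
# ...then hand it straight to the compiler, no textual syntax involved:
say $ast.EVAL;                    # 141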
Instead, to continue our journey to the inner core, we'll go with another of jnthn's options:
An [interpreter or compiler] implemented in some other language (in which case we lean on its semantic model)
Most of Raku, and the Raku compiler Rakudo, is written in Raku. But it's bootstrapped9 from lower levels. And the next level down is nqp.
One could say that Raku itself implements the single semantic model, in that it follows that model's API. But it does so partly in pure Raku and partly in another HLL named nqp (sketched briefly after the list below) that is:
- A subset of Raku. nqp is the middle "doll" in Rakudo's stack of three self-similar systems.10

- A programming language / system focused on constructing and compiling programming languages.11
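To get a feel for the nqp layer without leaving a Raku program, one can pull in nqp's ops directly. A hedged sketch; these ops are Rakudo implementation internals rather than part of the Raku language specification:

use nqp;                                 # expose nqp's ops inside ordinary Raku code

nqp::say("hello from the nqp layer");    # bypasses Raku's CORE &say/&print entirely
say nqp::add_i(42, 99);                  # 141 -- a raw native integer addition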
The nqp compiler and its standard libraries are written in nqp, so focusing on nqp isn't really going to get us closer to the inner core of things. Instead we next look at what nqp targets: NQP, an abstract VM that runs nqp (and Raku). But NQP is almost entirely written in... nqp!
Have we run out of road? No, we just need to figure out how nqp/NQP maps to "the metal", to machine code running on hardware.
The final steps in our journey to the center of Raku are clearly marked on the map Larry Wall was sketching out back in 2001. Immediately following the important first sentence I quoted above, and repeat below, he wrote a second even more important sentence:
First, Raku will support multiple syntaxes that map onto a single semantic model. Second, that single semantic model will in turn map to multiple platforms.
So, to continue our journey, we recognize that running Rakudo, or the nqp/NQP subset of Raku(do), means running with a selected backend appropriate for a given platform. It's one of these backends that actually runs code on an underlying platform, so that's where we need to look to find Raku's real core, where Raku's ultimate underlying semantics meets "the metal".
Rakudo/nqp/NQP have experimental, more-or-less working JVM and JS backends, but we're going to focus on the only backend that currently has production status and runs on a wide range of OS/hardware combinations: MoarVM12.
Near the start I wrote:
This article drills down to the singleton primitive at the heart of Raku's metamodel ("model of a model") of a model of computation ("how units of computations, memories, and communications are organized")1.
Raku's metamodel -- its model of Raku's single semantic model -- is known as 6model. It can be built atop existing platforms (and is, for the JVM and JS backends), but it can also be implemented directly on the "metal". And MoarVM -- Metamodel on a runtime VM -- does just that, in C.
For each target platform, 6model is bootstrapped from a single data structure with associated code. On MoarVM the data structure is a C struct declared in about 30 lines of C code.
Saying it's 30 lines of code is cheating in the sense that this struct makes use of other declarations and setup code. But I think it's fair to say it's pretty small. And it's definitely the core primitive; the entirety of Raku is bootstrapped from this one data structure, by creating copies of it with different initial values, and fanning messages out to the copies, which themselves create more copies of themselves, and so on, in an ever widening system.
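Raku even exposes a small, documented slice of this grow-everything-from-a-HOW idea to userland code, via Metamodel::Primitives. Here's a hedged sketch; the MetamodelX::MinimalHOW class and its lone name metamethod are purely illustrative, and a real HOW would need to supply much more (method lookup, type checking, and so on):

class MetamodelX::MinimalHOW {
    method new_type() {
        # Ask for a brand new type object whose HOW is an instance of this class
        # (using the default P6opaque representation):
        Metamodel::Primitives.create_type(self.new)
    }
    method name($type) { 'MinimalType' }
}

my \MinimalType = MetamodelX::MinimalHOW.new_type;
say MinimalType.^name;       # MinimalType -- answered by our HOW, not by ClassHOW
say MinimalType.HOW.^name;   # MetamodelX::MinimalHOW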
This computational primitive1, a singleton "self-describing" datum that combines data/state and code/behaviour, is named KnowHOW.
Let's momentarily zoom out to the 30,000 foot view and then, with all the setup done so far in this nearly finished article, rapidly drill back down.
We can zoom out and then back in with these four lines of Raku code:
| This Raku code... | Displays name of... | Which is... | As computed by... |
|---|---|---|---|
| say 42.^name | 42's WHAT object | Int | Raku code, which calls... |
| say 42.HOW.^name | Int's HOW object | Raku::Metamodel::ClassHOW | nqp code, which calls... |
| say 42.HOW.HOW.^name | Raku::Metamodel::ClassHOW's HOW object | NQPClassHOW | nqp code, which calls... |
| say 42.HOW.HOW.HOW.^name | The core primitive | KnowHOW | backend code |
- 42 is just a random Raku value I chose as a starting point. The drill down through the layers will arrive at the same core primitive regardless of whether we started with an int32 value, an exception, type object, operator, function, keyword, whatever. WHAT is Raku's macro/method for returning a value's corresponding type object. (A short runnable recap follows this list.)

- 42.HOW returns a How Objects Work object (aka "metaobject"). 42.HOW knows how objects of Raku's Int class work. The name of this HOW is Raku::Metamodel::ClassHOW. (If I'd chosen, say, a subset as the value -- subset foo -- the HOW would have been Raku::Metamodel::SubsetHOW.) HOWs can live in ordinary Raku userspace, but all HOWs shipped with the Raku compiler Rakudo are written in nqp. (While they look like Raku code, they're written in a subset of full Raku, stored in files with a .nqp file extension, and compiled directly by the nqp compiler.) So this Raku::Metamodel::ClassHOW is an instance of an nqp class that implements the mechanics of a Raku class in general, abstracted from the specifics of any particular Raku class.

- 42.HOW.HOW is also an nqp object -- we're closing in on Raku's core and are now deep below its CORE, in code that's unaware of (full) Raku. (Though note that Raku and nqp remain 100% compatible due to them sharing the same metamodel.) 42.HOW.HOW is named NQPClassHOW. It knows how the nqp instance named Raku::Metamodel::ClassHOW works. It implements, in nqp, the mechanics of an nqp class in general, abstracted from the specifics of any particular nqp class.

- 42.HOW.HOW.HOW is KnowHOW. This is Raku's core primitive. If Raku code is being run with the MoarVM backend, 42.HOW.HOW.HOW returns a value that is MoarVM's C struct and associated code implementation of 6model ("How Objects Work") that knows how MoarVM's C struct and associated code implementation of 6model works. (The self-referential nature of this description isn't an error.) Thus, for example, the MoarVM KnowHOW implements, in C, calling a function associated with a 6model data structure (aka a "message" or "method", though whether a "message" or "method" exists at this bootstrap stage is like asking which came first, the chicken or the egg).
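To tie the table and notes above together in runnable form: the .^foo syntax is just sugar for calling foo on an object's HOW, passing the object along, so each row of the table is an ordinary method call on the next metaobject down.

say 42.WHAT.^name;          # Int     -- WHAT returns the type object
say 42.HOW.name(42);        # Int     -- the same answer, with the .^ sugar spelled out
say 42.HOW.HOW.HOW.^name;   # KnowHOW -- three HOWs down, the core primitive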
Have we truly arrived at Raku's core? In one respect, no, because this is just C code that targets underlying hardware.
But for the purposes of this article, we have the following satisfying result:
say 42.HOW.HOW.HOW.HOW.^name ; # KnowHOW
say 42.HOW.HOW.HOW.HOW.HOW.^name ; # KnowHOW
say 42.HOW.HOW.HOW.HOW.HOW.HOW.^name ; # KnowHOW
...
That is to say, calling .HOW on 42.HOW.HOW.HOW returns its invocant, i.e. itself, i.e. this innermost KnowHOW knows how it itself works. In a self-similar fashion, it includes a slot declaring its type, and another declaring its type's constructor's kind, and both these slots point to itself. The upshot is that code that it includes for calling a function on a metamodel object, or accessing a slot's data, can be applied to itself.
So this ultimate KnowHOW -- the abstract conceptual singleton heart of 6model -- is Raku's core primitive, and there's a concrete implementation of it in each backend.
1 Of note, this primitive is Actor model "consistent", by which I mean it bundles behavior (code) and 100% private state (data), and cleanly enforces that consistency at every level, from the VMs Raku runs on to languages that target those VMs, including Raku itself. One upshot is that the OO::Actors userland module is (at the time of writing this footnote) just 35 lines of code.
2 The 13th highest ever scoring post in /r/programminglanguages at the time it was posted.
3 My original version of this article began: "Me too from many standpoints including: initial attraction and comfort zone; instinctual sense of formal aesthetics; deep debugging of a thorny problem; modifying a language; working on its compiler; and discussing a language's bowels with folk pointing out they prefer a small core. :)" But I didn't elaborate on any of those notions. Attraction, comfort, and aesthetics are too subjective. Debugging and working on a compiler are more tractable but I skipped those topics too. One thing I did write but have now elided from the body of the article, but want to keep here in this footnote for posterity, is a question I asked about PLs that reflects what I view as nice small FP cores: "Which is more your type of poison, Kernel or Frank?" (Kernel's author John Shutt sadly passed away earlier this year (2021) but I hope to see vau rise again.)
4 The "batteries included" distribution offered to newbies is "Rakudo Star". It includes the Rakudo compiler package plus additional docs, tools, and a collection of useful libraries. I ignore those libraries. The Rakudo compiler package includes a few libraries that have to be explicitly imported if their features are to be used. For example, the Test
module requires you write use Test;
to use its features. I ignore those too.
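For instance, a trivial sketch:

use Test;                                        # ships with Rakudo, but isn't in CORE
plan 1;
ok 42 + 99 == 141, 'CORE addition still works';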
5 The .c in core.c stands for Raku Christmas, the first version of Raku, released on Christmas Day 2015. The second major version, Raku Diwali, was released in November 2018, and there's a corresponding core.d folder. Modules in core.d are lexically concatenated with modules in core.c to form the pre-populated lexical scope ("setting") of Rakudo Diwali programs that is accessible via the symbol CORE. cf Haskell's Prelude (though Raku's setting is broken into two parts -- prologue and epilogue -- that form a sandwich with user code in the middle).
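User code can peek at the merged setting; a small sketch (the exact counts vary by Rakudo version):

say CORE::.keys.elems;                                  # total number of symbols in the setting
say CORE::.keys.grep(*.starts-with('&infix:')).elems;   # just the built-in infix operators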
6 CORE isn't just for conventional functions, but operators too. Consider this one line Raku program that adds a factorial operator to the language:

sub postfix:<!> (Int \n where * > 0) { n == 1 ?? 1 !! n * (n-1)! }
              ^                                                ^

This postfix operator is added at the time the compiler parses the > (indicated with the first ^), so that it's immediately available in subsequent code (note how it's used in the operator definition body, indicated by the second ^). Click this tio.run link to see the above code fail. Note how the first line in the tio code (say 'program starts to run';) does not execute despite being the first line. This is because compilation fails -- the postfix ! is not yet part of the language when it's used by the say 5!; line. Next, cut that first line, paste it as the last line instead, and click the run/play button again; the code now successfully compiles and executes. This ability to extend the language within the language is used to ship a CORE that's full of functions like print and operators like +.
7 Raku is a general purpose language-oriented programming language. This is very useful in its own right, but in addition, Raku includes slangs that make it easy to apply language-oriented programming to the interesting problem of constructing a language. Not only that, but it includes those slangs in such a way that it's easy to apply language-oriented programming to the problem of constructing a language that makes it easy to apply language-oriented programming to the problem of making it easy to...
8 Only my 3rd ever SO question -- unlike my 300+ raku answers. In case it wasn't already obvious, I 💓 Raku. :)
9 Raku(do) is bootstrapped in several ways. ("Bootstrapping" is defined by Wikipedia as "a self-starting process that is supposed to proceed without external input".) This includes Raku culture itself, as seen in its conception and gestation, as well as aspects of:

- Linguistic bootstrapping. Raku's design aims at the full range of programming language facility, from a young child's acquisition via "baby steps" to those creating their own language extensions, DSLs, or entirely new languages, and otherwise using Raku for advanced programming.

- Compiler bootstrapping. Rakudo is bootstrapped in several ways. At the outer level there's CORE. As already explained, this isn't the starting point, and neither is the Raku language, or rather the collection of sub-languages of which it's comprised. Rakudo compiles Raku before a user's program is compiled, and it does that via nqp/NQP10, which is a bootstrapping compiler. It goes further than that too, but I'm getting ahead of the story.
10 nqp is a subset of Raku. (The name "nqp" is short for "not quite paradise".13) It has the same braided architecture as Raku, but drops some of the sub-languages that Raku has. While its grammar (parsing) sub-language is a large chunk of Raku's (which inherits from it), nqp's other sub-languages are much smaller. nqp's equivalent of Raku's standard library is also tiny in comparison to Raku's. Similarly, NQP's concrete backends implement a subset of Raku that corresponds to A) the single semantic model and B) features that are best implemented at a low level for performance reasons.
11 For those interested in technical arcana, nqp is a bootstrapping, self-hosted, meta-compiler, a modern day retelling of META II.
12 An Erlang/Elixir/BEAM enthusiast wrote a fairly popular brief intro to MoarVM. They described it as "a fantastic piece of technology".
13 Actually, that's a Jedi mind trick. (Some Rakoons seriously hated my trick when they thought they saw it. Which I had anticipated. Which is why I did the rename as a Jedi mind trick.)
Thanks for following up. I was a perl-curious person back in 2010 while performing some consulting work for Marvell Semiconductor (I was prototyping some web applications and porting things to their armv5 platforms). As the workload grew, I opted to switch my work from Perl/Plack/Tatsumaki to something that could be done, in my estimation, faster with node.js (and I could easily find JS devs at the time) and some home grown frameworks. Now some years on I am revisiting the old code and realize what an effort it is to maintain, although node.js has grown in maturity, for better I think.
That said, I yearn a bit for a platform that has multiple paradigms available and flexible compiler options, and that is resource friendly, portable, modern, and secure, with some base level glue to make it possible to build a variety of apps/services quickly without the headaches of opinionated frameworks, monocultures within the community, etc. I tried D and I thought this was the answer, as it has a lot of positives. The compiler flexibility is great (gcc, llvm, etc.), C syntax and easy FFI to existing C code, OOP (everything one wishes C++ had and they had it for a decade easily), GC, etc. But the community was really struggling against itself, and there was also a lack of modern, well maintained libraries. No big company was using it.
Raku I came across recently, via some HN post. I actually didn't realize it was Perl until I looked at some source code and saw the obvious fingerprints of Perl. So it's great to see the work that has been going on. To answer your question, the backends sound very interesting. I would like to know a bit more about porting: if I wanted to port to a Linux/RISC-V configuration, what does the path look like? Porting V8 (upon which node.js sits) wasn't terrible, and I was able to do it without having to bootstrap anything.

A second set of questions I have is around the resource footprint of the runtime, especially in the context of squeezing this onto smaller devices. I haven't done that analysis yet.