Skip to content

Instantly share code, notes, and snippets.

@andrewrk
Created October 17, 2020 23:26
Show Gist options
  • Save andrewrk/a7fdf3a8a75e65119312b824a3f28626 to your computer and use it in GitHub Desktop.
Save andrewrk/a7fdf3a8a75e65119312b824a3f28626 to your computer and use it in GitHub Desktop.
Eleanor's proposal

Absolute Zero

Bootstrapping Zig from below C level.

Motivation

The first computer programs were written directly in machine code. To make things easier, automated assemblers were developed, initially in machine code as well and then in their own input language. To make things easier again, high-level (for the time) languages such as BCPL, B, and eventually C were developed, which abstracted some of the details of the underlying architecture and allowed more complex compound logic. These languages were implemented by first writing compilers for subsets of them in assembly, then using those compilers to compile compilers for the full languages. Thus, there was a concrete bootstrap chain extending right down to assembly.

Nowadays, this is a lost art. Quite literally all languages that bootstrap themselves nowadays start from a pre-existing high-level language, usually C or C++, and then reject their roots as soon as they can. Zig stands out by maintaining its bootstrap chain, but this chain begins at a system C++ compiler, and currently there are no plans to build on firmer ground than a system C compiler. Granted, this is by no means a bad choice -- almost every platform is targeted by several C compilers, and it means we can have a single unified codebase for at least the first stage of bootstrapping. However, it also means we are forced into lockstep with C compiler standards (not to mention we have to assume that the system compiler actually follows these), and that we can't implement Zig on a completely unsupported platform, such as a personal project architecture.

We could mitigate these problems, and gain some serious nerd cred, if we could pull ourselves up from bare metal. This could be done by defining a strict subset of Zig (herein referred to as tZig, for "thin Zig", i.e. a thin layer over assembly), which is easier to translate to any machine code, then writing the stage 1 compiler in that subset, such that any language could be used for bootstrapping. This is more complex internally than the current plan, but it does not create more work for the end user -- in fact, if a platform does not have a C compiler for whatever reason, it is much less work to implement a tZig compiler than a full C compiler, especially if you only have assembly to work with. If a platform does have a C compiler, we can simply write a tZig compiler in C, and we don't even have to distribute this with the main Zig distribution.

Restrictions

  • No forward references -- global declarations may only reference previously defined symbols
  • No comptime -- only global constants are comptime-known, and see below
  • No type or error set inference
  • Declarations must not contain ALU ops, function calls, dereferences, or indexes
  • Non-primitive types must be bound to a global constant before use
    • No union (enum) -- enum type must be explicitly declared
  • Functions must be bound to a global constant before being called or referenced (once #1717 comes through)
  • No member functions or types
  • Local variables must all be declared at the start of a function
  • No compound operations -- at most (one assignment and (one ALU op/function call/index/dereference OR one complex literal (array, struct, string))) OR one switch/if statement per line/switch arm/if clause
    • In particular, this means that switch and if statements cannot return values -- extracting values must be done through assignment
  • No defer or errdefer
  • No floating point types or values
  • No for loops -- while loops only, with optional continue expressions, subject to the previously stated statement restrictions
  • No async (this one is definitely negotiable)
  • No inline assembly (this makes code non-portable without comptime switches, and requires an integrated assembler -- see below for an alternative)
  • Only allowed import is the special-case @import("bootstrap"), which provides hooks into the implementation for heap allocation, file I/O, and other necessary functionality
    • When compiling tZig as Zig, this aliases selected standard library functionality
  • No builtin functions that would contradict previous rules

These restrictions will allow the language to be compiled line by line in one pass, without the need for complex register allocation, and maintaining full compatibility with the final stage compiler.

Notes:

  • The tight restrictions on single statements were chosen such that a typical RISC system can compile them into single instructions.
  • An index/dereference may appear on either the left or the right of an assignment, but there must be only one.
  • defer/errdefer require a compiler to know a future address, which may depend on future statements -- thus, implementing them with a single pass is impossible. (I could be wrong about that. If so, they should be allowed.)
  • In contrast, try only requires knowledge of a return address, which typically has a designated register.
  • Floating point values are not available on all platforms, and not necessary for a compiler.
  • Any for loop can be implemented as a while loop, and for loops have hidden state and functionality (namely, the current index and bounds check).
  • async is a deliberate departure from platform ABIs, hence it might not be obvious how to implement it. However, if it is particularly helpful, it can be accommodated.
  • Inline assembly makes code non-portable without comptime switches, and requires an integrated assembler, hurting simplicity.
  • Not only is @import("bootstrap") ugly, it's an exception to both the declaration rule and the compound operation rule. However, at some point we do need to hook into the host system, and without inline assembly, the best we can do is put all the magic in one place. Also considered was placing a single bootstrap module in std so as to avoid another @import special case -- however, that would have required even more special casing on the tZig side, as well as necessitating lazy evaluation (and therefore the full complexity of Zig syntax) to avoid processing the whole standard library.
  • I have definitely missed details. Most of this was written in 2 hours, and the rest at midnight. I might do a proper proposal tomorrow.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment