This document is a draft. The information in it may be incomplete or inaccurate.
After reading an article on Hacker News about Google's Fuchsia OS, I decided to take a look at its microkernel, Magenta.
I've always been interested in microkernels, and over the years I've read a lot of code from Mach variants, the L4 family, MINIX 3, HelenOS, and so on. I even maintain a MkLinux (OSFMach+Linux) repo to preserve its code (https://github.com/slp/mkunity) and have made minor contributions to GNU Mach and GNU Hurd.
I'm also curious about Google's approach with Magenta, because writing a pure microkernel in the era of SSDs and 10 Gbps interfaces is one of the most challenging tasks you can find. To extract the full potential of such devices, you need both to find the shortest possible path from user application to device (that is, to keep CPU-induced latency as low as possible) and to be able to parallelize the I/O load without compromises.
While microkernels may have an advantage in parallelizing the load, they tend to have (way) longer application-to-device paths. I/O operations are usually served through RPC requests between two or more user space applications. This implies multiple syscalls to the microkernel for message passing, with multiple context switches and their side effects (CPU cache pollution and multiple TLB flushes).
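To make the cost argument a bit more concrete, here is a back-of-the-envelope model in Python. It is only a sketch under assumptions of my own: every hop to a user space server is a synchronous request/reply pair, each message costs one syscall and one context switch, and a monolithic kernel serves the same operation with a single syscall in the caller's context. The names `PathCost`, `monolithic_io` and `microkernel_io` are made up for illustration, and real kernels (Magenta included) will not match these numbers exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathCost:
    """Kernel crossings needed to serve one I/O request (toy model)."""
    syscalls: int
    context_switches: int


def monolithic_io() -> PathCost:
    # Assumption: one syscall, and the driver runs in the caller's context.
    return PathCost(syscalls=1, context_switches=0)


def microkernel_io(servers: int) -> PathCost:
    # Assumption: the request travels through a chain of `servers` user
    # space servers, and each hop is one request plus one reply message.
    if servers < 1:
        raise ValueError("at least one server must handle the request")
    messages = 2 * servers
    # Assumption: every message costs one syscall and one context switch.
    return PathCost(syscalls=messages, context_switches=messages)
```

Under these assumptions, a request that crosses a filesystem server and a block driver (`microkernel_io(2)`) needs four syscalls and four context switches, against one syscall and none for the monolithic case.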
Magenta's build process creates a bzImage file, which includes both the kernel and bootfs, the latter being a custom, read-only filesystem whose files are compressed with LZ4.
TODO
After Magenta initializes both the platform and itself, it calls kernel/lib/userboot/userboot.cpp:userboot_init (registered as a hook with LK_INIT_HOOK with the lowest priority level). While it appears to be an external module, this is still statically linked code.
Then, the userboot module will:
- Create VMObjects for bootfs and ramdisk (if present).
- Create and map a user stack.
- Create and map a vDSO with libuserboot.so code.
- Collect handles from objects, putting them into a MessagePacket.
- Create a new user thread with an entry point inside libuserboot.so.
- Finish itself.
libuserboot.so is the first process to be executed in userland. Its code is bundled inside the kernel (see "kernel/lib/userboot/userboot-image.S"), but it is mapped into userland as a vDSO, as described above.
As a comment in its source code says, it will:
- Read the kernel's bootstrap message.
- Load up the child process from ELF file(s) on the bootfs.
- Create the initial thread and allocate a stack for it.
- Load up a message pipe with the mx_proc_args_t message for the child.
- Start the child process running.
- Optionally, wait for it to exit and then shut down.
Unless otherwise specified via the kernel command line, this "child process" is "devmgr".
devmgr acts as the main namespace by creating a root memfs filesystem. Other filesystems and devices attach nodes to it.
devhost loads the drivers located at "/system/lib/driver" (drivers are dynamic libraries) and acts as an RPC server for every device found. This work can be done by a single process (ONLY_ONE_DEVHOST) or by spawning one process per PCI device (see "/system/udev/kpci/kpci.c").
ThinFS is not part of Magenta; it is a separate project inside Fuchsia. Still, I find it quite interesting and worth mentioning here, as it is a translator (yes, I'm using GNU Hurd terminology to reinforce the similarities) serving a trivial filesystem, written in Go.
This illustrates that Magenta's libraries and syscalls can be used from Go. It is also one of the best sources of documentation for those components, second only to the code itself.
Magenta uses a port of musl and a library called mxio to implement a subset of POSIX. And by a subset, I mean that fork() is not implemented (this syscall has always been a pain to implement on microkernels), and functions like getpid() or getuid() are stubs that return static values.
Full compatibility with existing POSIX software doesn't look like a priority for Magenta.
Magenta's IPC implementation looks somewhat naïve. I can't find any of the usual optimizations found in Mach4 or L4Ka microkernels. Perhaps I'm overlooking something, but I expect it to perform quite poorly.
I'd like to confirm my suspicions about the poor IPC performance. I'm thinking about writing some simple benchmarks that test sequential and random I/O operations over a memory-backed filesystem, as well as process life cycles (mostly creation and destruction).
Magenta runs on the Raspberry Pi 3, and I have one lying around, so I'll probably use it to run those tests, comparing the results against GNU/Linux and perhaps some *BSD.
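As a rough idea of the shape those benchmarks could take, here is a minimal Python sketch for the GNU/Linux side of the comparison. It makes several assumptions: a 4 KiB block size, a tiny working set, and a directory argument that you are expected to point at a memory-backed filesystem (it falls back to the system temporary directory, which may not be one). The function names are mine, and the Magenta version would most likely have to be written in C against mxio instead.

```python
import os
import random
import subprocess
import sys
import tempfile
import time

BLOCK = 4096  # assumed block size, in bytes


def time_io(path, offsets):
    """Write, then read, one block at each block offset; return seconds."""
    payload = b"\0" * BLOCK
    start = time.perf_counter()
    with open(path, "r+b") as f:
        for off in offsets:
            f.seek(off * BLOCK)
            f.write(payload)
        f.flush()
        for off in offsets:
            f.seek(off * BLOCK)
            f.read(BLOCK)
    return time.perf_counter() - start


def bench_io(directory=None, blocks=256, seed=0):
    """Time sequential and random access over the same preallocated file."""
    directory = directory or tempfile.gettempdir()
    path = os.path.join(directory, "bench-%d.dat" % os.getpid())
    with open(path, "wb") as f:
        f.truncate(blocks * BLOCK)
    sequential = list(range(blocks))
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed: repeatable pattern
    try:
        return {
            "sequential": time_io(path, sequential),
            "random": time_io(path, shuffled),
        }
    finally:
        os.remove(path)


def bench_process_lifecycle(count=5):
    """Average seconds to create a trivial child process and reap it."""
    start = time.perf_counter()
    for _ in range(count):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / count
```

On GNU/Linux, something like `bench_io("/dev/shm")` should exercise a tmpfs mount on most distributions. Note that the process test also measures interpreter startup, so it is only meaningful as a relative number on the same machine.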
If you see this problem while running "fbuild", try moving the fuchsia directory to a shorter path. That is, if you have checked out fuchsia at "/home/user/sources/fuchsia", move it to "/home/user/fuchsia".