@marcan
marcan / linux.sh
Last active July 26, 2025 08:39
Linux kernel initialization, translated to bash
#!/boot/bzImage
# Linux kernel userspace initialization code, translated to bash
# (Minus floppy disk handling, because seriously, it's 2017.)
# Not 100% accurate, but gives you a good idea of how kernel init works
# GPLv2, Copyright 2017 Hector Martin <marcan@marcan.st>
# Based on Linux 4.10-rc2.
# Note: pretend chroot is a builtin and affects the current process
# Note: kernel actually uses major/minor device numbers instead of device name
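The preview stops at the header comments. To illustrate the logic the script translates, here is a simplified C model of the order in which `kernel_init()` in Linux 4.10's `init/main.c` tries to start PID 1, as we understand it. It tries the `rdinit=` program first (default `/init`, used only if it exists in the initramfs), then `init=`, then a fixed list of fallbacks. This is a sketch, not kernel code. `can_exec` and `pick_init` are hypothetical helpers, "runnable" is modeled as membership in a list instead of a real `execve()`, and the model ignores which filesystem (initramfs or mounted root) the paths resolve on.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's run_init_process(): a path
 * "execs successfully" iff it appears in the `runnable` list. */
static int can_exec(const char *path, const char *const *runnable, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(path, runnable[i]) == 0)
            return 1;
    return 0;
}

/* Returns the program that would become PID 1, or NULL where the kernel
 * would panic. `rdinit` and `init` model the rdinit= and init= parameters;
 * pass NULL when unset (rdinit effectively defaults to "/init" and is
 * dropped when that file is missing from the initramfs). */
const char *pick_init(const char *rdinit, const char *init,
                      const char *const *runnable, size_t n)
{
    static const char *const fallbacks[] = {
        "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
    };

    /* A failing rdinit program is only logged; the search continues. */
    if (rdinit != NULL && can_exec(rdinit, runnable, n))
        return rdinit;

    /* An explicit init= that fails is fatal: no fallback is tried. */
    if (init != NULL)
        return can_exec(init, runnable, n) ? init : NULL;

    for (size_t i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; i++)
        if (can_exec(fallbacks[i], runnable, n))
            return fallbacks[i];

    return NULL; /* "No working init found." */
}
```

The asymmetry is the interesting part: a broken `rdinit=` program only produces an error message, while a broken explicit `init=` panics instead of falling back to `/sbin/init` and friends.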
The Mosh and Quake 3 Networking Models and State Synchronization Algebra
========================================================================
Mosh is a new remote shell program and protocol: https://mosh.org/
You may read technical details about its internals here:
https://www.usenix.org/system/files/login/articles/winstein.pdf
https://mosh.org/mosh-paper.pdf
NOTE: I'm not necessarily advocating this as the way to make your compiler go fast. Priority 1 is to make
full recompilation as fast as possible, ideally on the order of 100 MB/s per core with everything in memory.
Once you get it that fast, there's probably no need for incremental techniques, even if you want to
recompile entire files in real-time as the programmer types in code. But these incremental algorithms are
interesting computer science, so you should learn about them, and they are certainly applicable elsewhere.
After watching an interview with Anders Hejlsberg last year in which he mentioned the incremental
compilation techniques they used in their new C# and TypeScript compilers, I spent a bit of time studying
their open source code (and it's great that MS now makes this stuff available for people to peruse).

One thing that surprises newer programmers is that the older 8-bit microcomputers from the 70s and 80s were designed to run at the speed of random memory access to DRAM and ROM. The C64 was released in 1982, the year I was born, and its 6502 CPU ran at about 1 MHz (give or take, depending on NTSC vs. PAL). Its simple 2-stage pipeline overlapped the execution of the current instruction with the fetch of the next one. Cycle counting was easy to understand and master because it depended almost entirely on the number of memory accesses (1 cycle each), plus a 1-cycle penalty for taken branches, since the pipeline had already started fetching the next sequential instruction. So the entire architecture was built around keeping the memory subsystem busy 100% of the time by issuing a read or write every cycle. One-byte instructions with no memory operands, like INX, still take the minimum of 2 cycles and end up redundantly issuing the same memory request two cycles in a row.
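To make the cycle-counting rule concrete, here is a small worked example in C for a classic delay loop (`LDX #n` / `DEX` / `BNE`). It assumes the standard NMOS 6502 timings: 2 cycles for `LDX #imm` and `DEX`, 2 cycles for a branch not taken, and 3 for a taken branch whose target stays on the same page (crossing a page adds one more cycle, which this sketch ignores). The function and constant names are hypothetical.

```c
/* Cycle costs for the instructions in the loop below (NMOS 6502,
 * no page-crossing penalties). */
enum {
    CYC_LDX_IMM       = 2, /* opcode fetch + operand fetch                 */
    CYC_DEX           = 2, /* opcode fetch + one redundant (dummy) read    */
    CYC_BNE_TAKEN     = 3, /* 2 + 1 penalty, branch target on the same page */
    CYC_BNE_NOT_TAKEN = 2
};

/* Cycles for:         LDX #n
 *               loop: DEX
 *                     BNE loop
 * The body runs `iters` times (LDX #0 gives 256). iters must be >= 1. */
unsigned long delay_loop_cycles(unsigned iters)
{
    return CYC_LDX_IMM
         + (unsigned long)iters * CYC_DEX
         + (unsigned long)(iters - 1) * CYC_BNE_TAKEN  /* every pass but the last */
         + CYC_BNE_NOT_TAKEN;                          /* final fall-through */
}
```

For `LDX #0` (256 iterations) that is 2 + 256×2 + 255×3 + 2 = 1281 cycles, or roughly 1.3 ms at 1 MHz.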

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
typedef struct ion_type_t ion_type_t;
typedef struct ion_value_t ion_value_t;
typedef struct ion_state_t ion_state_t;
typedef struct ion_instruction_t ion_instruction_t;
typedef void (*ion_operation_t)(ion_state_t *, ion_value_t, ion_value_t, ion_value_t *);
typedef struct { uint64_t b[4]; } uint4x64_t;
// Bitsliced 4-bit adder
uint4x64_t add_sliced(uint4x64_t x, uint4x64_t y) {
uint4x64_t s;
uint64_t c = 0;
for (int i = 0; i < 4; i++) {
uint64_t p = x.b[i] ^ y.b[i];
s.b[i] = p ^ c;
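The preview cuts off inside the loop. Below is a minimal completed sketch, assuming the usual ripple-carry formulation: each `b[i]` holds bit i of 64 independent 4-bit lanes, so one call performs 64 additions mod 16. The carry update is our reconstruction of the standard full adder, not necessarily the gist's exact code, and `pack_lanes`/`get_lane` are hypothetical helpers added only to convert between ordinary 4-bit values and bit planes.

```c
#include <stdint.h>

typedef struct { uint64_t b[4]; } uint4x64_t;

/* Bitsliced ripple-carry adder over 64 lanes of 4 bits each. */
uint4x64_t add_sliced(uint4x64_t x, uint4x64_t y)
{
    uint4x64_t s;
    uint64_t c = 0;                      /* carry into bit plane i, per lane */
    for (int i = 0; i < 4; i++) {
        uint64_t p = x.b[i] ^ y.b[i];    /* propagate */
        s.b[i] = p ^ c;                  /* sum bit */
        c = (x.b[i] & y.b[i]) | (p & c); /* generate, or propagate incoming carry */
    }
    return s;                            /* carry out of bit 3 dropped: wraps mod 16 */
}

/* Hypothetical helper: transpose 64 4-bit values into bit planes. */
uint4x64_t pack_lanes(const uint8_t vals[64])
{
    uint4x64_t r = {{0, 0, 0, 0}};
    for (int k = 0; k < 64; k++)
        for (int i = 0; i < 4; i++)
            r.b[i] |= (uint64_t)((vals[k] >> i) & 1u) << k;
    return r;
}

/* Hypothetical helper: read lane k back as an ordinary 4-bit value. */
unsigned get_lane(uint4x64_t v, int k)
{
    unsigned r = 0;
    for (int i = 0; i < 4; i++)
        r |= (unsigned)((v.b[i] >> k) & 1u) << i;
    return r;
}
```

The payoff of the layout is that the loop body is a handful of 64-bit logic operations regardless of how many lanes are in use.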
@androm3da
androm3da / build.sh
Created February 15, 2017 21:42
build clang
#!/bin/bash -ex
CC="clang"
CXX="clang++"
SRCTOP=$(readlink -f "${PWD}")
INSTALL="${SRCTOP}/install"
if [[ ! -d "${SRCTOP}/llvm" ]]; then
    echo "Expected to find the source in ${SRCTOP}/llvm but it is missing" >&2
    exit 3
fi
@rygorous
rygorous / box_pruning_notes.txt
Created February 17, 2017 00:41
Note on changes to the box pruning code.
A brief explanation of what I did to get the speed-up, and the thought process behind it.
The original code went:
EnterLoop:
movaps xmm3, xmmword ptr [edx+ecx*2] // Box1YZ
cmpnltps xmm3, xmm2
movmskps eax, xmm3
cmp eax, 0Ch
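For readers who don't read SSE assembly, here is a portable scalar model of what the visible instructions compute. `cmpnltps` sets a lane to all ones when the first operand is *not less than* the second (which is also true when either is NaN), `movmskps` packs the four lane sign bits into the low bits of `eax`, and `cmp eax, 0Ch` checks whether exactly lanes 2 and 3 passed. What `xmm2` holds and which coordinate sits in which lane are not visible in the excerpt, so those are assumptions here, as are the helper names; the conditional jump that consumes the `cmp` is cut off.

```c
/* Scalar model of:
 *   movaps   xmm3, [Box1YZ]  ; load four floats
 *   cmpnltps xmm3, xmm2      ; lane i = !(box[i] < ref[i])
 *   movmskps eax, xmm3       ; bit i of eax = result of lane i
 *   cmp      eax, 0Ch        ; is the mask exactly 0b1100?
 */
unsigned nlt_mask(const float box[4], const float ref[4])
{
    unsigned mask = 0;
    for (int i = 0; i < 4; i++)
        if (!(box[i] < ref[i]))   /* "not less than": also true for NaN */
            mask |= 1u << i;
    return mask;
}

/* Nonzero when lanes 0 and 1 fail and lanes 2 and 3 pass. */
int lanes_match_0c(const float box[4], const float ref[4])
{
    return nlt_mask(box, ref) == 0x0Cu;
}
```

Writing `!(a < b)` rather than `a >= b` is deliberate: the two differ only for NaN inputs, and the negated form is the one that matches `cmpnltps`.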
@jhaberstro
jhaberstro / gpu_arch_resources
Last active January 13, 2026 02:57
GPU Architecture Learning Resources
http://courses.cms.caltech.edu/cs179/
http://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf
https://community.arm.com/graphics/b/blog
http://cdn.imgtec.com/sdk-documentation/PowerVR+Hardware.Architecture+Overview+for+Developers.pdf
http://cdn.imgtec.com/sdk-documentation/PowerVR+Series5.Architecture+Guide+for+Developers.pdf
https://www.imgtec.com/blog/a-look-at-the-powervr-graphics-architecture-tile-based-rendering/
https://www.imgtec.com/blog/the-dr-in-tbdr-deferred-rendering-in-rogue/
http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401334_pgfId-412605
https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/
https://community.arm.com/graphics/b/documents/posts/moving-mobile-graphics#siggraph2015
#!/bin/bash -ex
CC="clang"
CXX="clang++"
export PATH=/local/mnt/workspace/install/binutils-2.27/bin:${PATH}
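SRCTOP=$(readlink -f "${PWD}")
# Install prefix: first command-line argument, defaulting to ${SRCTOP}/install
INSTALL="${1-${SRCTOP}/install}"
if [[ ! -d "${SRCTOP}/llvm" ]]; then
    echo "Expected to find the source in ${SRCTOP}/llvm but it is missing" >&2
    exit 3
fi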