Skip to content

Instantly share code, notes, and snippets.

// Linear-scan mark-compact collector for causal data structures, i.e. the reference graph is acyclic and the objects
// are ordered by age in the heap (which happens if you use a linear allocator) so they are automatically topologically sorted.
// This algorithm has very high memory-level parallelism which can be exploited by modern out-of-order processors, and should
// be memory throughput limited.
void collect(void) {
// Initialize marks from roots.
memset(marks, 0, num_nodes * sizeof(uint32_t));
int newest = 0, oldest = num_nodes;
for (int i = 0; i < num_roots; i++) {
marks[roots[i]] = 1;
struct Object {
Key key; // The key is any piece of data that uniquely identifies the object.
// ...
};
struct Handle {
Key key;
Index index; // This caches a speculative table index for an object with the corresponding key.
};
ValueRef add(ValueRef left, ValueRef right) {
// Canonical commutation: x + y => y + x (constants have negative pos and move to the left)
if (left.pos > right.pos) SWAP(left, right);
switch (left.op) {
case NEG:
// Strength reduction: (-x) + y => y - x
return sub(right, getleft(left));
case NUM:
// Strength reduction: 0 + x => x
if (getnum(left) == 0) return right;
// Heavily based on ideas from https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/lj_opt_fold.c
// The most fundamental deviation is that I eschew the big hash table and the lj_opt_fold()
// trampoline for direct tail calls. The biggest problem with a trampoline is that you lose
// the control flow context. Another problem is that there's too much short-term round-tripping
// of data through memory. It's also easier to do ad-hoc sharing between rules with my approach.
// From what I can tell, it also isn't possible to do general reassociation with LJ's fold engine
// since that requires non-tail recursion, so LJ does cases like (x + n1) + n2 => x + (n1 + n2)
// but not (x + n1) + (y + n2) => x + (y + (n1 + n2)) which is common in address generation. The
// code below has some not-so-obvious micro-optimizations for register passing and calling conventions,
// e.g. the unary_cse/binary_cse parameter order, the use of long fields in ValueRef.
# Here's a probably-not-new data structure I discovered after implementing weight-balanced trees with
# amortized subtree rebuilds (e.g. http://jeffe.cs.illinois.edu/teaching/algorithms/notes/10-scapegoat-splay.pdf)
# and realizing it was silly to turn a subtree into a sorted array with an in-order traversal only as an
# aid in constructing a balanced subtree in linear time. Why not replace the subtree by the sorted array and
# use binary search when hitting that leaf array? Then you'd defer any splitting of the array until inserts and
# deletes hit that leaf array. Only in hindsight did I realize this is similar to the rope data structure for strings.
# Unlike ropes, it's a key-value search tree for arbitrary ordered keys.
#
# The main attraction of this data structure is its simplicity (on par with a standard weight-balanced tree) and that it
# coalesces subtrees into contiguous arrays, which reduces memory overhead and boosts the performance of in-order traversals
typedef enum {
CMD_INT = NODE_INT,
CMD_NEG = NODE_NEG,
CMD_NOT = NODE_NOT,
CMD_ADD = NODE_ADD,
CMD_SUB = NODE_SUB,
CMD_RET = 128,
CMD_SETREF,
CMD_GETREF,
} Cmd;

Partial redundancy elimination (PRE) is an optimization where you eliminate computations that are redundant on some but not all control flow paths.

Example:

if (...) {
    // ...
    int a = x + y;
    // ...
} else {

// ...

void gen_mul(Value *dest, Value *src) {
if (isconst(dest)) {
gen_swap(dest, src); // this doesn't generate instructions and just swaps descriptors ("value renaming")
}
val_to_reg(dest); // the post-condition is that dest is allocated to a register
if (isconst(src) && isimm32(src->ival)) {
if (src->ival == 0) {
int_to_val(dest, 0);
} else if (src->ival == 1) {
// do nothing
// semantic analysis, called by parser
void do_mul(Value *dest, Value *src) {
promote_arith(dest, src);
if (isint(dest)) {
if (isconst2(dest, src)) {
dest->ival *= src->ival;
} else {
gen_mul(dest, src);
}
// x64 encoding
enum Reg {
RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
R8, R9, R10, R11, R12, R13, R14, R15,
};
enum XmmReg {
XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15,