tommymcm / lto-custom-pass.md

Last active May 27, 2025 15:16

Custom LLVM passes at LTO

Running Custom LLVM Passes at LTO

I spent a long time trying to find the proper way to run LLVM pass plugins at LTO. Several times I found a solution that turned out to be flimsy and broke under some circumstances, or did not fully support all the features I needed. This post gives you my solution and concludes with some thoughts about why it works. If you know that I am right or wrong, please confirm or correct me. Otherwise, take the explanation with a grain of salt; I have not tested it extensively.

My Solution

In the examples, I will use a hypothetical LLVM pass plugin where:

MyPlugin.so is the shared object containing the LLVM pass plugin.

pizlonator / pizlossafull.md

Last active August 4, 2025 17:09

How I implement SSA form

This document explains how I would implement an SSA-based compiler if I were writing one today.

This document is intentionally opinionated. It just tells you how I would do it. This document is intended for anyone who has read about SSA and understands the concept, but is confused about how exactly to put it into practice. If you're that person, then I'm here to show you a way to do it that works well for me. If you're looking for a review of other ways to do it, I recommend this post.

My approach works well when implementing the compiler in any language that easily permits cyclic mutable data structures. I know from experience that it'll work great in C++, C#, or Java. The memory management of this approach is simple (and I'll explain it), so you won't have to stress about use-after-free bugs.

I like my approach because it leads to an ergonomic API by minimizing the amount of special cases you have to worry about. Most of the compiler is analyses and transformations ov

pizlonator / pizlossa.md

Last active July 28, 2025 23:07

Pizlo SSA Form (short version)

Here's a much more complete description of how I do SSA, beyond just how I do Phis.

This describes how I do SSA form, which avoids the need to have any coupling between CFG data structures and SSA data structures.

Let's first define a syntax for SSA and some terminology. Here's an example SSA node:

A = Add(B, C)

In reality, this will be a single object in your in-memory representation, and the names are really addresses of those objects. So, this node has an "implicit variable" called A; it's the variable that is implicitly assigned to when you execute the node. If you then do:

AndrasKovacs / TwoStageRegion.md

Last active July 26, 2025 08:26

Lightweight region memory management in a two-stage language

Intro
Basics
- Stage inference
Regions
- Bit-stealing
- Using regions
Eager regions

bjacob / README.md

Last active November 22, 2023 00:24

Relative performance of matmul element types on x86 and Arm

Context

Recent efforts to run LLMs have sent us searching for element types to quantize weights and activations into: types wide enough to provide enough accuracy, yet narrow enough to provide enough performance and/or memory compression.

This document is about the "performance" dimension, specifically on x86 and Arm architectures.

AndrasKovacs / ZeroCostGC.md

Last active July 8, 2025 02:56

Garbage collection with zero-cost at non-GC time

Garbage collection with zero cost at non-GC time

Every once in a while I investigate low-level backend options for PLs, although so far I haven't actually written any such backend for my projects. Recently I've been looking at precise garbage collection in popular backends, and (as on previous occasions) I've been annoyed by limitations and compromises.

I was compelled to think about a system which accommodates precise relocating GC as much as possible. In one extreme configuration, described in this note, there

Marc-B-Reynolds / output.md

Last active November 25, 2024 22:53

brute force testing of 1/sqrt functions

checking on [3f800000,40000000] [1.000000e+00,2.000000e+00]

| func | e | max ULP | CR | FR | 2 ULP | > 2 ULP | CR% | FR% | 2 ULP% | > 2 ULP% |
|---|---|---|---|---|---|---|---|---|---|---|
| vrsqrte_f32 | -- | 4947968 | 103 | 225 | 216 | 8388065 | 0.001228 | 0.002682 | 0.002575 | 99.993515 |
| FRSR_Mon0 | -- | 564177 | 3 | 8 | 6 | 8388592 | 0.000036 | 0.000095 | 0.000072 | 99.999797 |
| FRSR_Deg0 | -- | 403258 | 0 | 0 | 0 | 8388609 | 0.000000 | 0.000000 | 0.000000 | 100.000000 |
| FRSR_Mon1 | -- | 14751 | 230 | 464 | 466 | 8387449 | 0.002742 | 0.005531 | 0.005555 | 99.986172 |

robrich / README.md

Last active May 7, 2024 13:24

the definitive deep dive into the .git folder

Thanks for joining us for "the definitive deep dive into the .git folder". It's an incredible live-demo where we open every file in the .git folder and show what it does.

Links

Here are the links we saw:

pdarragh / papers.md

Last active February 23, 2025 02:04

Approachable PL Papers for Undergrads

On September 28, 2021, I asked on Twitter:

PL Twitter:

you get to recommend one published PL paper for an undergrad to read with oversight by someone experienced. the paper should be interesting, approachable, and (mostly) self-contained.

what paper do you recommend?

pervognsen / shift_dfa.md

Last active August 3, 2025 16:23

Shift-based DFAs

A traditional table-based DFA implementation looks like this:

#include <stdint.h>

uint8_t table[NUM_STATES][256];

uint8_t run(const uint8_t *start, const uint8_t *end, uint8_t state) {
    for (const uint8_t *s = start; s != end; s++)
        state = table[state][*s];
    return state;
}
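To make the table-driven loop concrete, here is a minimal sketch of a complete two-state DFA that tracks whether an odd number of `x` bytes has been seen. The parity automaton, `build_parity_table`, `has_odd_x`, and the value of `NUM_STATES` are all hypothetical names chosen for illustration; only `table` and `run` come from the snippet above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example automaton: state 0 = even number of 'x' seen, 1 = odd. */
enum { NUM_STATES = 2 };

uint8_t table[NUM_STATES][256];

/* Fill the transition table: every byte keeps the current state,
   except 'x', which toggles between state 0 and state 1. */
void build_parity_table(void) {
    for (int state = 0; state < NUM_STATES; state++) {
        for (int byte = 0; byte < 256; byte++)
            table[state][byte] = (uint8_t)state;
        table[state]['x'] = (uint8_t)(state ^ 1);
    }
}

/* Same loop as the traditional implementation: one table load per input
   byte, where each load depends on the state produced by the previous one. */
uint8_t run(const uint8_t *start, const uint8_t *end, uint8_t state) {
    for (const uint8_t *s = start; s != end; s++)
        state = table[state][*s];
    return state;
}

/* Convenience wrapper (an assumption, not part of the original):
   returns 1 if buf[0..len) contains an odd number of 'x' bytes, else 0. */
uint8_t has_odd_x(const char *buf, size_t len) {
    const uint8_t *p = (const uint8_t *)buf;
    return run(p, p + len, 0);
}
```

After calling `build_parity_table()` once, a call such as `has_odd_x("axbxcx", 6)` runs the DFA from state 0 over the six bytes and returns the final state.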
