Cut me some stack

(a deep dive into trampolines)

Divyansh Prakash, September 2023

Preface

In this post, I'm going to talk about one of the tradeoffs Clojure makes to be a good hosted language. That tradeoff is sticking to the host platform's function calling convention.

Unfortunately, despite being considered "high level languages", popular Java and JS VMs are still constrained by the machine's physical stack space.

This is not just a theoretical problem that fringe loons think about - it has real consequences like lazy sequences blowing the stack in Clojure, leading to bugs and lots | of | discussion.

What this means is that code like the following:

;; Clojure
(defn fact [x]
  (if (< x 2)
    1
    (*' x (fact (dec x)))))

;; (fact 5) => 120
;; (fact 20000) => stack overflow

which is roughly equivalent to the following JS:

// JS
function fact(x) {
    x = BigInt(x);

    if (x < 2)
        return 1n;
    
    return x * fact(x - 1n);
}

// fact(5) => 120n
// fact(20000) => stack overflow

will quickly run out of stack space and throw a stack overflow error, even if you have TerraBytes of RAM lying idle.

It should be noted that this "error" is infamous enough that the planet's largest programming forum is named after it.

(You will also notice that the above code uses big integers. That's because integer overflow is another restriction imposed by these "high level" languages. But let's ignore that for the purpose of this post.)

This begs the question - why are Virtual Machines for High Level Languages constrained by the stack size of my physical machine? Yes, I can tune the JVM to have a bigger stack size, but that's still weird. You can't even control this on client browsers in the case of JS.

Since Clojure decides to be a good tenant and plays along with the host's constraints, it too ends up with the same issue.

A way out

However, Clojure gives us a tool to help us in such cases: trampoline.

The docs state that:

trampoline can be used to convert algorithms requiring mutual
recursion without stack consumption. [...]

You will notice that close to 100% of trampoline examples online show mutually tail-recursive functions only.

Well, this is only partially correct - trampoline can also be used to "fix" non mutually recursive functions like fact above, as well as mutually recursive functions whether recursive at the tail or not.

This means that there are almost NO examples out there showing the full power of trampoline!

That's very sad since I believe it was invented and put to good use by Schemers decades ago, and is integrated into most functional programming platforms by default. And Clojurians, despite being Lispers and functional programmers, are mostly unaware of this beautiful construct!

In this post, I'm going to implement a trampoline and show how it can be used to fix the factorial function above.

Trampolines!

Here's a JS implementation of a trampoline:

class Thunk {
    constructor(f) {
        // f should be a function of no args
        this.f = f;
    }
}

function trampoline(f, ...args) {
    // convert f(...args) into a function g of no args
    let g = () => f(...args);

    // make a Thunk from g
    let t = new Thunk(g);

    // process thunks
    while(t instanceof Thunk)
        t = t.f();

    // return the non-thunk
    return t;
}

Using `trampoline`

Now we will see how trampoline can be used to fix our factorial function.

The first step is to convert fact to be tail recursive.

NOTE: I will leave implementing the Ackermann Function as an excercise for curious readers to illustrate the difficulty of such a conversion. Try writing an implementation that can compute A(3, 10) before reading further.

There is a very obvious way to convert fact to be tail recursive using an accumulator, but we are going to focus on a more general, mechanical way that can be done for arbitrary functions.

This process is called a CPS transform.

What this does is that instead of returning the result, fact will now pass the result to a callback.

// k is the callback (continuation)
function k_fact(k, x) {
    x = BigInt(x);

    if (x < 2)
        // call the callback with the result
        return k(1n);

    // tail recursion!
    return k_fact(
        // new k
        (xMinusOneFactorial) =>
            // call the callback with the result
            k(x * xMinusOneFactorial),
        // new x
        x - 1n
    );
}

Note: the tail recursion above can be optimized into a while loop which updates the values of k and x instead of calling k_fact again. This is called Tail Call Optimization.

This new function can be called in the following manner:

// identity function
let id = (x) => x;

// new calling convention
k_fact(id, 5); // 120n
k_fact(id, 20000); // stack overflow

As you can see, k_fact still consumes and blows the stack.

This is to be expected, since every recursive call to k_fact creates a deeper nested callback k.

We need one final transformation:

instead of calling the continuation, the call should be wrapped in a Thunk
the function should return thunks representing "the rest of the computation"

function t_fact(k, x) {
    x = BigInt(x);

    if (x < 2)
        // call to callback wrapped in thunk
        return new Thunk(() => k(1n));
    
    // recursive call wrapped in thunk
    return new Thunk(() =>
        t_fact(
            (xMinusOneFactorial) => 
                // call to callback wrapped in thunk
                new Thunk(() => k(x * xMinusOneFactorial)),
            x - 1n
        )
    );
}

Here's how we can call this function:

trampoline(t_fact, id, 5) // => 120n
trampoline(t_fact, id, 20000) // => 181920632...0000000000n

Magic! 🎉

We can even automate some of this stuff:

function stackless(t_fn) {
    return (...args) =>
        trampoline(t_fn, id, ...args);
}

which can be used in the following manner:

let factorial = stackless(t_fact);
factorial(5) // 120n
factorial(20000) // 181920632...0000000000n

Incredible!

This transformation, in my humble opinion, is second only to the Y Combinator in its beauty.

More than meets the eye

Let us now think about WHY our original (stackful) fact implementation blows the stack?

The deliciously meta answer is that the JS interpreter's eval function is a complex mutually recursive function!

It is the implementation's eval function (written probably in C++) that is stackful!

Just imagine how an expression like f(a + b, c) is evaluated:

f is evaled: it is looked up in the var table, and an object {params, body} is received
the arguments to f are evaled
- a + b is evaled
  - a is evaled: it is looked up in the var table, and 1 is received
  - b is evaled: it is looked up in the var table, and 2 is received
  - 1 + 2 is evaled and 3 is received
- c is evaled: it is looked up in the var table, and 4 is received
apply({params, body}, [3, 4]) is called
- a new lexical scope is created, extending the current one
- in this new scope, f's params are bound to 3 and 4
- eval is called for each statement in f's body

This dance of eval / apply, present in every evaluator, is a case of complex mutual recursion!

And since these functions are usually written in a lower-level machine-oriented language, they are susceptible to blowing up the machine's stack!

So the REAL fix here would be to go and rewrite my browser's JS evaluator using a trampoline.

That would remove an entire class of bugs - stack overflows - from JS and all languages that compile to JS, such as Clojure, TS, etc.

Future musings

I've been working on my own language. The core and impl namespaces contain the code for a stackless evaluator. It started off as a stackless version of Clojure, something similar to SCI, but has since become my testbed for playing with langdev.

In a future post, we will write a small stackless evaluator.

We will also see how this stackless tranformation opens the door to the mind-bending world of continuations.

Appendix: all the code

// trampoline implementation
// =========================
class Thunk {
    constructor(f) {
        this.f = f;
    }
}

function trampoline(f, ...args) {
    let g = () => f(...args),
        t = new Thunk(g);
    
    while(t instanceof Thunk)
        t = t.f();

    return t;
}

let id = (x) => x;

function stackless(t_fn) {
    return (...args) =>
        trampoline(t_fn, id, ...args);
}


// factorial implementation
// ========================
function t_fact(k, x) {
    x = BigInt(x);

    if (x < 2)
        return new Thunk(() => k(1n));

    return new Thunk(() =>
        t_fact(
            (xMinusOneFactorial) => 
                new Thunk(() => k(x * xMinusOneFactorial)),
            x - 1n
        )
    );
}

let factorial = stackless(t_fact);

divs1210/stackless.md