(WIP) How to construct SNARKs for recursive functions (part 2)

(If you do not know about Nova, please read the first part)

Introduction

In the first part, we've been able to produce a SNARK for a very particular case of a recursive function. It was first of all a recursive function from and to numbers and would only call itself recursively once in a branch: i.e., it had a linear call trace. We will show in this article how to construct SNARKs for general recursive functions

might take/return data types instead of plain numbers
might call themselves multiple times in a branch
might call foreign functions, which could be mutually dependent on them

Recursive functions on data types

How to represent data types

The first thing we need to be able to do is find a way to represent data types as numbers (or more specifically, elements of a finite field). We will follow an approach very similar to Lurk[3], which requires a collision resistant hash function:

all values that are limited in size, which is smaller or equal to the field size, such as chars, booleans and simple enums, will be represented natively as numbers
a tuple (or product type) of other data types will be represented by a hash of the (recursive) representation of the tuple elements
a Rust-like enum (or sum type) constructor will be represented by two components: a unique tag representing each respective constructor (could be a simple ordering of constructors) and the representation of the tuple of arguments of the constructor (if there are any).

Here are some examples, written in Haskell notation:

exTuple = (true, 10, 'a')

data Tree = Leaf Int | Branch Tree Tree
exTree = Branch (Branch 11 7) (Leaf 8)

The representation of exTuple will be hash3(1, 10, 97), since 1, 10, 97 are the representations of true, 10, 'a' respectively. The constructors of Tree, i.e. Leaf and Branch, will be given unique tags, such as 0 and 1. Then the representation of exTree will be (1, hash4(1, hash4(0, 11, 0, 7), 0, 8)).

How to deconstruct data types

Although we've given a way to construct data types, by effectively hash consing, we currently have no way of deconstructing data type. How can we produce a circuit that, for example, extracts the first element of tuples, if the circuit only ever sees its hash and not the tuple itself? How can we reconstruct the tuple from the hash so we can extract its elements? The answer is: we don't reconstruct the tuple, we simply take the tuple itself (the preimage of the hash), as a nondeterministic advice (as a hidden input). Then all the circuit has to do is verify that the public input h is equal to hash2(x, y), x, y being the advices, and then output x.

Recursion on data types

Let us now construct a SNARK for a function that does structural recursion on a data type. Say, let us create the protocol for the sum function:

sum [] = 0
sum (x : xs) = x + sum xs

Following the approach of our first recursive function, the $\textrm{sum}$ function should have the following structure $\textrm{sum}(vk, \textrm{xs} | ω, u, U, π) → (s, h)$ where $vk, \textrm{xs}$ are the public inputs, $u, ω, U, π$ the nondeterministic advices and $s$ and $h$ the (public) outputs. $vk$ is the verifier key for a folding scheme, $ω$ the witness, $u$ an R1CS instance and $U$ a relaxed R1CS instance, which should have a public IO size of 5 (same as $\textrm{sum}$), and $π$ a supposed folding proof for $u$ and $U$.

The tag of the empty list [] will be $\textrm{NIL} = 0$ and the tag of consing x : xs will be $\textrm{CONS} = 1$ (see how to represent data types). The advice $ω$ will contain the preimages of hashes (see how to deconstruct datatypes). Here's how $\textrm{sum}$ is defined:

if $\textrm{xs.tag} = \textrm{NIL}$
- return $(0, hash(u_⊥)),$
else:
- check $\textrm{xs.tag} = \textrm{CONS}$
- extract $(\textrm{head}, \textrm{tail}) ← ω$
- check $\textrm{xs.bod} = \textrm{hash3}(\textrm{head}, \textrm{tail.tag}, \textrm{tail.bod})$
- check $u\textrm{.input} = (vk, \textrm{tail})$
- extract $(s, h) = u\textrm{.output}$
- check $h = \textrm{hash}(U)$
- compute $U' ← \textrm{fold.V}(vk, u, U, π)$
- compute $s' ← \textrm{head} + s$
- return $(s', \textrm{hash}(U'))$

The folding protocol will mostly follow $\textrm{sum}$:

$\textrm{IVC.F}(pk_{\textrm{fold}}, vk_{\textrm{fold}}, \textrm{xs}, ω) → (u_n, w_n, U_n, W_n)$:

if $\textrm{xs.tag} = \textrm{NIL}$:
- compute the trivial case $(u_0, w_0) ← \textrm{trace}(G', (vk_{\textrm{fold}}, \textrm{xs} | ø, ø, ø, ø)),$ for $ø$ a dummy value, like $0$
- return $(u_0, w_0, u_⊥, w_⊥)$
else:
- extract $\textrm{tail} ← ω_n\textrm{.tail}$ — there's no need to check that this is well-formed, since it will be done by the circuit
- recursively compute $(u_{n-1}, w_{n-1}, U_{n-1}, W_{n-1}) ← \textrm{IVC.F}(pk_{\textrm{fold}}, vk_{\textrm{fold}}, \textrm{tail})$
- compute $(U_n, W_n, π_n) ← \textrm{fold.P}(pk_{\textrm{fold}}, u_{n-1}, U_{n-1}, w_{n-1}, W_{n-1})$
- compute $(u_n, w_n) ← \textrm{trace}(G', (\textrm{vk}, \textrm{xs} | ω_{n-1}, u_{n-1}, U_{n-1}, π_n))$
- return $(u_n, w_n, U_n, W_n)$

And the prover and verifier protocol will have basically no change apart from adjusting the input:

$\textrm{IVC.P}(\textrm{pk}, u_n, w_n, U_n, W_n) → (π, Π)$:

compute $(U_{n+1}, W_{n+1}, π) ← \textrm{fold.P}(pk_{\textrm{fold}}, u_n, U_n, w_n, W_n)$
compute $Π ← \textrm{SNARK.P}(pk_{\textrm{SNARK}}, U_{n+1}, W_{n+1})$
return $(π, Π)$

$\textrm{IVC.V}(vk, \textrm{xs}, u_n, U_n, π_n, Π)$:

check $u_n\textrm{.input} = (vk_{\textrm{fold}}, \textrm{xs})$
check $u_n\textrm{.output.h} = \textrm{hash}(U_n)$
compute $U_{n+1} ← \textrm{fold.V}(vk_{\textrm{fold}}, u_n, U_n, π_n)$
check $\textrm{SNARK.V}(vk_{\textrm{SNARK}}, U_{n+1}, Π)$

Multiple recursive calls (non-linear IVC)

Now let's make it more challenging: summing binary trees. Here's the Haskell code for it:

data Tree = Leaf Int | Branch Tree Tree

sum (Leaf a) = a
sum (Branch ts ts') = sum ts + sum ts'

The Leaf constructor will get the tag $\textrm{LEAF} = 0$ and the Branch constructor the tag $\textrm{BRANCH} = 1$. Now, because this function calls itself twice, but no more than this, the circuit that represents it will need two instances $u_1, u_2$ for each recursion call, but also two accumulators $U_1, U_2$. Each accumulator $U_i$ is the folding of all the instances in each respective subtree of calls apart from $u_i$. But since it must output the folding of all instances corresponding to each call of $\textrm{sum}$, then we must fold the two instances and the two accumulators: we will need 3 folding proofs, $π_1, π_2, π_3$. Here's the form of the function: $\textrm{sum}(vk, \textrm{ts}, | ω, u, U, π) → (s, h)$ where $u$ and $U$ have two components and $π$ has three components.

Now, there's a caveat here. We were assuming we only needed to fold R1CS instances with relaxed R1CS. Now we will need the general fold of two relaxed R1CS instances. Because of this, we introduce the injection of R1CS to relaxed R1CS as $ι$. We will also assume the existence of a folding protocol that folds more than two instances (which can be done as a sequence of folds). Here's how the function thus behaves:

if $\textrm{ts.tag} = \textrm{LEAF}$
- return $(\textrm{ts.bod}, hash(u_⊥))$
else:
- check $\textrm{ts.tag} = \textrm{BRANCH}$
- extract $(\textrm{left}, \textrm{right}) ← ω$
- check $\textrm{ts.bod} = \textrm{hash4}(\textrm{left.tag}, \textrm{left.bod}, \textrm{right.tag}, \textrm{right.bod})$
- check $u^L\textrm{.input} = (vk, \textrm{left})$
- check $u^R\textrm{.input} = (vk, \textrm{right})$
- extract $(s^L, h^L) = u^L\textrm{.output}$
- extract $(s^R, h^R) = u^R\textrm{.output}$
- check $h^L = \textrm{hash}(U^L)$
- check $h^R = \textrm{hash}(U^R)$
- compute $U' ← \textrm{fold}_4\textrm{.V}(vk, ι(u^L), U^L, ι(u^R), U^R, π)$
- compute $s' ← s^L + s^R$
- return $(s', \textrm{hash}(U'))$

The IVC's prover and verifier protocols will have basically no change:

$\textrm{IVC.P}(\textrm{pk}, u, w, U, W) → (π, Π)$:

compute $(U', W', π) ← \textrm{fold.P}(pk_{\textrm{fold}}, u, U, w, W)$
compute $Π ← \textrm{SNARK.P}(pk_{\textrm{SNARK}}, U', W')$
return $(π, Π)$

$\textrm{IVC.V}(vk, \textrm{xs}, u, U, π, Π)$:

check $u\textrm{.input} = (vk_{\textrm{fold}}, \textrm{xs})$
check $u\textrm{.output.h} = \textrm{hash}(U)$
compute $U' ← \textrm{fold.V}(vk_{\textrm{fold}}, u, U, π)$
check $\textrm{SNARK.V}(vk_{\textrm{SNARK}}, U', Π)$

The IVC's folding protocol will follow the shape of the $\textrm{sum}$ function, though now the advices will take a form of a tree as well:

$\textrm{IVC.F}(pk_{\textrm{fold}}, vk_{\textrm{fold}}, \textrm{ts}, ω) → (u, w, U, W)$:

if $\textrm{ts.tag} = \textrm{LEAF}$:
- compute the trivial case $(u_0, w_0) ← \textrm{trace}(G', (vk_{\textrm{fold}}, \textrm{ts} | ø, ø, ø, ø))$
- return $(u_0, w_0, u_⊥, w_⊥)$
else:
- extract $(\textrm{left}, ω^L, \textrm{right}, ω^R) ← ω$
- recursively compute $(u^L, w^L, U^L, W^L) ← \textrm{IVC.F}(pk_{\textrm{fold}}, vk_{\textrm{fold}}, \textrm{left}, ω^L)$
- recursively compute $(u^R, w^R, U^R, W^R) ← \textrm{IVC.F}(pk_{\textrm{fold}}, vk_{\textrm{fold}}, \textrm{right}, ω^R)$
- compute $(U, π) ← \textrm{fold}_4\textrm{.V}(vk, (ι(u^L), U^L, ι(u^R), U^R), (w^L, W^L, w^R, W^R))$
- compute $(u, w) ← \textrm{trace}(G', (\textrm{vk}, \textrm{ts} | (\textrm{left}, \textrm{right}), u, U, π))$
- return $(u, w, U, W)$

Mutual recursion with multiple recursive calls! (non-uniform, non-linear IVC)

References

[1]: "Nova: Recursive Zero-Knowledge Arguments from Folding Schemes", Abhiram Kothapalli, Srinath Setty, and Ioanna Tzialla, 2021.
[2]: "SuperNova: Proving universal machine executions without universal circuits", Abhiram Kothapalli, and Srinath Setty, 2022.
[3]: "LURK: Lambda, the Ultimate Recursive Knowledge", Nada Amin, John Burnham, François Garillot, Rosario Gennaro, Chhi’mèd Künzang, Daniel Rogozin, and Cameron Wong. 2023.

gabriel-barrett/recursive-functions-as-snarks-2.md