Nicolas IOOSS, 2024-12-03
The day before the second puzzle of ZK Hack V was released and its title revealed, a tweet proposed a challenge:
Ready for Puzzle V2 unveiling tomorrow?
Let's warm up – name of the puzzle is:
"Cyu't Nynu Bp"
Trying the Caesar cipher on the message did not give any meaningful answers. Instead, this could be a per-character substitution with a known key, like the Vigenère Cipher.
In English, only few possibilities match the sequence of 3 letters, a quote and a single letter (like Cyu't
): Isn't
, Don't
, Won't
, Bob's
...
If the decryption of 't
is 't
, this means the 4th letter of the key is A
.
What word with A
at 4th position might be used as the encryption key?
ZKHACK
, of course!
Decrypting Cyu't Nynu Bp
with this key gives Don't Look Up
.
The GIF of the tweet was actually extracted from the movie Don't Look Up.
This confirms the decryption was right!
The puzzle was released on https://zkhack.dev/zkhackV/puzzleV2.html:
Small fields have been all the rage for increasing performance on SNARKs. We should use them everywhere, also in lookup arguments! Or should we?
Puzzle Preparation
- Click the link to open and read about the Lookup Protocol of ProtoStar. Link to ProtoStar: Generic Efficient Accumulation/Folding for Special Sound Protocols
- Click the link for background material on LogUp. Link to Multivariate lookups based on logarithmic derivatives
The details of the actual puzzle were published on a GitHub repository: https://github.com/ZK-Hack/puzzle-dont-look-up
Lookup arguments based on logarithmic derivatives are super fast. Protocols that use small fields are super fast. Let's combine both!
We've implemented a range check for
$[0, 2^6-1]$ using the special-sound lookup protocol of ProtoStar (see Section 4.3, p. 34) which itself is a variant of LogUp To make this faster, we use a ≈16-bit prime field and take challenges from a larger extension to have roughly 100 bits of security.Can you submit a passing proof that
$2^{15}$ is in the expected range?
The repository contains a Rust project with few files.
src/main.rs
starts by defining some constants:
let p = 70937;
let irr = vec![3, 39025, 15086, 45405, 34941, 70897, 1];
// x^6 + 70897x^5 + 34941x^4 + 45405x^3 + 15086x^2 + 39025x + 3
They are used in statements like FieldElement::new(vec![i], p, irr.clone())
, which invokes code in src/lib.rs
.
In this file,FieldElement
represents an element of finite field
- A prime number
$p = 70937$ . - A polynomial
$Irr = x^6 + 70897x^5 + 34941x^4 + 45405x^3 + 15086x^2 + 39025x + 3$ of degree$n = 6$ with coefficients in$GF(p)$ .
We can confirm these properties hold using Sage:
$ sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 9.5, Release Date: 2022-01-30 │
│ Using Python 3.10.12. Type "help()" for help. │
└────────────────────────────────────────────────────────────────────┘
sage: p = 70937
sage: p.is_prime()
True
sage: Fp = GF(p)
sage: x = polygen(Fp)
sage: Irr = x^6 + 70897*x^5 + 34941*x^4 + 45405*x^3 + 15086*x^2 + 39025*x + 3
sage: Irr.is_irreducible()
True
sage: Fp6 = Fp.extension(Irr, "x")
sage: Fp6
Finite Field in x of size 70937^6
In the puzzle, a field element is actually a polynomial reduced modulo FieldElement
contains the coefficients of this polynomial (in field coeffs: Vec<u64>
).
To avoid lengthy notations such as
src/lib.rs
implements the usual operations between elements of
src/main.rs
then invokes some functions related to a protocol implemented in another file, src/protocol.rs
.
What it this protocol about?
The last remaining Rust file, src/protocol.rs
, defines an interactive protocol between a prover and a verifier.
To understand it, let's take a look at the test function test_protocol
and describe with mathematical notations what happens.
First, a Protocol
structure is instantiated with some parameters:
let p = 70937;
let irr = vec![3, 39025, 15086, 45405, 34941, 70897, 1];
let l = 4;
let t = 2;
let protocol = Protocol::new(l, t, p, irr.clone());
The meaning of
Next, the test function creates a witness with 4 field elements statement.t
, but naming this
It then calls several functions:
let m = protocol.count_multiplicities(&witness, &statement);
let msg1 = protocol.prove_round1(&witness, &m);
let r = protocol.verify_round1(&msg1);
let msg2 = protocol.prove_round2(&r, &msg1, &statement);
assert!(protocol.verify_round2(&msg1, &msg2, &statement));
Taking a look at these functions reveal some for
loops iterating over the witness 0..self.l
and 0..self.t
.
It makes sense to consider parameter
protocol.count_multiplicities
crafts a vector
protocol.prove_round1
puts the witness ProverMessage1
and returns it, as msg1
.
This seems to indicate what data the prover sends to the verifier.
Then, protocol.verify_round1
uses the SHA256 hash function to compute a field element r.coeffs
shows [17915, 53729, 68937, 29856, 25266, 8323]
, meaning the verifier computed:
In the next function to be called, protocol.prove_round2
, the prover computes two vectors
In the test function, this means
Finally, in protocol.verify_round2
the verifier verifies some equations:
In the test, the first one become:
It is straightforward to understand why this equation holds: the similar terms in the left-hand side can be grouped together to become what appears in the right-hand side.
The second and third equations verify
Taking a step back, this protocol is about a prover attempting to prove to a verifier the items
This kind of proof system was presented in some ZK Whiteboard Sessions:
- ZK Whiteboard Season 1 Module 6: Lookup Arguments for Performance Optimisation with Mary Maller
- ZK Whiteboard Season 2 Module 3: An Update on Lookups w/ Ariel Gabizon
Actually the equations look like ProtoStar's Special-sound protocol for Lookup relation (Section 4.3 page 34), like the puzzle subject said.
To prove that the witness
In the protocol of the puzzle, msg1
) is used to generate a challenge
Actually, ZK Whiteboard Season 2 Module 3 clearly presented such a protocol only works so long the lengths of
Let
$𝔽$ be a field of characteristic$p > \max(l, T)$ .
(With our notations, this would be written as
Here, the characteristic of
sage: Fp6.characteristic()
70937
This means for any field element
If the witness contains
The previous section actually focused on a test function in src/protocol.rs
and identified a possible attack due to the small characteristic of the finite field src/main.rs
.
There, the statement contains
The puzzle requests to "submit a passing proof that
Let's define:
So
What would the multiplicities
Then protocol.verify_round1
) and protocol.prove_round2
):
The equations verified by protocol.verify_round2
become:
The first one becomes
To implementation of the attack, the lines between /* BEGIN HACK */
and /* END HACK */
(in src/main.rs
lines 17-18) can be replaced with:
let witness = vec![invalid_witness.clone(); p as usize];
let m = vec![FieldElement::new(vec![0], p, irr.clone()); 1<<6];
And that's it!
To solve the puzzle, it was not necessary to study how protocol.verify_round1
computes the challenge
protocol.verify_round1
calls hash_to_field
, which calls hash_to_field_single
, which computes a SHA256 from
In pseudo-code, this computes:
c_0 = SHA256([0, w, m]).extract_bytes(0.. 8).decode_as_u64() % p
c_1 = SHA256([0, w, m]).extract_bytes(8..16).decode_as_u64() % p
c_2 = SHA256([1, w, m]).extract_bytes(0.. 8).decode_as_u64() % p
c_3 = SHA256([1, w, m]).extract_bytes(8..16).decode_as_u64() % p
c_4 = SHA256([2, w, m]).extract_bytes(0.. 8).decode_as_u64() % p
c_5 = SHA256([2, w, m]).extract_bytes(8..16).decode_as_u64() % p
r = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + x_5 x^5
Can we find a way to make this function generate a chosen
Moreover, every coefficient
So we can consider it is impossible to get
How does hash_to_field_single
combine together
For example, when hash_to_field_single(w, m, index)
with index = 0
hashes (in hexadecimal):
- one byte
index
:00
, - the coefficients of
$w_0 = 1$ as 64-bit Little Endian integers:0100000000000000
- a separator:
FF
- the coefficients of
$w_1 = 2 + 3x$ :0200000000000000 0300000000000000
- a separator:
FF
- the coefficients of
$w_2 = 0$ : nothing - a separator:
FF
- the coefficients of
$w_3 = 4$ :0400000000000000
- a separator:
FF
- the coefficients of
$m_0 = 5$ :0500000000000000
- a separator:
FF
- the coefficients of
$m_1 = 6$ :0600000000000000
- a separator:
FF
So it computes SHA256(00 0100000000000000 FF 0200000000000000 0300000000000000 FF FF 0400000000000000 FF 0500000000000000 FF 0600000000000000 FF)
.
When looking at this, a first weakness appears: we cannot distinguish
Using
Moreover this malleability makes a second weakness become apparent: the lengths of
-
prove_round1
uses all items from$w$ and$m$ to generatemsg1
, whatever$l$ and$t$ are. -
verify_round1
uses all items from$w$ and$m$ to generate$r$ . -
prove_round2
uses$(w_0, w_1..., w_{l-1})$ to compute$h$ and$(m_0, m_1..., m_{t-1})$ to compute$g$ . -
verify_round2
uses$(w_0, w_1..., w_{l-1})$ and$(m_0, m_1..., m_{t-1})$ as well.
If
Is this second weakness an exploitable vulnerability?
- If it was the other way round, this would have been a critical vulnerability as the prover would have been able to add items to the witness after having committed to it and knowing what the challenge
$r$ will be. - In practice,
src/main.rs
defineslet l = witness.len();
so it is impossible for the witness to contain more than$l$ items. - Given a witness
$w$ and multiplicities$m$ , the weakness makes it possible to add items to$m$ to modify$r$ , without altering the fractional polynomials in the second round. Said differently, this enables to change the field element where the fractional polynomials are evaluated. If the hash function was weak, this would have been fatal, as this field element could have been arbitrarily chosen to make the verifier believe some fractional polynomials are equal even though there are not. But the 97 bits of security of the hash function make this attack impossible.
Studying the consequences of the first weakness do not reveal any possible attack either.
Moving field elements from
Anyway, it would have been better (to be sure there is no vulnerability in the commitment round) if these weaknesses did not exist.
Several recommendations exist to fix the first weakness in function hash_to_field_single
:
- Adding another separator between
$w$ and$m$ when hashing data. - Hashing the lengths of
$w$ and$m$ somewhere (for example hashing[index, len(w), w, len(m), m]
or[index, len(w), len(m), w, m]
) - Hashing only
$l$ items from$w$ and$t$ items from$m$ .
Actually, the third recommendation enables to also fix the second weakness, by making all functions only use
This puzzle presented how it was dangerous to mix LogUp with a small-characteristic field. If the length of the witness can be larger than the characteristic, the soundness of the protocol is broken and a prover can cheat. In the puzzle, this enables to bypass a range check and use larger values than intended.