Write-up for zkHack puzzle #4: Hidden in Plain Sight

Author: Nicolas IOOSS
Date: 2021-11-19
Puzzle: https://www.zkhack.dev/puzzle4.html

1. Subject

The puzzle is described in GitHub repository https://github.com/kobigurk/zkhack-hidden-in-plain-sight:

One-thousand accounts participate in a shielded pool which hides the recipients and other data in each transaction between parties in the pool. The recipient is a 256-bit account address which is hidden by blinding a KZG-like polynomial commitment to the address. The sender of the transaction chooses two secret blinding factors known only to them by which the polynomial commitment is blinded. Included with the commitment are two openings used to verify the commitment. You intercept a transaction by observing the shielded pool. Armed with the blinded commitment and two openings from the intercepted transaction as well as the public data (the trusted setup for the KZG commitment scheme, a list of all one-thousand account addresses participating in the pool, and the two random challenges used to compute the openings) can you deanonymize the recipient address?

At first glance, this seems to be quite complex.

Looking at the Rust crates which are used, the puzzle involves operations using BLS12-381 curve:

use ark_bls12_381::{Fr, G1Affine};

Fr is a structure which represents the finite field of integers modulo r = 52435875175126190479447740508185965837690552500527637822603658699938581184513. This r is the order of the first prime subgroup which is usually used when working with BLS12-381 curves.
A G1Affine object represents a point on the the first BLS12-381 curve.

The puzzle includes a binary file named challenge_data, which is parsed in function read_cha_from_file. The following variables are extracted from this file:

setup: a vector of 1024 points (G1Affine).
accts: a vector of 1000 accounts, each being a vector of 32 bytes.
cha_1, cha_2, opn_1, opn_2: several Fr integers.
commt: a G1Affine point.

The subject describes how these variables were generated and the Rust project contains a file which implements this generation, src/generate.rs. This generation involves an evaluation domain, a vanishing polynomial, roots of the unity and other uncommon concepts. Why do these concepts appear here?

To understand this, let's briefly talk about PLONK.

2. PLONK mathematics

PLONK means "Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge" and has been described in several places:

While PLONK is a powerful tool which works on circuits to establish proof of knowledge of a secret, the puzzle only uses a small part of PLONK: the one which is about committing a list of values. The idea is to design an algorithm which transforms the list into a polynom and computes a commitment, without revealing the list itself.

To do this, the list of values (a_0, a_1,... a_{n-1}) is first converted into a polynom P(x) which evaluates to a_0 on a known evaluation integer, to a_1 on another integer, etc. To simplify the equations, the known integers are the root of the unity of order n in Fr.

Said differently, the algorithm uses an integer ω (in Fr) which was chosen such that:

ω^n = 1 (modulo r)
ω^k is never 1 for k in 1, 2, 3..., n-1.

Then the n roots of the unity of order n are 1, ω, ω^2, ω^3..., ω^{n-1} and it can be proved that these integers are all different.

From a list (a_0, a_1, ... a_{n-1}), a polynom P(x) is defined such that P(1) = a_0, P(ω) = a_1, P(ω^2) = a_2... P(ω^{n-1}) = a_{n-1}. Crafting such a polynom can be done using Lagrange polynomials. Nevertheless, choosing roots of the unity enables using another computation method, thanks to P(ω^k) being a coefficient of the Fast Fourier Transform (FFT) of P(x). Therefore P(x) can be seen as the inverse of the FFT (named "IFFT") of a polynomial built from the list: a_0 + a_1 x + a_2 x^2 + a_3 x^3... + a_{n-1} x^{n-1}.

Committing P(x) can be done using KZG commitment. This commitment procceds in the following steps:

Generate a secret integer s randomly.
Choose a generator point g on an elliptic curve. In the puzzle, a standardized generator of a prime subgroup of BLS12-381 curve is used.
Compute a "trusted setup" (g, s.g, (s^2).g, (s^3).g...)
Compute a commitment P(s).g

In fact the prover does not need to know s to compute the commitment, as P(s).G can be computed using only the coefficients of the polynomial P(x) and the trusted setup.

Nevertheless if P(x) was committed directly, its commitment would easily allow to identify whether a secret input (the list a_0, a_1...) matches the committed polynomial. To prevent this, a blinding scheme is used:

Generate two secret integers b_0 and b_1 randomly.
Compute a blinded polynomial Q(x) = P(x) + (b_0 + b_1 x) (x^n - 1).
Commit Q(x) by computing Q(s).g using the trusted setup.

Here, x^n - 1 is called the "vanishing polynomial", as it evaluates to zero on every evaluation integer (1, ω, ω^2...). It is used in order to have an interesting property: for every root of unity ω^k, Q(ω^k) = P(ω^k).

Finally PLONK and other commitment schemes also compute two things:

An opening: from a challenge c, Q(c) is computed. The result and c are part of the proof.
Another commitment: To prove that Q(c) was indeed the evaluation of the committed Q, the polynom R(x) = (Q(x) - Q(c)) / (x - c) is computed and its commitment R(s).g is part of the proof.

But here, the puzzle chooses to do something different!

3. The vulnerability

Now that several notions were presented, here is a description of the algorithm used to generate the puzzle.

First, 1000 account addresses are randomly generated in accts. One of these addresses is randomly chosen, target_acct. This is the address which will be committed.

Then a GeneralEvaluationDomain<Fr> object is created. This object is used to evaluate polynomials on the 1024 roots of unity of order 1024. The chosen ω and its power can be displayed in Rust with:

let domain: GeneralEvaluationDomain<Fr> = GeneralEvaluationDomain::new(accts.len() + 2).unwrap();
println!("domain = {:?}", domain);
for k in 0..1025 {
    println!("ω^{} = {}", k, domain.element(k));
}

This prints:

domain = Radix2(Radix-2 multiplicative subgroup of size 1024)
ω^0 = Fp256 "(0000000000000000000000000000000000000000000000000000000000000001)"
ω^1 = Fp256 "(325DB5C3DEBF77A18F4DE02C0F776AF3EA437F9626FC085E3C28D666A5C2D854)"
ω^2 = Fp256 "(095166525526A65439FEEC240D80689FD697168A3A6000FE4541B8FF2EE0434E)"
ω^3 = Fp256 "(077FB7A1C70A28635FEA4420D98DE0DDFFD8C7E12991D3593A7478C980A80DCD)"
...
ω^1022 = Fp256 "(6A61DBD781FC6D1C881DAB49E15D37277DD34DC48B542A7916654B4E3193F385)"
ω^1023 = Fp256 "(5164CA4CF7456C9B942C20E45C3AD14D9CE862487904D0AB1B561347C6A47727)"
ω^1024 = Fp256 "(0000000000000000000000000000000000000000000000000000000000000001)"

We can verify in Python that this is indeed a root of the unity of order 1024:

>>> ω = 0x325DB5C3DEBF77A18F4DE02C0F776AF3EA437F9626FC085E3C28D666A5C2D854
>>> r = 52435875175126190479447740508185965837690552500527637822603658699938581184513
>>> hex(pow(ω, 2, r))
'0x95166525526a65439feec240d80689fd697168a3a6000fe4541b8ff2ee0434e'

>>> hex(pow(ω, 1023, r))
'0x5164ca4cf7456c9b942c20e45c3ad14d9ce862487904d0ab1b561347c6a47727'

>>> pow(ω, 1024, r)
1

Then:

A trusted setup with 1024 points (g, s.g, (s^2).g, (s^3).g...) is computed. This is what the variable setup contains.
The polynomial P(x) is computed by computing an IFFT of target_acct in the chosen evaluation domain.
Two integers b_0 and b_1 are randomly generated and are used to compute Q(x) = P(x) + (b_0 + b_1 x) (x^n - 1).
The commitment Q(s).g is computed using the trusted setup. This the point commt.
Two "challenge" integers cha_1 and cha_2 are randomly generated. They are used to compute two openings opn_1 = Q(cha_1) and opn_2 = Q(cha_2).

This explains where each variable comes from.

However using two openings instead of one greatly weakens the security of the algorithm. Indeed, this enables to retrieve b_0 and b_1 when P(x) is known as the following equations are linear:

opn_1 = Q(cha_1) = P(cha_1) + (b_0 + b_1 cha_1) (cha_1^n - 1)
opn_2 = Q(cha_2) = P(cha_2) + (b_0 + b_1 cha_2) (cha_2^n - 1)

These equations can be simplified:

b_0 + b_1 cha_1 = (opn_1 - P(cha_1)) / (cha_1^n - 1)
b_0 + b_1 cha_2 = (opn_2 - P(cha_2)) / (cha_2^n - 1)

This leads to the following formulas:

b_1 = ((opn_1 - P(cha_1)) / (cha_1^n - 1) - (opn_2 - P(cha_2)) / (cha_2^n - 1)) / (cha_1 - cha_2)
b_0 = (opn_1 - P(cha_1)) / (cha_1^n - 1) - b_1 cha_1

In short, the bliding scheme is broken because there are two openings of the blinded polynomial.

To exploit the vulnerability, an attacker needs to iterate other all the addresses in accts and for each one, compute P(x), retrieve b_0 and b_1, compute Q(x), compute the commitment Q(s).g using the trusted setup and compare it with commt.

In Rust, here is an implementation of this attack:

let domain: GeneralEvaluationDomain<Fr> = GeneralEvaluationDomain::new(accts.len() + 2).unwrap();
let van_poly = domain.vanishing_polynomial();
let van_poly_cha_1 = van_poly.evaluate(&cha_1);
let van_poly_cha_2 = van_poly.evaluate(&cha_2);

for (target_idx, target_acct) in accts.iter().enumerate() {
    println!("Testing target {}", target_idx);

    // Compute P(x)
    let target_acct_poly = DensePolynomial::from_coefficients_vec(domain.ifft(&target_acct));

    // Retrieve b_0 and b_1
    let value_1 = (opn_1 - target_acct_poly.evaluate(&cha_1)) / van_poly_cha_1;
    let value_2 = (opn_2 - target_acct_poly.evaluate(&cha_2)) / van_poly_cha_2;
    let b_1 = (value_2 - value_1) / (cha_2 - cha_1);
    let b_0 = value_1 - b_1 * cha_1;

    // Compute Q(x)
    let blinding_poly = DensePolynomial::from_coefficients_vec(vec![b_0, b_1]);
    let blinded_acct_poly = target_acct_poly + blinding_poly.mul_by_vanishing_poly(domain);

    // Check that there was no error in the attack implementation
    assert_eq!(blinded_acct_poly.evaluate(&cha_1), opn_1);
    assert_eq!(blinded_acct_poly.evaluate(&cha_2), opn_2);

    // Compute Q(s).g
    let commitment: G1Affine = kzg_commit(&blinded_acct_poly, &setup);
    if commitment == commt {
        println!("Found matching commitment for account {}", target_idx);
    }
}

After some time, this displays:

Found matching commitment for account 535

So the account which was used in the data was accts[535].

Appendice: bugs in the puzzle

The Rust implementation of the generation algorithm used by the puzzle has two bugs:

The polynomials are supposed to have 32 coefficients. But the evaluation domain was built with a size of 1024 instead of 32, so the polynomials have 1024 coefficients. This does not change the solution of the puzzle but is surprising.
The commitment of Q(x) is not Q(s).g as Q(x) is truncated. More precisely, in the puzzle, the degree of P(x) is 1023 so Q(x) = P(x) + (b_0 + b_1 x) (x^1024 - 1) = (P(x) - b_0 - b_1 x) + b_0 x^1024 + b_1 x^1025 has degree 1025. However the trusted setup only has 1024 points. When the commitment of Q(x) is computed, p.coeffs().iter().zip(setup) is used to iterate on the coefficients, and this stops after the coefficient of x^1023. Therefore in practice, the coefficients of x^1024 and x^1025 in Q(x) are truncated and commt is the commitment of P(x) - b_0 - b_1 x

These bugs do not impact the puzzle and its vulnerability.

niooss-ledger/zkhack2021_puzzle4_writeup.md

Write-up for zkHack puzzle #4: Hidden in Plain Sight

1. Subject

2. PLONK mathematics

3. The vulnerability

Appendice: bugs in the puzzle