Skip to content

Instantly share code, notes, and snippets.

@fasiha
Last active January 10, 2020 12:58
Show Gist options
  • Save fasiha/886302b75e480d4b4e4e31f3eba055bb to your computer and use it in GitHub Desktop.
Save fasiha/886302b75e480d4b4e4e31f3eba055bb to your computer and use it in GitHub Desktop.
How Rust slays brittle indexing logic.

In writing one’s own Base64 codec for the Cryptopals Crypto Challenge in Rust, one gets to a point where every chunk of four adjacent elements in an input vector has to be transformed into a chunk of three elements in an output vector.

That is, the string SSdt containing four ASCII bytes becomes the string I'm containing three ASCII bytes, and IGtp becomes  ki, and so on, so that SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t is decoded to I'm killing your brain like a poisonous mushroom.

I had a function to do this four-to-three downconversion but looping over the two arrays, lining up the indexes, the keeping track of magic threes and magic fours in my code gave me a headache as I worked through writing the following:

pub fn decode(s: &[u8]) -> Vec<u8> {
    let mut out: Vec<u8> = vec![0; s.len() / 4 * 3];
    for i in 0..out.len() / 3 {
        let v = &s[i * 4..i * 4 + 4];
        &out[i * 3..i * 3 + 3].copy_from_slice(&quad2triplet(v[0], v[1], v[2], v[3]));
    }
    out
}

I thought Rust was just C with safety and type inference, so keeping track of indexes like this is just how it was done.

(Caveat, the code above is a tiny bit wrong—it’ll compile and give you the right answer but might pad it with one or two bytes that need trimming if the input ended in =.)

After Rusting a few more Cryptopals challenges, and binge-reading the standard library’s documentation on iterators and slices, I finally realized this decoder could be written entirely without brittle magic-numbered indexing arithmetic:

pub fn decode0(s: &[u8]) -> Vec<u8> {
    let mut out: Vec<u8> = vec![0; s.len() / 4 * 3];
    for (v, vo) in s.chunks(4).zip(out.as_mut_slice().chunks_mut(3)) {
        vo.copy_from_slice(&quad2triplet(v[0], v[1], v[2], v[3]))
    }
    out
}

Just to emphasize the crucial lines—before:

for i in 0..out.len() / 3 {                            // (🏸)
    let v = &s[i * 4..i * 4 + 4];                      // (🏑)
    &out[i * 3..i * 3 + 3].copy_from_slice(/*...*/);   // (🏏)
}

and after:

for (v, vo) in s.chunks(4).zip(out.as_mut_slice().chunks_mut(3)) {
    vo.copy_from_slice(/*...*/)
}

At first glance, the changes might look insignificant, but I reiterate.

No more indexing.

This is big. Before coding, the idea in one’s mind is, “take four-chunks, transform them into three-chunks”.

Then one labors for a few seconds–minutes ironing out three separate indexes: (🏸) how many 3-chunks in the output, each corresponding to (🏑) which chunk of the input, each going to (🏏) which chunk of the output.

And now, Rust—Rust, billed as a safe systems-level language, which I interpreted to mean “C with type inference”—lets us quite literally express our initial idea: we talk about chunks(4) of the input, and chunks_mut(3) of the output, and that’s it.

This, my friends, is much bigger than safety or systems programming. This is about ML-grade expressivity in a non-garbage-collected, compiled language.

(N.B. And it’s not just this example. I’m compiling a list of functions where I thought, “Wow, this is Clojure-clear”. Which is a compliment to Rust.)

@fasiha
Copy link
Author

fasiha commented Dec 30, 2016

The full code is here: base64decode.rs. Also, many thanks to Michael Gattozzi’s “Rust is it’s community” and “Why you should be blogging about Rust” for prodding me to write this brief note.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment