Skip to content

Instantly share code, notes, and snippets.

@Nekrolm
Created January 21, 2025 18:43
Show Gist options
  • Save Nekrolm/4d4266c82ee50827e4f6726df593e854 to your computer and use it in GitHub Desktop.
Save Nekrolm/4d4266c82ee50827e4f6726df593e854 to your computer and use it in GitHub Desktop.

Why unsafe is unsafe

Once I had seen this function. It was a nice function, but there is a problem

pub fn encode_logs(logs: impl IntoIterator<Item = impl AsRef<str>>) -> String {
    logs.into_iter().map(|line|
        line.as_ref()
            .replace("\\", "\\\\")
            .replace("\n", "\\n")
            .replace("\t", "\\t")
            .replace("\r", "\\r")
    ).collect()
}

For each processed string it allocates and deallocates memory 4 times. This is not an issue for toy projects, but when your host must hold 20K+ RPS with 10KB logs in each request... It’s become a problem and overhead is significant.

So we can do better. E.g. We can process characters in each original string and apply corresponding transformation

pub fn encode_logs(logs: impl IntoIterator<Item = impl AsRef<str>>) -> String {
    logs.into_iter().fold(String::new(), |buf, line| {
        line.as_ref().chars().fold(buf, |mut buf, c| match c {
            '\\' => buf + "\\\\",
            '\n' => buf + "\\n",
            '\t' => buf + "\\t",
            '\r' => buf + "\\r",
            x => { buf.push(x); buf },
        })
    })
}

We can do even better, via raw bytes manipulations, instead of codepoints (Rust’s char is unicode codepoint, 4 bytes). But this is a different story.

We applied this function change. It was reasonable. It passes meaningful tests. It’s safe and correct. That’s fine ...

When change reached production... This function crashed with panic! What’s went wrong?!

The lurking shadow

There is a lot of zero-copy serialization formats in the wild. And one of them is google flatbuffers. They work fine. They have multilanguage support: C++, Java, Rust...

And there our story begins.

Rust version of flatbuffers library passed though many iterations. And now it has weird sem version number on crates.io: 24.12.23 Major version 24! Wow...

Well, they had just changed versioning approach and went from version 2 to 22. There aren’t 24 different major versions, of course.

Like other zero-copy serialization libraries, Flatbuffers internals are completely unsafe. They operate on raw bytes and try to materialize some objects for you. And this is fine. If you are sure that the bytes are correct, valid and actually contain your objects.

And here the historical legacy strikes back

Flatbuffers library for Rust is pretty new. Libraries for C++ and Java much older and mature. Version 2.xx for Rust provided “experimental” verification API to check a buffer before access. And this experimental API didn’t work. Especially for generated user structs.

Yeah... But it was still usable, if you trust sender. Right? Or if you have some proxy written in C++, that performs such validation for you... Yeah, sounds weird. Trust C++ code...

Logs were stored in flatbuffer. Just as array of strings. logs: [string]

Flatbuffers for Rust can access them. And this is how it’s implemented

impl<'a> Follow<'a> for &'a str {
    type Inner = &'a str;
    unsafe fn follow(buf: &'a [u8], loc: usize) -> Self::Inner {
        let len = read_scalar_at::<UOffsetT>(buf, loc) as usize;
        let slice = &buf[loc + SIZE_UOFFSET..loc + SIZE_UOFFSET + len];
        from_utf8_unchecked(slice)
    }
}

This is unsafe function. And it calls another unsafe function without checks. As intended.

But such unsafe internal functions are wrapped into safe accessors. We had array of strings, and it could be accessed via safe iter() method https://docs.rs/flatbuffers/latest/flatbuffers/struct.Vector.html#method.iter

So again, what’s went wrong if we could trust the sender?

Trust no one

Well. The sender is supposed to be trusted. But the sender was written in C++. And there another historical issue that bites us:

C++ flatbuffers library has a verifier API. You can do something like bool ok = VerifyMonsterBuffer(Verifier(buf, len)); And be sure that you can access buf safely. In terms of C++ safety.

Let’s quickly return to the beginning: we had functions that transforms strings. And changes there caused crash. That means something went wrong with string data in flatbuffer. But C++ sender performed validation and said that everything is ok.

Well. For C++, std::string may contain arbitrary garbage. It doesn’t care. And C++ flatbuffers library also doesn’t care. But Rust does.

Rust’s &str and String must contain valid utf-8 sequence. And verifier didn’t check it. google/flatbuffers#4916

If you somehow construct &str or String with invalid utf-8 sequence inside — it’s undefined behaviour.

Flatbuffers crate had “safe” API. But with undefined behaviour. And our minor changes in completely unrelated safe function broke everything!

Consequences

Rust flatbuffers crate v 24.x.x now has working verification API. That’s great...

I had to revert encode_logs function. Because I cannot just enable “fixed” verification! It rejects playload with broken strings and we seems had a customer who generates broken strings and relies on them...

Thats a cursed world we had created and have to maintain. Checking every time that changes in safe Rust code didn’t break this insanity...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment