Skip to content

Instantly share code, notes, and snippets.

@quad
Last active November 5, 2024 00:16
Show Gist options
  • Save quad/a8a7cc87d1401004c6a8973947f20365 to your computer and use it in GitHub Desktop.
Save quad/a8a7cc87d1401004c6a8973947f20365 to your computer and use it in GitHub Desktop.
Modular Errors with Rust's thiserror

I've been writing Rust full-time with a small team for over a year now. Throughout, I've lamented the lack of clear best practices around defining error types. One day, I'd love to write up my journey and enumerate the various strategies I've both seen and tried. Today is not that day.

Today, I want to reply to a blog post that almost perfectly summarised my current practice.

Go read it; I'll wait!


Welcome back.

Sabrina makes a few arguments with which I whole-heartedly agree:

  1. Error types should be located near to their unit of fallibility (her words)

    That is, crate::Error or errors.rs is an antipattern.

  2. Error types should be layered (my words)

    That is, this:

    FromFileError {
        path: "BadBlocks.txt",
        source: Parse(
            ParseError {
                line: 2,
                source: ParseInt(
                    ParseIntError {
                        kind: InvalidDigit,
                    },
                ),
            },
        ),
    }

    … is better than this:

    Error {
        ParseInt(
          ParseIntError {
              kind: InvalidDigit
          }
       )
    }

    (The Go programmers in the audience are nodding their heads.)

  3. Error types should be extensible (our words?)

    That is, errors and their variants should have a #[non_exhaustive] attribute, making this:

    #[derive(Debug)]
    #[non_exhaustive]
    pub struct ParseError {}

    … is better than this:

    #[derive(Debug)]
    pub struct ParseError {}

I have a few nits; I did say "almost perfectly summarised."

I prefer thiserror

One could use the thiserror crate to shorten the code by using a #[derive(Error)]. Personally, I would never use this for a library crate since it’s really not that many lines of code saved for a whole extra dependency, but you might want to know about it.

Here is the final code from the article, rewritten to use thiserror, with two minor changes that I'll discuss later:

#[derive(Debug, thiserror::Error)]
#[error("failed to download Blocks.txt from the Unicode website")]
#[non_exhaustive]
pub struct DownloadError(#[from] pub DownloadErrorKind);

#[derive(Debug, thiserror::Error)]
#[error(transparent)]
pub enum DownloadErrorKind {
    Request(#[from] Box<ureq::Error>),
    ReadBody(#[from] io::Error),
    Parse(#[from] ParseError),
}

#[derive(Debug, thiserror::Error)]
#[error("error reading `{path}`")]
#[non_exhaustive]
pub struct FromFileError {
    pub path: Box<Path>,
    pub source: FromFileErrorKind,
}

#[derive(Debug, thiserror::Error)]
#[error(transparent)]
pub enum FromFileErrorKind {
    ReadFile(#[from] io::Error),
    Parse(#[from] ParseError),
}

#[derive(Debug, thiserror::Error)]
#[error("invalid Blocks.txt data on line {}", self.line + 1)]
#[non_exhaustive]
pub struct ParseError {
    pub line: usize,
    pub source: ParseErrorKind,
}

#[derive(Debug, thiserror::Error)]
pub enum ParseErrorKind {
    #[error("no semicolon")]
    #[non_exhaustive]
    NoSemicolon,
    #[error("no `..` in range")]
    #[non_exhaustive]
    NoDotDot,
    #[error("one end of range is not a valid hexadecimal integer")]
    #[non_exhaustive]
    ParseInt(#[from] ParseIntError),
}

I believe that when working on a team this is better code. Why?

  1. It is less code

    This is an objective statement (48 < 106) backing a (contentious!) subjective argument:

    • All else being equal, less lines of code is better

    In this case, the boilerplate implementations only serve to obscure the salient details: the error types and error messages.

  2. It is a clear pattern

    Even to write the boilerplate implementations requires knowledge in Rust minutiae, including:

    • That we should implement a Display trait
      • What is a Formatter
      • How does &mut work?
      • What does <'_> mean?
      • What is the difference between a &str and a String?
      • I need to call write! or f.write_str!
    • That we should implement an Error trait
      • What is source for?
      • What is a dyn Error?
      • What is a 'static?
      • Do need match on something?
      • Do I use & or not?

    On a team, there is always someone new or learning.

    To wit, none of these questions are hypothetical; I've mentored colleagues on them and more. (Ask me about redaction and DebugStruct::finish_non_exhaustive! 👋🏾)

    And on my team in specific, our use of thiserror helped our Kotlin and Swift developers fearlessly add features to our Rust code.

  3. It get new features for free

    Rust is a young language, so unsurprisingly error handling is still evolving. The std::error::Error trait documentation has both deprecated and experimental APIs. Meanwhile, thiserror is a popular and well-maintained crate. By relying on it, we get old features removed and new features added without any additional effort.

    One important caveat: our codebase has dependencies. And we willingly (happily!) do the work to stay mainline.

Two minor changes

These truly are nits; entirely stylistic and I'd never argue for them.

nit 1: rename kind to source

In the FromFileError and ParseError structs, I renamed the kind field to source.

  • The name kind obscures what the *Kind types are

    They're Errors! And Errors refer to each other via Error::source.

  • thiserror likes it

    From thiserror's documentation:

    The Error trait’s source() method is implemented to return whichever field has a #[source] attribute or is named source, if any. This is for identifying the underlying lower level error that caused your error.

    In short, we can write:

    pub source: ErrorKind

    … instead of:

    #[source]
    pub kind: ErrorKind

This nit changes the parser code. Instead of:

.map_err(|kind| ParseError { line: i, kind })

… we now write:

.map_err(|source| ParseError { line: i, source })

🤷🏾‍♂️

nit 2: use #[from] liberally

#[from] is a feature of thiserror that does two things:

  1. It implies #[source]

    That is, it makes the #[from] attributed field the source of truth for Error::source.

  2. It implements the From trait

    That is, it lets us write result? instead of result.map_err(ErrorKind)?

Sabrina writes:

One thing notably omitted from the definitions of the new error types was implementations of From for inner types. There is no problem with them really, one just has to be careful that it (a) works with extensibility and (b) actually makes sense. […]

While it does make sense to implement From<ParseError>, because Parse is literally the name of one of the variants of FromFileErrorKind, it does not make sense to implement From<io::Error> because such an implementation would implicitly add meaning that one failed during the process of reading the file from disk (as the variant is named ReadFile instead of Io). Constraining the meaning of “any I/O error” to “an error reading the file from the disk” is helpful but should not be done implicitly, thus rendering From inappropriate.

I completely agree with her notes of caution, but I disagree in working practice.

Let me take this in steps:

  1. I changed ParseInt { source: ParseIntError } to ParseInt(#[from] ParseIntError)

    This let me change the original code from:

    let start = u32::from_str_radix(start, 16)
          .map_err(|source| ParseErrorKind::ParseInt { source })?;
    let end = u32::from_str_radix(end, 16)
          .map_err(|source| ParseErrorKind::ParseInt { source })?;

    … into: (using the implicit source)

    let start = u32::from_str_radix(start, 16)
        .map_err(ParseErrorKind::ParseInt)?;
    let end = u32::from_str_radix(end, 16)
        .map_err(ParseErrorKind::ParseInt)?;

    … and still further into: (using the From trait)

    let start = u32::from_str_radix(start, 16)?;
    let end = u32::from_str_radix(end, 16)?;

    Sabrina's worry, if I understand it correctly, is that someone may later write code that calls from_str_radix(…)? (or one of its relatives) in a different context and the meaning of ParseErrorKind::ParseInt will shift. I generally share this worry! It's happened in our team's codebase as we've iterated on how we define and structure our errors.

    However, in this case, I feel the worry is constrained by the "not the prettiest" pattern of constructing the error type inside a closure. In her example, a FromFileErrorKind can only come from inside one closure. And that closure's responsibility is to read from a file!

    Our ParseInt / u32::from_str_radix uses are inside a similar closure, and a ParseErrorKind can only come said closure.

  2. I changed all the Kind variants to use #[from] YOLO

    I think the improved readability is worth the risk. It lets us go from this:

     pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self, FromFileError> {
         let path = path.as_ref();
         (|| {
             Self::from_str(&fs::read_to_string(path).map_err(FromFileErrorKind::ReadFile)?)
                 .map_err(FromFileErrorKind::Parse)
         })()
         .map_err(|source| FromFileError {
             path: path.into(),
             source,
         })
     }
     pub fn download(agent: &ureq::Agent) -> Result<Self, DownloadError> {
         (|| {
             let response = agent
                 .get(LATEST_URL)
                 .call()
                 .map_err(|e| DownloadErrorKind::Request(Box::new(e)))?;
             Self::from_str(
                 &response
                     .into_string()
                     .map_err(DownloadErrorKind::ReadBody)?,
             )
             .map_err(DownloadErrorKind::Parse)
         })()
         .map_err(DownloadError)
     }

    … to this: 😬

     pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self, FromFileError> {
         let path = path.as_ref();
         (|| Ok(Self::from_str(&fs::read_to_string(path)?)?))().map_err(|source| FromFileError {
             path: path.into(),
             source,
         })
     }
     pub fn download(agent: &ureq::Agent) -> Result<Self, DownloadError> {
         (|| {
             let response = agent.get(LATEST_URL).call().map_err(Box::new)?;
             Ok(Self::from_str(&response.into_string()?)?)
         })()
         .map_err(DownloadError)
     }

    … and finally, to this:

     pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self, FromFileError> {
         let path = path.as_ref();
         let from_file = || Ok(Self::from_str(&fs::read_to_string(path)?)?);
         from_file().map_err(|source| FromFileError {
             path: path.into(),
             source,
         })
     }
     pub fn download(agent: &ureq::Agent) -> Result<Self, DownloadError> {
         let download = || -> Result<_, DownloadErrorKind> {
             let response = agent.get(LATEST_URL).call().map_err(Box::new)?;
             Ok(Self::from_str(&response.into_string()?)?)
         };
         Ok(download()?)
     }

    I've made some stylistic changes here too, the most obvious of which is I assigned the closures to variables. "Namespaces are one honking great idea -- let's do more of those!"

Downsides

  • thiserror compiles slower

    Something to keep in mind when profiling builds. Our build times are dominated by other things. (😤 Docker!)

Backtraces

Rust (like Swift and friends) doesn't have exceptions. That means tracking down where an error was created can be an exercise in frustration. Somedays, I miss having a file name / line number.

Worse, even reading errors can be a mixed bag.

Consider these four tests:

#[test]
fn fail_panic() {
    Blocks::from_file("BadBlocks.txt").unwrap();
}

#[test]
fn fail_result() -> Result<(), FromFileError> {
    Blocks::from_file("BadBlocks.txt")?;
    unreachable!()
}

#[test]
fn fail_test_result() -> TestResult {
    Blocks::from_file("BadBlocks.txt")?;
    unreachable!()
}

#[test]
fn fail_anyhow() -> anyhow::Result<()> {
    Blocks::from_file("BadBlocks.txt")?;
    unreachable!()
}

Here's what cargo test prints:

---- tests::fail_panic stdout ----
thread 'tests::fail_panic' panicked at 'called `Result::unwrap()` on an `Err` value: FromFileError { path: "BadBlocks.txt", source: Parse(ParseError { line: 2, source: ParseInt(ParseIntError { kind: InvalidDigit }) }) }', src/lib.rs:137:44

---- tests::fail_result stdout ----
Error: FromFileError { path: "BadBlocks.txt", source: Parse(ParseError { line: 2, source: ParseInt(ParseIntError { kind: InvalidDigit }) }) }

---- tests::fail_test_result stdout ----
thread 'tests::fail_test_result' panicked at 'error: rust_error_styles::FromFileError - error reading `BadBlocks.txt`', src/lib.rs:148:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- tests::fail_anyhow stdout ----
Error: error reading `BadBlocks.txt`

Caused by:
    0: invalid Blocks.txt data on line 3
    1: one end of range is not a valid hexadecimal integer
    2: invalid digit found in string

None are great.

  • fail_panic is low-key the default for Rust newbies. It shows a Debug representation of the error, but omits the meaningful messages.
  • fail_result is implemented the same way as code under test (read: real code) and doesn't even show where in the test the failure occured!
  • fail_test_result uses TestResult, a crate I only heard about today. It shows the meaningful message, but omits the representation.
  • fail_anyhow uses the infamous anyhow, showing meaningful messages for the entire error chain, but omits the error's representation.

None give a useful backtrace.

I want it all:

  1. The error's representation (preferrably pretty-printed via {:#?})
  2. The error chain's messages
  3. A backtrace of the error chain

I haven't investigated eyre or snafu, which both look like they might do the trick.

Perhaps a PR on testresult to show both the pretty-printed debug representation and error chain's messages is in order?

Conclusion

Upon further research, Alex Fedoseev in thiserror, anyhow, or How I Handle Errors in Rust Apps said everyting I just did, but better.

(lobste.rs thread)

# test comment
0080..00FF; Latin-1 Supplement
00j0..007F; Basic Latin
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
0180..024F; Latin Extended-B
0250..02AF; IPA Extensions
[package]
name = "rust-error-styles"
version = "0.1.0"
edition = "2021"
[dependencies]
anyhow = "1.0.70"
testresult = "0.3.0"
thiserror = "1.0.40"
ureq = "2.6.2"
//! This crate provides types for UCD’s `Blocks.txt`.
#[derive(Debug)]
pub struct Blocks {
ranges: Vec<(RangeInclusive<u32>, String)>,
}
impl Blocks {
pub fn block_of(&self, c: char) -> &str {
self.ranges
.binary_search_by(|(range, _)| {
if *range.end() < u32::from(c) {
cmp::Ordering::Less
} else if u32::from(c) < *range.start() {
cmp::Ordering::Greater
} else {
cmp::Ordering::Equal
}
})
.map(|i| &*self.ranges[i].1)
.unwrap_or("No_Block")
}
pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self, FromFileError> {
let path = path.as_ref();
let from_file = || Ok(Self::from_str(&fs::read_to_string(path)?)?);
from_file().map_err(|source| FromFileError {
path: path.into(),
source,
})
}
pub fn download(agent: &ureq::Agent) -> Result<Self, DownloadError> {
let download = || -> Result<_, DownloadErrorKind> {
let response = agent.get(LATEST_URL).call().map_err(Box::new)?;
Ok(Self::from_str(&response.into_string()?)?)
};
Ok(download()?)
}
}
impl FromStr for Blocks {
type Err = ParseError;
fn from_str(s: &str) -> Result<Self, Self::Err> {
let ranges = s
.lines()
// I couldn't help myself
.map(|line| line.split_once('#').map(|(line, _)| line).unwrap_or(line))
.enumerate()
.filter(|(_, line)| !line.is_empty())
.map(|(i, line)| {
(|| {
let (range, name) = line.split_once(';').ok_or(ParseErrorKind::NoSemicolon)?;
let (range, name) = (range.trim(), name.trim());
let (start, end) = range.split_once("..").ok_or(ParseErrorKind::NoDotDot)?;
let start = u32::from_str_radix(start, 16)?;
let end = u32::from_str_radix(end, 16)?;
Ok((start..=end, name.to_owned()))
})()
.map_err(|source| ParseError { line: i, source })
})
.collect::<Result<Vec<_>, ParseError>>()?;
Ok(Self { ranges })
}
}
#[derive(Debug, thiserror::Error)]
#[error("failed to download Blocks.txt from the Unicode website")]
#[non_exhaustive]
pub struct DownloadError(#[from] pub DownloadErrorKind);
#[derive(Debug, thiserror::Error)]
#[error(transparent)]
pub enum DownloadErrorKind {
Request(#[from] Box<ureq::Error>),
ReadBody(#[from] io::Error),
Parse(#[from] ParseError),
}
#[derive(Debug, thiserror::Error)]
#[error("error reading `{path}`")]
#[non_exhaustive]
pub struct FromFileError {
pub path: Box<Path>,
pub source: FromFileErrorKind,
}
#[derive(Debug, thiserror::Error)]
#[error(transparent)]
pub enum FromFileErrorKind {
ReadFile(#[from] io::Error),
Parse(#[from] ParseError),
}
#[derive(Debug, thiserror::Error)]
#[error("invalid Blocks.txt data on line {}", self.line + 1)]
#[non_exhaustive]
pub struct ParseError {
pub line: usize,
pub source: ParseErrorKind,
}
#[derive(Debug, thiserror::Error)]
pub enum ParseErrorKind {
#[error("no semicolon")]
#[non_exhaustive]
NoSemicolon,
#[error("no `..` in range")]
#[non_exhaustive]
NoDotDot,
#[error("one end of range is not a valid hexadecimal integer")]
#[non_exhaustive]
ParseInt(#[from] ParseIntError),
}
#[cfg(test)]
mod tests {
#[test]
fn real_unicode() -> TestResult {
let data = Blocks::download(&ureq::agent())?;
assert_eq!(data.block_of('\u{0080}'), "Latin-1 Supplement");
assert_eq!(data.block_of('½'), "Latin-1 Supplement");
assert_eq!(data.block_of('\u{00FF}'), "Latin-1 Supplement");
assert_eq!(data.block_of('\u{EFFFF}'), "No_Block");
Ok(())
}
#[test]
fn test_unicode() -> TestResult {
let data = include_str!("../Blocks.txt").parse::<Blocks>()?;
assert_eq!(data.block_of('\u{0080}'), "Latin-1 Supplement");
assert_eq!(data.block_of('½'), "Latin-1 Supplement");
assert_eq!(data.block_of('\u{00FF}'), "Latin-1 Supplement");
assert_eq!(data.block_of('\u{EFFFF}'), "No_Block");
Ok(())
}
#[test]
fn fail_panic() {
Blocks::from_file("BadBlocks.txt").unwrap();
}
#[test]
fn fail_result() -> Result<(), FromFileError> {
Blocks::from_file("BadBlocks.txt")?;
unreachable!()
}
#[test]
fn fail_test_result() -> TestResult {
Blocks::from_file("BadBlocks.txt")?;
unreachable!()
}
#[test]
fn fail_anyhow() -> anyhow::Result<()> {
Blocks::from_file("BadBlocks.txt")?;
unreachable!()
}
use crate::*;
use testresult::TestResult;
}
pub const LATEST_URL: &str = "https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt";
use std::cmp;
use std::fs;
use std::io;
use std::num::ParseIntError;
use std::ops::RangeInclusive;
use std::path::Path;
use std::str::FromStr;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment