@derekmorr
Last active September 19, 2018 02:42
opening_a_file.md

Opening a file

A comparison of error handling strategies in a few popular languages. We look at how opening a file is handled as a specific example.

C

The open(2) API is the POSIX standard for opening files in C.

int open(const char *path, int oflag, ...);

This is a very old API: it was already present in Version 6 AT&T Unix, released in 1975.

The return value is an integer, which is really an index into the process's file descriptor table in the kernel. A non-negative value indicates success; on failure, open() returns -1 and sets errno to describe the error.

There are several issues with this API:

  • It suffers from the semipredicate problem. Success and failure share a single return type, with a special hard-coded value, -1, reserved for errors. If the programmer doesn't check for it, the code still compiles. Thus, safe coding requires linters or manual code review.
  • It also suffers from primitive obsession. A file descriptor is typed as a plain int, so the compiler cannot tell it apart from any other integer.
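Both issues can be sketched in Rust, used here only as an illustration language. `legacy_open` is a hypothetical stand-in that imitates open(2)'s return convention; it does not call the real system call, and the descriptor value 3 is arbitrary. `Fd` and `checked_open` are likewise made-up names showing one possible way to wrap such an API.

```rust
/// C-style convention: a non-negative return is a descriptor, -1 means failure.
/// Hypothetical stand-in; it does not call the real open(2).
fn legacy_open(path: &str) -> i32 {
    if path.is_empty() { -1 } else { 3 }
}

/// Semipredicate problem: nothing forces the caller to test for -1,
/// so the sentinel can flow onward as if it were a valid descriptor.
fn careless_caller(path: &str) -> i32 {
    let fd = legacy_open(path); // no check for -1
    fd // would be handed straight to read() or write()
}

/// Primitive obsession: wrapping the integer in its own type means
/// a descriptor can no longer be confused with an arbitrary i32.
#[derive(Debug, PartialEq)]
struct Fd(i32);

/// Translate the sentinel into Option, so failure is no longer a
/// magic value that shares a type with success.
fn checked_open(path: &str) -> Option<Fd> {
    match legacy_open(path) {
        fd if fd >= 0 => Some(Fd(fd)),
        _ => None,
    }
}
```

With the raw integer, `careless_caller("")` quietly yields -1. With `checked_open`, the caller has to unwrap an `Option<Fd>` before any descriptor is available.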

Go

Go bills itself as a modern successor to C, but its OpenFile API has several design flaws.

func OpenFile(name string, flag int, perm FileMode) (*File, error)

Functions in Go typically return a pair of values, a success value and an error value (the *File, error thing). At first blush, this appears better than C's open(). It separates success and failure values into distinct types, avoiding the semipredicate problem. However, it introduces new problems -- it makes illegal states representable, and it requires null.

Yaron Minsky of Jane Street coined the phrase "make illegal states unrepresentable." It's a way of reducing bugs by designing illegal states out of the system. Go's approach of returning a (success value, error value) pair violates this principle.

Opening a file will either succeed or fail. Always one, never both. If it succeeds, OpenFile() has a success value (a *File). If it fails, it has an error value. But Go's design requires OpenFile() to return two values (a success value and an error value) when it only has one. If OpenFile() succeeds, it has a File handle, so it can populate the first element of the pair, but it still needs to fill the second element. So it uses null, or nil, as Go calls it. Thus, Go repeats what Tony Hoare called his billion-dollar mistake.

A successful call to OpenFile() results in this pair: (*File, nil)

An unsuccessful call to OpenFile() results in this pair: (nil, error)

But there are illegal states here. What about the pairs (*File, error) or (nil, nil)? What would these states even mean? That the function both succeeded and failed? That it neither succeeded nor failed? Go's standard library promises that it won't ever return a value like this, but there's nothing in the language design that prevents it. In fact, returning multiple non-nil values is explicitly supported in Go.
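To make the four combinations concrete, here is a minimal sketch that models Go's return pair as a Rust tuple of two independently nullable slots. `Handle`, `GoPair`, `Meaning`, and `classify` are hypothetical names invented for this illustration; they are not part of Go or of any library.

```rust
/// Hypothetical stand-in for Go's *File.
#[derive(Debug, PartialEq)]
struct Handle(u32);

/// A Go-style return value: each slot can be empty on its own.
type GoPair = (Option<Handle>, Option<String>);

/// What a caller could conclude from each combination.
#[derive(Debug, PartialEq)]
enum Meaning {
    Success,
    Failure,
    BothSet,    // succeeded and failed?
    NeitherSet, // neither succeeded nor failed?
}

/// The type admits four shapes, although only two of them are meaningful.
fn classify(pair: &GoPair) -> Meaning {
    match pair {
        (Some(_), None) => Meaning::Success,
        (None, Some(_)) => Meaning::Failure,
        (Some(_), Some(_)) => Meaning::BothSet,
        (None, None) => Meaning::NeitherSet,
    }
}
```

The compiler insists that `classify` cover all four arms, which is another way of saying that the pair type itself permits the two illegal states.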

Erlang

Erlang's file:open/2 API has several improvements compared to Go. This is ironic, as it predates Go by more than 20 years.

open(File, Modes) -> {ok, IoDevice} | {error, Reason}

open/2 returns a tuple with two elements (Erlang uses { } to denote tuples). The first element is either ok or error. These are atoms, which you can think of as named constants; here they indicate whether the function succeeded or failed. The second element is either the success value or the failure value.

This is a much better structure than Go's. We don't need null, and we avoid illegal representations. Remember, opening a file will always either succeed or fail. Always one, never both. A problem with Go was that it allowed illegal states (such as opening a file both succeeding and failing, or neither succeeding nor failing). Erlang's design disallows those illegal states.

The function returns a 2-element tuple in one of two forms: {ok, IoDevice} or {error, Reason}. The first element in the tuple is a constant that indicates success or failure. The second provides the success or failure value. Opening a file will always succeed or fail, and the API only requires one value. Thus, we don't have illegal states.

If the open call succeeds, we have a file handle (which Erlang calls IoDevice), and we do not have an error (since the call succeeded). The API only requires us to populate a success value, so we don't need null (as opposed to Go). Similarly, if the open call fails, we have an error value, and the API only requires us to populate that.

This is a well-designed API. There is no semipredicate problem. It does not require null. The design of Erlang's result tuples doesn't allow illegal states, but this is only by convention. As Erlang is dynamically typed, the language itself doesn't rule out a malformed tuple.
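The "only by convention" caveat can be sketched by modelling dynamically typed terms in Rust. The `Term` type and the `follows_convention` function below are hypothetical and vastly simpler than real Erlang terms. The point is only that, with dynamic typing, the shape of a result has to be checked at run time.

```rust
/// A tiny, hypothetical model of dynamically typed terms.
#[derive(Debug, PartialEq)]
enum Term {
    Atom(&'static str),
    Str(String),
    Tuple(Vec<Term>),
}

/// Runtime shape check: accepts only {ok, _} or {error, _}.
fn follows_convention(term: &Term) -> bool {
    match term {
        Term::Tuple(items) => matches!(
            items.as_slice(),
            [Term::Atom("ok"), _] | [Term::Atom("error"), _]
        ),
        _ => false,
    }
}
```

Nothing prevents a function from building `Term::Tuple(vec![Term::Atom("ok")])` or a bare `Term::Atom("ok")`. Only a check like `follows_convention`, or a failed pattern match in the caller, would notice.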

Rust

Erlang's open/2 is an improvement over Go, but arguably still has some issues. As Erlang is dynamically typed, it's hard to claim that illegal states are unrepresentable. The standard library claims it will always return a valid tuple, but the language doesn't require this.

Enter Rust.

Rust has an ML-inspired error-handling system. Its core error-handling mechanism is called Result, which is either Ok or Err. Syntactically, it looks like an enum:

enum Result<T, E> {
    Ok(T),
    Err(E),
}

Result has a similar design to Erlang's tuple -- a result is either Ok or Err. If it's Ok, it carries a success value (like a file handle) of the generic type T. If it's Err, it carries error data of type E.

Rust's std::fs::File::open function returns an io::Result<File>, which is an alias for Result<File, io::Error>. This design solves all of the problems we've seen. There are distinct return types for success and failure; the compiler enforces error handling (code that tries to use the result as a File without first handling the error case won't compile, and silently discarding a Result triggers a compiler warning); and illegal states are unrepresentable. The unrepresentability is enforced by the language design, not by convention.
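As a minimal sketch of what this looks like at a call site, the two helpers below use only the standard library. The names `describe` and `file_len` are made up for this example. One handles the Result with an explicit match, the other propagates the error with the `?` operator.

```rust
use std::fs::File;
use std::io;

/// Handle both outcomes explicitly. The File is only reachable
/// inside the Ok arm, so the error case cannot be skipped by accident.
fn describe(path: &str) -> String {
    match File::open(path) {
        Ok(_file) => format!("opened {}", path),
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            format!("{} does not exist", path)
        }
        Err(e) => format!("could not open {}: {}", path, e),
    }
}

/// Propagate failures to the caller with `?` instead of matching here.
fn file_len(path: &str) -> io::Result<u64> {
    let file = File::open(path)?;
    Ok(file.metadata()?.len())
}
```

Writing `File::open(path).metadata()` would not compile, because `metadata` is a method on File, not on Result. The caller has to go through `match`, `?`, or an explicit `unwrap` first.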
