layout	title	date	audience	excerpt
post	the-intra-doc-links-saga	2020-08-08 00:01:14 -0400	developers

N.B. This post assumes some familiarity with the Rust programming language.

Introduction - What are intra-doc links?

One of the cool features of Rust is a tool called rustdoc. It can auto-generate documentation from your source code with very little effort on your part. This is the tool behind https://docs.rs.

Intra-doc links are a feature of rustdoc that allows you to link to 'items' - functions, types, and more - by their name, instead of a hard-coded URL. This lets you have accurate links even if your types are re-exported in a different module or crate. Here is a simple example:

/// Link to [`f()`]
pub struct S;
pub fn f() {}
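For a slightly broader picture, here is a hedged sketch of a few other link forms rustdoc accepts: links by path, links with custom text, and links with a disambiguator. The items (`Counter`, `bump`, `get`, `reset`) are made up for illustration and are not from any real crate.

```rust
/// A counter. See [`bump`] to increment one, or [`Counter::get`] to read it.
///
/// Other ways to write the same kind of link:
/// - by path: [`crate::bump`]
/// - with custom text: [the increment function](bump)
/// - with a disambiguator, for names that exist in more than one
///   namespace: [`Counter`](struct@Counter), [`bump()`]
#[derive(Debug, Default, PartialEq)]
pub struct Counter(u32);

impl Counter {
    /// Returns the current value. Related: [`Self::reset`].
    pub fn get(&self) -> u32 {
        self.0
    }

    /// Sets the value back to zero.
    pub fn reset(&mut self) {
        self.0 = 0;
    }
}

/// Increments a [`Counter`] by one.
pub fn bump(c: &mut Counter) {
    c.0 += 1;
}
```

None of the links contain a URL; rustdoc resolves each name in the scope where the doc comment is written.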

The history of intra-doc links

Intra-doc links have been around for a while, since 2017! Unfortunately, they've been unstable that whole time. For most of that time the main blocker was cross-crate re-exports, things like the following:

// inner crate
#![crate_name = "inner"]
/// Link to [`f()`]
pub struct S;
pub fn f() {}

// outer crate
pub use inner::S;

Links on re-exported items were the original motivation for intra-doc links, so if we couldn't get them working, there wasn't much point in stabilizing! The bug also meant that links could silently break - the documentation would work when you built it, but any user of your API could re-export your types and break the links.
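To see why re-exports are awkward, here is a minimal single-file sketch in which two modules stand in for the two crates. That substitution is an assumption made to keep the example self-contained; the real bug involved separate crates. The return value of `f` is also invented so there is something to call.

```rust
// Two modules stand in for the `inner` and `outer` crates.
pub mod inner {
    /// Link to [`f()`]
    #[derive(Debug, Default, PartialEq)]
    pub struct S;

    /// Returns a fixed value so the sketch has something to call.
    pub fn f() -> u32 {
        7
    }
}

pub mod outer {
    // `S` now shows up in the docs for `outer` too, but `f` is not
    // re-exported here. The link in `S`'s docs therefore has to be resolved
    // where the comment was written (`inner`), not where it is displayed.
    pub use super::inner::S;
}
```

`outer::S` and `inner::S` are the same type, so the same doc comment is rendered in two places, only one of which has `f` in scope.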

What changed?

Early in June, I got tired of not being able to use intra-doc links. I started investigating the issue to see if there was a fix. It was marked as E-hard, so I wasn't expecting miracles, but I thought I might at least make a start on it.

It turns out there was a simple problem with the implementation - it assumed all items were in the current crate! Clearly, that's not always the case. The fix turned out to be easy enough that I could implement it as my first contribution to rustdoc.

However, it had one small problem: on certain carefully crafted inputs, it would crash:

#![feature(decl_macro)]
fn main() {
    || {
        macro m() {}
    };
}

thread 'rustc' panicked at 'called `Option::unwrap()` on a `None` value', /home/joshua/src/rust/src/librustc_hir/definitions.rs:358:9

HirIds and DefIds and trees, oh my!

(If you're not interested in the internals of the Rust compiler, feel free to skip this section.)

The error above came from a pass called everybody_loops. A compiler 'pass' is a step that analyzes or transforms the source code, for example finding items without documentation. The everybody_loops pass turns the above code into:

fn main() {
    {
        macro m { () => { } }
    }
    loop  { }
}

As part of my changes for resolving cross-crate items, I needed to know the first parent module, so I could tell what items were in scope. Note, however, that after everybody_loops the closure has disappeared! The crash happened because rustdoc was trying to access a closure that rustc didn't think existed. In compiler jargon, it was turning the DefId for the closure, which works across crates, into a HirId, which is specific to the current crate but contains a lot more info.
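The failing conversion can be pictured with a toy model. This is a sketch only: the types below borrow the names DefId and HirId but are heavily simplified stand-ins, not rustc's real definitions, and `Definitions`, `hir_id`, and `sample_definitions` are hypothetical helpers invented for this example.

```rust
use std::collections::HashMap;

/// Toy crate number; 0 plays the role of the crate being documented.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct CrateNum(pub u32);

pub const LOCAL_CRATE: CrateNum = CrateNum(0);

/// Toy cross-crate identifier: which crate, and which item inside it.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId {
    pub krate: CrateNum,
    pub index: u32,
}

/// Toy crate-local identifier, valid only for items that still have a node
/// in the current crate's tree.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct HirId(pub u32);

/// Hypothetical lookup table from local item indices to `HirId`s.
pub struct Definitions {
    pub local: HashMap<u32, HirId>,
}

impl Definitions {
    /// Fallible conversion: `None` for items from other crates, and for
    /// local items whose node no longer exists.
    pub fn hir_id(&self, id: DefId) -> Option<HirId> {
        if id.krate != LOCAL_CRATE {
            return None;
        }
        self.local.get(&id.index).copied()
    }
}

/// Hypothetical table after a pass has dropped item 2 (think: the closure).
pub fn sample_definitions() -> Definitions {
    let mut local = HashMap::new();
    local.insert(1, HirId(10)); // item 1 (think: `main`) still has a node
    Definitions { local }
}
```

Calling `.unwrap()` on the result of `hir_id` for item 2 would panic with the same kind of message as the crash above; treating the lookup as fallible is the safer design.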

Why is this hard?

This turned out to be an enormous rabbit hole. everybody_loops was introduced all the way back in 2017 to solve another long-standing issue: rustdoc doesn't know how to deal with conditional compilation. It lets rustdoc (and by extension, the standard library) ignore type and name errors in function bodies. This allows documenting both Linux and Windows APIs on the same host, even though the implementation for the other platform would normally fail to compile. As seen above, it works by turning every function body into loop {} - this is always valid, because loop {} has type !, which coerces to any type!
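The coercion is easy to check by hand. In this small sketch, the function names and signatures are invented for illustration; each body is just loop {}, and each one compiles regardless of the declared return type.

```rust
// `loop {}` never finishes, so its type is `!` (never). That type coerces
// to whatever the signature declares, so every function below compiles
// even though none of them can produce a value.
pub fn answer() -> u32 {
    loop {}
}

pub fn split_words(_text: &str) -> Vec<String> {
    loop {}
}

pub fn open_handle(_path: &str) -> Result<std::fs::File, std::io::Error> {
    loop {}
}
```

These functions exist only to be type-checked and documented; calling any of them would hang forever.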

As we saw above, though, this transformation broke rustdoc. Additionally, it was causing lots of other problems.

So I got rid of it! The PR was titled "Don't run everybody_loops". It is the single largest PR I've ever made to rustc, and hopefully the largest I will ever make. The issue was that the errors from libstd hadn't gone away - if anything, the problem had grown since 2017. The hack I came up with was, instead of running type checking and trying to rewrite the code into something valid, to never run type checking at all! This is both less work and closer to the semantics rustdoc wants. In particular, it never causes the invalid states that were crashing rustdoc.

Aftermath: No good deed goes unpunished

About a month after the PR was merged, rustdoc got a bug report: the docs for async-std failed to build on the nightly channel. Their code looked something like the following:

mod windows {
    pub trait WinFoo {
        fn foo(&self) {}
    }
    impl WinFoo for () {}
}

#[cfg(any(windows, doc))]
use windows::*;

mod unix {
    pub trait UnixFoo {
        fn foo(&self) {}
    }
    impl UnixFoo for () {}
}

#[cfg(any(unix, doc))]
use unix::*;

async fn bar() {
    ().foo()
}

In particular, notice that under cfg(doc), both traits would be in scope with the same method, so it would be ambiguous which to use for .foo(). This is exactly the sort of problem meant to be solved by not running type checking. Unfortunately, since the call was inside an async fn, type checking was still being run; bar desugars to roughly the following:

fn bar() -> impl Future<Output = ()> {
    async {
        ().foo()
    }
}

Because the function returned impl Future, that required type-checking the body to infer the return type of the function. That's exactly what rustdoc wanted not to do!
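For comparison, here is a hedged sketch of a version whose body does type-check, using fully qualified syntax to pick one trait. The return values, `NoopWaker`, and `poll_once` are invented for this example, and it assumes Rust 2018 or later so that async blocks are available. It only illustrates how the ambiguity and the inferred return type interact; it is not what rustdoc or async-std actually did.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

pub trait WinFoo {
    fn foo(&self) -> &'static str {
        "windows"
    }
}
impl WinFoo for () {}

pub trait UnixFoo {
    fn foo(&self) -> &'static str {
        "unix"
    }
}
impl UnixFoo for () {}

// With both traits in scope, `().foo()` is rejected as ambiguous.
// Naming the trait explicitly removes the ambiguity, so the compiler can
// type-check the body and work out the hidden type behind `impl Future`.
pub fn bar() -> impl Future<Output = &'static str> {
    async { UnixFoo::foo(&()) }
}

/// Waker that does nothing; enough for a future that is ready immediately.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper: polls `fut` once and returns its output if ready.
pub fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut: Pin<Box<F>> = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => Some(out),
        Poll::Pending => None,
    }
}
```

The signature of `bar` names only the trait; the concrete future type comes from the body, which is why the body could not simply be skipped.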

The hacky 'fix' was to not infer the return type of the function at all - rustdoc doesn't care about the exact type, only the traits that it implements.

Stabilizing intra-doc links

Now that cross-crate re-exports work, there isn't much standing in the way of stabilizing intra-doc links! There are a few cleanup PRs, but for the most part, the path to stabilization seems clear.
