@jyn514
Last active August 8, 2020 14:59
layout: post
title: the-intra-doc-links-saga
date: 2020-08-08 00:01:14 -0400
audience: developers
excerpt:

N.B. This post assumes some familiarity with the Rust programming language.

Introduction - What are intra-doc links?

One of the cool features of Rust is a tool called rustdoc. It can auto-generate documentation from your source code with very little effort on your part. This is the tool behind https://docs.rs.

Intra-doc links are a feature of rustdoc that allows you to link to 'items' - functions, types, and more - by their name, instead of a hard-coded URL. This lets you have accurate links even if your types are re-exported in a different module or crate. Here is a simple example:

/// Link to [`f()`]
pub struct S;
pub fn f() {}
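As a slightly larger sketch, here are a few link forms that rustdoc resolves by name: a bare type, an associated function path, a function with trailing parentheses, and a path starting from the crate root. `Counter`, `bump`, and `make_counter` are hypothetical names made up for this illustration.

```rust
/// A counter. See [`Counter::bump`] for the only way to change it,
/// or [`make_counter()`] for a shortcut constructor.
pub struct Counter {
    /// Current value; change it with [`Counter::bump`].
    pub value: u32,
}

impl Counter {
    /// Adds one and returns the new value. Paths may also start from the
    /// crate root, as in [`crate::make_counter`].
    pub fn bump(&mut self) -> u32 {
        self.value += 1;
        self.value
    }
}

/// Builds a [`Counter`] starting at zero.
pub fn make_counter() -> Counter {
    Counter { value: 0 }
}
```

None of the links spell out a URL, so they do not depend on where the generated page for `Counter` ends up.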

The history of intra-doc links

Intra-doc links have been around for a while, since 2017! Unfortunately, they've been unstable that whole time. For a long time the main blocker was cross-crate re-exports, things like the following:

// inner crate
#![crate_name = "inner"]
/// Link to [`f()`]
pub struct S;
pub fn f() {}
// outer crate
pub use inner::S;

These links were the original motivation for intra-doc links, so if we couldn't get them working, there wasn't much point in stabilizing! They also had the downside that they could silently break - the documentation would work when you built it, but any user of your API could re-export your types and cause the links to be broken.

What changed?

Early in June, I got tired of not being able to use intra-doc links. I started investigating the issue to see if there was a fix. It was marked as E-hard, so I wasn't expecting miracles, but I thought I might at least make a start on it.

It turns out there was a simple problem with the implementation - it assumed all items were in the current crate! Clearly, that's not always the case. The fix turned out to be easy enough that I could implement it as my first contribution to rustdoc.

However, it had one small problem: on certain carefully crafted inputs, it would crash:

#![feature(decl_macro)]
fn main() {
    || {
        macro m() {}
    };
}

thread 'rustc' panicked at 'called `Option::unwrap()` on a `None` value', /home/joshua/src/rust/src/librustc_hir/definitions.rs:358:9

HirIds and DefIds and trees, oh my!

(If you're not interested in the internals of the Rust compiler, feel free to skip this section.)

The error above was caused by a pass called everybody_loops. A compiler 'pass' is a step that walks over the source code to analyze or transform it, for example finding items without documentation. The everybody_loops pass turns the above code into:

fn main() {
    {
        macro m { () => { } }
    }
    loop  { }
}

As part of my changes for resolving cross-crate items, I needed to know the first parent module, so I could tell what items were in scope. Note, however, that after everybody_loops the closure has disappeared! The crash happened because rustdoc was trying to access a closure that rustc didn't think existed. In compiler jargon, it was turning the DefId for the closure, which works across crates, into a HirId, which is specific to the current crate but contains a lot more info.
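To make the failure mode concrete, here is a toy model of that lookup. It is only a sketch: `ToyDefId`, `ToyHirId`, `ToyDefinitions`, and `as_local_hir_id` are hypothetical names invented for this post and are not rustc's real types or API. The point is that a global id does not always have a local counterpart, so the conversion has to return an `Option`, and calling `unwrap()` on it panics in exactly the way shown in the crash above.

```rust
use std::collections::HashMap;

/// Toy stand-in for an identifier that is valid across crates (hypothetical).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct ToyDefId {
    pub krate: u32,
    pub index: u32,
}

/// Toy stand-in for an identifier that only exists in the current crate (hypothetical).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct ToyHirId(pub u32);

/// In this toy model, crate number 0 is the crate being documented.
pub const LOCAL_CRATE: u32 = 0;

/// Toy table of the local definitions the compiler still knows about.
pub struct ToyDefinitions {
    local: HashMap<u32, ToyHirId>,
}

impl ToyDefinitions {
    pub fn new() -> Self {
        ToyDefinitions { local: HashMap::new() }
    }

    /// Records that the local definition `index` has the given local id.
    pub fn insert(&mut self, index: u32, hir_id: ToyHirId) {
        self.local.insert(index, hir_id);
    }

    /// Returns `None` for ids from another crate, and also for local ids
    /// whose entry was never recorded (like the closure that disappeared).
    pub fn as_local_hir_id(&self, def_id: ToyDefId) -> Option<ToyHirId> {
        if def_id.krate != LOCAL_CRATE {
            return None;
        }
        self.local.get(&def_id.index).copied()
    }
}
```

A caller that matches on the `Option` can skip the missing item; a caller that unwraps it crashes.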

Why is this hard?

This turned out to be an enormous rabbit hole. everybody_loops was introduced all the way back in 2017 to solve another long-standing issue: rustdoc doesn't know how to deal with conditional compilation. It lets rustdoc (and by extension, the standard library) ignore type and name errors in function bodies. This allows documenting both Linux and Windows APIs on the same host, even though the implementations would normally be broken. As seen above, it works by turning every function body into loop {} - this is always valid, because loop {} has type !, which coerces to any type!
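Here is a small sketch of why that rewrite always type-checks. The function names are hypothetical; what matters is that the same `loop {}` body is accepted for every return type, because the signature is all the compiler needs to be satisfied.

```rust
/// Whatever the original body did, `loop {}` is accepted for a `u16` return type.
pub fn parse_port(_input: &str) -> u16 {
    loop {}
}

/// The same body is accepted for a completely different return type.
pub fn read_config(_path: &str) -> Result<String, std::io::Error> {
    loop {}
}

/// It is also accepted for a function that is declared to never return.
pub fn wait_forever() -> ! {
    loop {}
}
```

Calling any of these would hang forever, which is fine for rustdoc: it only reads the signatures and never runs the code.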

As we saw above, though, this transformation broke rustdoc. Additionally, it was causing lots of other problems.

So I got rid of it! The PR was titled "Don't run everybody_loops". It is the single largest PR I've ever made to rustc, and hopefully the largest I will ever make. The difficulty was that the errors from libstd hadn't gone away - if anything, the problem had grown since 2017. The hack I came up with was to never run type checking at all, instead of running it and trying to rewrite the code into something valid! This is both less work and closer to the semantics rustdoc wants. In particular, it never produces the invalid states that were crashing rustdoc.

Aftermath: No good deed goes unpunished

About a month after the PR was merged, rustdoc got a bug report: the docs for async-std failed to build on the nightly channel. Their code looked something like the following:

mod windows {
    pub trait WinFoo {
        fn foo(&self) {}
    }
    impl WinFoo for () {}
}

#[cfg(any(windows, doc))]
use windows::*;

mod unix {
    pub trait UnixFoo {
        fn foo(&self) {}
    }
    impl UnixFoo for () {}
}

#[cfg(any(unix, doc))]
use unix::*;

async fn bar() {
    ().foo()
}

In particular, notice that under cfg(doc), both traits are in scope with the same method name, so it is ambiguous which one .foo() refers to. This is exactly the sort of problem meant to be solved by not running type checking. Unfortunately, since the call was inside an async fn, type checking was still being run; bar desugars to roughly the following:

fn bar() -> impl Future<Output = ()> {
    async {
        ().foo()
    }
}

Because the function returned impl Future, the compiler had to type-check the body to infer the function's concrete return type. That's exactly what rustdoc wanted to avoid!
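The same effect can be seen without async. In this sketch (using `impl Iterator` rather than `impl Future` to stay within the standard library, with hypothetical function names), the signature names only a trait, so the concrete type has to come from the body. Whether the result is `Send` is not written anywhere in the signature either; the compiler can only answer that by looking at the hidden type.

```rust
/// The signature promises only "some iterator of u32"; the concrete type
/// (a `Map` over a `Range`) is inferred from the body.
pub fn doubled() -> impl Iterator<Item = u32> {
    (1..4).map(|x| x * 2)
}

/// Accepts only values that are `Send`, and hands them back unchanged.
pub fn require_send<T: Send>(value: T) -> T {
    value
}
```

Passing `doubled()` to `require_send` compiles only because the compiler has worked out the hidden type from the body, which is the work rustdoc was trying to skip.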

The hacky 'fix' implemented was to not infer the type of the function at all - rustdoc doesn't care about the exact type, only the traits that it implements.
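For reference, here is a sketch of the ambiguity at the root of the bug report, with both imports unconditionally in scope (as they are under cfg(doc)). It is adapted from the example above: the methods return a string so the two implementations can be told apart, and `pick_windows` and `pick_unix` are hypothetical helpers. Writing `().foo()` in this file is rejected as ambiguous, while naming the trait explicitly compiles.

```rust
mod windows {
    pub trait WinFoo {
        fn foo(&self) -> &'static str {
            "windows"
        }
    }
    impl WinFoo for () {}
}

mod unix {
    pub trait UnixFoo {
        fn foo(&self) -> &'static str {
            "unix"
        }
    }
    impl UnixFoo for () {}
}

// Both traits are imported at once, mirroring what cfg(doc) does.
use unix::UnixFoo;
use windows::WinFoo;

/// Picks the method by naming the trait in the call path.
pub fn pick_windows() -> &'static str {
    WinFoo::foo(&())
}

/// Picks the method with fully qualified syntax.
pub fn pick_unix() -> &'static str {
    <() as UnixFoo>::foo(&())
}
```

This only shows the type error itself; the rustdoc fix described above avoids the question by not type-checking such bodies in the first place.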

Stabilizing intra-doc links

Now that cross-crate re-exports work, there isn't much standing in the way of stabilizing intra-doc links! There are a few cleanup PRs, but for the most part, the path to stabilization seems clear.

@GuillaumeGomez

It can auto-generate documentation from your source code with very little effort on your part: imagine javadoc but easier to use and with markdown syntax instead of raw HTML.

I think it's better to not mention javadoc, or any other tool.

Early in June, I got tired of not being able to intra-doc links.

Missing "use".

The error above came because of a pass called everybody_loops.

Might be interesting to explain what a compiler pass is. I assume most readers don't know about it.

even though the implementations would [normally be broken].

Missing a link?

@jyn514

jyn514 commented Aug 8, 2020

Updated!
