layout | title | date | audience | excerpt |
---|---|---|---|---|
post | the-intra-doc-links-saga | 2020-08-08 00:01:14 -0400 | developers |
N.B. This post assumes some familiarity with the Rust programming language.
One of the cool features of Rust is a tool called `rustdoc`. It can
auto-generate documentation from your source code with very little effort on
your part. This is the tool behind https://docs.rs.
Intra-doc links are a feature of `rustdoc` that allows you to link to
'items' - functions, types, and more - by their name, instead of a
hard-coded URL. This lets you have accurate links even if your types are
re-exported in a different module or crate. Here is a simple example:
/// Link to [`f()`]
pub struct S;
pub fn f() {}
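The same two items can be linked in a few other ways. The sketch below reuses `S` and `f` from the example above; the return value of `f` is an assumption added only so there is something to call. The link forms shown (custom link text, `struct@`/`fn@` disambiguators, and `crate::` paths) are the ones I believe rustdoc accepts, so treat this as an illustration rather than a complete reference.

```rust
/// A unit struct whose docs link to other items by name.
///
/// - Plain path: [`f()`] (the trailing `()` marks it as a function)
/// - Custom link text: [the helper function](f)
/// - Disambiguated by kind: [`struct@S`] and [`fn@f`]
/// - Path from the crate root: [`crate::f`]
pub struct S;

/// Returns a fixed value (an assumption, the original `f` returns nothing).
///
/// See also [`S`].
pub fn f() -> u32 {
    42
}
```

Running `cargo doc` on a crate containing these items should turn each bracketed name into a link to the right page, with no URL written by hand.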
Intra-doc links have been around for a while, all the way back to 2017! Unfortunately, they've been unstable that whole time. The main blocker for a long time was cross-crate re-exports, things like the following:
// inner crate
#![crate_name = "inner"]
/// Link to [`f()`]
pub struct S;
pub fn f() {}
// outer crate
pub use inner::S;
These links were the original motivation for intra-doc links, so if we couldn't get them working, there wasn't much point in stabilizing! They also had the downside that they could silently break - the documentation would work when you built it, but any user of your API could re-export your types and cause the links to be broken.
Early in June, I got tired of not being able to use intra-doc links. I
started investigating the issue to see if there was a fix. It was marked as
`E-hard`, so I wasn't expecting miracles, but I thought I might at least
make a start on it.
It turns out there was a simple problem with the implementation - it assumed all items were in the current crate! Clearly, that's not always the case. The fix turned out to be easy enough that I could implement it as my first contribution to rustdoc.
However, it had one small problem: on certain carefully crafted inputs, it would crash:
#![feature(decl_macro)]
fn main() {
|| {
macro m() {}
};
}
thread 'rustc' panicked at 'called `Option::unwrap()` on a `None` value', /home/joshua/src/rust/src/librustc_hir/definitions.rs:358:9
(If you're not interested in the internals of the Rust compiler, feel free to skip this section.)
The error above was caused by a pass called `everybody_loops`.
A compiler 'pass' is a step that analyzes or transforms the source code, for
example finding items without documentation.
The `everybody_loops` pass turns the above code into:
fn main() {
{
macro m { () => { } }
}
loop { }
}
As part of my changes for resolving cross-crate items, I needed to know the
first parent module, so I could tell what items were in scope. Note, however,
that after `everybody_loops` the closure has disappeared! The crash happened
because `rustdoc` was trying to access a closure that `rustc` didn't think
existed (in compiler jargon, it was turning the `DefId` for the closure,
which works across crates, into a `HirId`, which is specific to the current
crate but contains a lot more info).
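To make the `DefId` to `HirId` failure concrete, here is a toy model. None of these types are rustc's real definitions: `CrateNum`, `Definitions`, `as_local_hir_id`, and `sample_definitions` are hypothetical stand-ins that only mimic the shape of the problem, namely a lookup that can come back empty.

```rust
use std::collections::HashMap;

// Toy stand-ins for illustration only, NOT rustc's actual types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct CrateNum(u32);

const LOCAL_CRATE: CrateNum = CrateNum(0);

/// Cross-crate identifier: which crate, and which item inside it.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DefId {
    krate: CrateNum,
    index: u32,
}

/// Identifier that only makes sense inside the crate being compiled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct HirId(u32);

/// Hypothetical table from local item indices to their `HirId`s.
struct Definitions {
    def_index_to_hir_id: HashMap<u32, HirId>,
}

impl Definitions {
    /// Returns `None` for items from other crates and for local items
    /// that no longer have an entry (like the rewritten closure).
    fn as_local_hir_id(&self, def_id: DefId) -> Option<HirId> {
        if def_id.krate != LOCAL_CRATE {
            return None;
        }
        self.def_index_to_hir_id.get(&def_id.index).copied()
    }
}

/// Builds a table where index 0 (`main`) exists but index 1 (the closure)
/// was dropped by an earlier rewrite.
fn sample_definitions() -> Definitions {
    let mut map = HashMap::new();
    map.insert(0, HirId(10));
    Definitions { def_index_to_hir_id: map }
}
```

Calling `.unwrap()` on the result for index 1 would panic with the same kind of message as the crash above; handling the `None` case explicitly is what avoids it.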
This turned out to be an enormous rabbit hole. `everybody_loops` was
introduced all the way back in 2017 to solve another long-standing issue:
`rustdoc` doesn't know how to deal with conditional compilation.
What it lets rustdoc (and by extension, the standard library) do is ignore type and name errors
in function bodies. This allows documenting both Linux and Windows APIs on the same host,
even though the implementations would normally be broken. As seen above, the way it works
is by turning every function body into `loop {}` - this is always valid, because
`loop {}` has type `!`, which coerces to any type!
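A small sketch of why the rewrite always type-checks. The function names here are hypothetical; the point is only that a body of `loop {}` is accepted for any return type, and that `!` also coerces inside a larger expression.

```rust
// Hypothetical functions, named for illustration only.

/// Imagine the real body called a Windows-only API that does not exist on
/// this host. After the rewrite, only the signature has to make sense.
#[allow(dead_code)]
fn windows_only_handle() -> Vec<String> {
    loop {}
}

/// The same body is accepted for a completely different return type.
#[allow(dead_code)]
fn unix_only_fd() -> Result<i32, String> {
    loop {}
}

/// `!` coerces in expression position too: the `else` branch never
/// produces a value, so the whole `if` has the type of the first branch.
fn pick(use_value: bool) -> u32 {
    if use_value { 7 } else { loop {} }
}
```

Only `pick(true)` is safe to call here; the other functions never return, which is fine for rustdoc because it only reads signatures and never runs the code.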
As we saw above, though, this transformation broke rustdoc. Additionally, it was causing lots of other problems.
So I got rid of it! This was the PR "Don't run everybody_loops". It is the single
largest PR I've ever made to rustc, and hopefully the largest I will ever
make. The issue was that the errors from libstd hadn't gone away - if anything,
libstd had been expanded since 2017. The hack I came up with was, instead of
running type checking and trying to rewrite the code into something valid,
to never run type checking at all! This is both less work and closer
to the semantics rustdoc wants. In particular, it never causes the invalid states
that were crashing `rustdoc`.
About a month after the PR was merged, rustdoc got a bug report: the docs for
`async-std` failed to build on the nightly channel. Their code looked something like the following:
mod windows {
pub trait WinFoo {
fn foo(&self) {}
}
impl WinFoo for () {}
}
#[cfg(any(windows, doc))]
use windows::*;
mod unix {
pub trait UnixFoo {
fn foo(&self) {}
}
impl UnixFoo for () {}
}
#[cfg(any(unix, doc))]
use unix::*;
async fn bar() {
().foo()
}
In particular, notice that under `cfg(doc)`, both traits would be in scope
with the same method, so it would be ambiguous which to use for `.foo()`.
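The ambiguity can be reproduced without any `cfg` at all by importing both modules unconditionally. In the sketch below the methods return a string, which is my own change so the two implementations can be told apart; `call_windows` and `call_unix` are hypothetical helpers. Writing `().foo()` with both traits in scope is rejected by the compiler (error E0034, multiple applicable items in scope), while naming the trait explicitly compiles.

```rust
mod windows {
    pub trait WinFoo {
        // Return value added for illustration; the original returns nothing.
        fn foo(&self) -> &'static str {
            "windows"
        }
    }
    impl WinFoo for () {}
}

mod unix {
    pub trait UnixFoo {
        fn foo(&self) -> &'static str {
            "unix"
        }
    }
    impl UnixFoo for () {}
}

// Both globs are active at once, as they would be under `cfg(doc)`.
use windows::*;
use unix::*;

// `().foo()` would not compile here: both traits provide `foo` for `()`.
// Naming the trait picks one implementation explicitly.
fn call_windows() -> &'static str {
    WinFoo::foo(&())
}

fn call_unix() -> &'static str {
    UnixFoo::foo(&())
}
```

Real code only ever enables one of the two imports, so the unqualified call is fine there; it is only the documentation build that sees both.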
This is exactly the sort of problem meant to be solved by not running type-checking.
Unfortunately, since it was used in an `async fn`, type checking was still being run;
`bar` desugars to a closure of the following form:
fn bar() -> impl Future<Output = ()> {
async {
().foo()
}
}
Because the function returned `impl Future`, that required type-checking the
body to infer the return type of the function. That's exactly what `rustdoc`
wanted not to do!
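A sketch of why the body matters for `impl Future`. It uses `std::future::ready` instead of an `async` block to stay short, which is a simplification on my part, and `plain`, `boxed`, `concrete_type_name`, and `hidden_types_differ` are hypothetical names. The two functions have identical signatures, yet the concrete type hidden behind `impl Future` differs, and the only place the compiler can learn it is the body.

```rust
use std::any::type_name;
use std::future::{ready, Future};

/// The hidden return type is whatever `ready(1)` evaluates to.
fn plain() -> impl Future<Output = u32> {
    ready(1)
}

/// Same signature, but the hidden type is a pinned box around that future.
fn boxed() -> impl Future<Output = u32> {
    Box::pin(ready(1))
}

/// Reports the concrete type the compiler inferred for a value.
fn concrete_type_name<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

/// True when the two `impl Future` functions hide different concrete types.
fn hidden_types_differ() -> bool {
    concrete_type_name(&plain()) != concrete_type_name(&boxed())
}
```

The exact strings returned by `type_name` are not guaranteed, so the sketch only compares them with each other rather than against a fixed value.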
The hacky 'fix' implemented was to not infer the type of the function at all - rustdoc doesn't care about the exact type, only the traits that it implements.
Now that cross-crate re-exports work, there isn't much standing in the way of stabilizing intra-doc links! There are a few cleanup PRs, but for the most part, the path to stabilization seems clear.