This was an experiment in seeing how feasible it would be to distribute crates on IPFS using the alternative registries feature combined with a local IPFS web gateway.
There was very little plan for this originally, and I wish I had kept more of the intermediate states as I went through multiple major design changes. My original goal was to publish my CLI utility `bs58-cli` and its dependency tree.
Given the lack of plan, and the somewhat tunnel-visioned solution that evolved, this is definitely not an optimal way of publishing crates to IPFS. It's very much just an exploration of one possibility, and there should be more explorations of other ways to do so. One idea that is likely much better suited to my original goal is to utilize `cargo-vendor` somehow; that would allow publishing a single source distribution that can be built from directly, rather than this multi-level thing with registries. Another idea, better suited to actual development, is to utilize IPNS to allow publishing multiple versions of crates into a single registry, either as per-crate (or per-collection-of-crates) registries, or as a more global registry similar to https://crates.io.
Because Cargo's built-in git support only speaks the "smart" git-http protocol, while the IPFS web gateway can only serve the static files used by the "dumb" protocol, you have to tell Cargo to use the `git` CLI tool instead by editing your `~/.cargo/config`:

```toml
[net]
git-fetch-with-cli = true
```
Then you can install `bs58-cli` from the IPFS-distributed sources:

```console
> cargo install bs58-cli --index http://127.0.0.1:8080/ipfs/QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y
    Updating `http://127.0.0.1:8080/ipfs/QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y` index
  Downloaded bs58-cli v0.1.0 (registry `http://127.0.0.1:8080/ipfs/QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y`)
[...]
   Installed package `bs58-cli v0.1.0 (registry `http://127.0.0.1:8080/ipfs/QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y`)` (executable `bs58`)
```
(I suggest pinning `QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y` and waiting for the data to be fetched before running `cargo install`.)
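With the IPFS CLI that pinning step should look something like this:

```console
> ipfs pin add QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y
pinned QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y recursively
```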
(For pre-1.46 Cargo you will have to add the registry into `~/.cargo/config` and use `--registry` instead of `--index`.)
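That configuration would look roughly like this (the registry name `ipfs` is an arbitrary choice here):

```toml
[registries.ipfs]
index = "http://127.0.0.1:8080/ipfs/QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y"
```

followed by `cargo install bs58-cli --registry ipfs`.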
The current design takes in a single root crate, at a specific version, precomputes the set of dependencies required to build just this crate, then recursively (with caching):
- Removes dev-dependencies
- Removes references to dependencies not in the precomputed set
- Runs itself on the remaining dependencies to generate their registry indexes
- Updates the current crate to depend on each dependency via its generated registry index
- Publishes the current crate
- Links the published crate to the indexes of each dependency
- Generates and publishes a registry index for this crate
- Links the registry index to the published crate
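A rough sketch of that recursion, where every type and helper is a hypothetical stand-in rather than the tool's actual API:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins; none of these are real types from the tool.
type Cid = String;

#[derive(Clone, Hash, PartialEq, Eq)]
struct CrateId { name: String, version: String }

#[derive(Clone)]
struct Manifest { deps: Vec<CrateId>, dev_deps: Vec<CrateId> }

fn publish(
    id: &CrateId,
    manifests: &HashMap<CrateId, Manifest>,
    allowed: &HashSet<CrateId>,        // the precomputed dependency set
    cache: &mut HashMap<CrateId, Cid>, // memoized crate -> registry index hash
) -> Cid {
    if let Some(cid) = cache.get(id) {
        return cid.clone();
    }

    let mut manifest = manifests[id].clone();
    manifest.dev_deps.clear();                     // strip dev-dependencies
    manifest.deps.retain(|d| allowed.contains(d)); // drop deps outside the set

    // Recurse so every remaining dependency has a published registry index,
    // then point this crate's dependency entries at those indexes.
    let mut dep_indexes = Vec::new();
    for dep in &manifest.deps {
        let index_cid = publish(dep, manifests, allowed, cache);
        dep_indexes.push(format!("http://127.0.0.1:8080/ipfs/{index_cid}"));
    }

    // Publish the crate archive, then a registry index referencing it.
    let crate_cid = ipfs_add(package_archive(&manifest, &dep_indexes));
    let index_cid = ipfs_add(generate_index(id, &crate_cid));

    cache.insert(id.clone(), index_cid.clone());
    index_cid
}

// Placeholders for packaging the archive, generating the index metadata,
// and adding bytes to IPFS (returning their content hash).
fn package_archive(_m: &Manifest, _dep_indexes: &[String]) -> Vec<u8> { unimplemented!() }
fn generate_index(_id: &CrateId, _crate_cid: &Cid) -> Vec<u8> { unimplemented!() }
fn ipfs_add(_bytes: Vec<u8>) -> Cid { unimplemented!() }
```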
This results in a single root IPFS object that requires pinning to keep the entire tree alive, but internally the objects reference each other via direct web gateway URLs to the sub-objects.
When removing references to dependencies not in the precomputed set, the matching is done by name only. This can leave references to versions of a dependency that should not be included. Fixing this would probably require implementing some semver request matching to verify that each dependency request matches one of the allowed versions.
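That check would presumably look something like this, using the `semver` crate (the function here is an illustration, not part of the tool):

```rust
use semver::{Version, VersionReq};

/// Does a dependency request (e.g. "^0.2") match any of the versions
/// that actually appear in the precomputed dependency set?
fn request_matches_allowed(request: &str, allowed: &[Version]) -> bool {
    VersionReq::parse(request)
        .map(|req| allowed.iter().any(|v| req.matches(v)))
        .unwrap_or(false)
}

fn main() {
    let allowed = [Version::parse("0.2.11").unwrap()];
    assert!(request_matches_allowed("^0.2", &allowed));
    assert!(!request_matches_allowed("^0.1", &allowed));
}
```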
The initial design I attempted to build was to just have a pair of published directories: one containing all of the crates involved in the build, and one containing an index for them. This ran into a series of issues that eventually transformed the design into the one described here.
The first major issue was that having an index containing crates that reference each other means the index and the `Cargo.toml` files in the crate archives end up needing to reference each other.
The registry index contains two things: a `config.json` specifying a template URL for where to download the crate archives from, and a set of metadata files specifying the crates available from this registry. Given that I was using the IPFS web gateway to generate download URLs compatible with Cargo, I needed to create a single IPFS directory containing all the crates and put that into the download URL template, like `http://127.0.0.1:8080/ipfs/<hash>/{crate}-{version}.crate` (the `{crate}` and `{version}` markers are filled in by Cargo with the name and version of the crate archive it is downloading).
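Concretely, the `config.json` for such an index would contain something like this (with `<hash>` standing in for the real directory hash):

```json
{
  "dl": "http://127.0.0.1:8080/ipfs/<hash>/{crate}-{version}.crate"
}
```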
The crate archives must each contain a `Cargo.toml` describing the crate within. This includes all dependencies of the crate, specified via a version and a registry. The registry is specified via an (AFAIK) undocumented `registry-index` property containing the URL of the git repo holding the index this dependency lives in; if this is missing, the dependency is assumed to come from the default crates.io registry. Again, given that I am publishing the git repos via the IPFS web gateway, I need to put a URL like `http://127.0.0.1:8080/ipfs/<hash>` into this property.
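In the `Cargo.toml` inside a published crate archive, a dependency entry then looks roughly like this (hypothetical dependency name, placeholder hash):

```toml
[dependencies.some-dep]
version = "1.0.0"
registry-index = "http://127.0.0.1:8080/ipfs/<hash>"
```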
Put together, that gives a cycle:

1. I want to publish the index.
2. To generate the hash for the download URL I need to first package all the crate archives and publish them.
3. To generate the hash for the `registry-index` of the dependencies I need to first publish the index.
4. Goto 1.
My solution here was to publish a single crate per registry. This means I can walk a full dependency DAG, publishing each crate's archive, then its index containing the hash of that archive, then its dependents' archives containing the hashes of their dependencies' indexes.
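For a leaf dependency `foo` and a crate `bar` that depends on it, the publish order would look something like this (commands illustrative, names and hashes invented):

```console
> ipfs add foo-1.0.0.crate     # gives <foo-crate-hash>
> ipfs add -r foo-index/       # its config.json embeds <foo-crate-hash>,
                               # gives <foo-index-hash>
> ipfs add bar-1.0.0.crate     # its Cargo.toml sets registry-index to
                               # http://127.0.0.1:8080/ipfs/<foo-index-hash>
> ipfs add -r bar-index/       # and so on up the DAG
```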
The per-crate registry solution above depends on one thing: having a DAG of the entire set of crates to publish. Unfortunately, for any non-trivial project this is unlikely to hold, because you will commonly have crates that mutually depend on each other via dev-dependencies. One of the first places you'll run into this is `proc-macro2` + `quote`: `quote` is built on top of `proc-macro2` and is invaluable when working with `proc-macro2`; so invaluable, in fact, that `proc-macro2` includes an example of using `quote` in its documentation, requiring it to pull `quote` in as a dev-dependency to test that the example is correct.
For Cargo this is ok: you can construct a build DAG containing `proc-macro2(examples) -> quote -> proc-macro2` because the examples and the library are different build targets. For publishing content-addressed source code this doesn't work: the `proc-macro2` examples are part of the same source code as the library, so they need to be a single crate.
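Simplified, the two manifests contain something like this, and at the source level that's a cycle:

```toml
# quote's Cargo.toml (simplified)
[dependencies]
proc-macro2 = "1.0"
```

```toml
# proc-macro2's Cargo.toml (simplified)
[dev-dependencies]
quote = "1.0"
```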
So we just strip all dev-dependencies, nobody cares about running tests from published code anyway (*cough* except those that do).
After getting rid of dev-dependencies we still don't have a DAG of crates. Unfortunately, it's possible for crates to have loops where they optionally depend on each other. Normally this is fine because no single project activates the full set of optional dependencies that closes the loop, but I found one example where activating just a few features gave a loop detected by Cargo:
```toml
[dependencies]
clap = "2.33.1"
textwrap = { version = "0.11.0", features = ["hyphenation"] }
num-traits = { version = "0.2.11", features = ["libm"] }
libm = { version = "0.2.1", features = ["rand"] }
rand = { version = "0.6.5", features = ["packed_simd"] }
packed_simd = { version = "0.3.3", features = ["sleef-sys"] }
```
```
error: cyclic package dependency: package `atlatl v0.1.2` depends on itself.
Cycle:
package `atlatl v0.1.2`
... which is depended on by `hyphenation v0.7.1`
... which is depended on by `textwrap v0.11.0`
... which is depended on by `clap v2.33.1`
... which is depended on by `bindgen v0.46.0`
... which is depended on by `sleef-sys v0.1.2`
... which is depended on by `packed_simd v0.3.3`
... which is depended on by `rand v0.6.5`
... which is depended on by `libm v0.2.1`
... which is depended on by `num-traits v0.2.11`
... which is depended on by `atlatl v0.1.2`
```
(Though I think this may still be a false positive: the `sleef-sys -> bindgen` dependency edge is a build-dependency, so `textwrap@host` shouldn't have the `hyphenation` feature active. Playing around with `-Zfeatures` I wasn't able to get this to build successfully though.)
Whether or not this should be a valid set of dependencies for a build, it still makes the references at the source level a cyclic graph that we can't publish.
The solution here is to take the resolved build graph from the top-level crate we're attempting to publish and use it to restrict the allowed dependencies. As we recurse down we strip all optional dependencies that weren't actually activated in that graph, adjusting feature sets by removing all activations of the stripped dependency's features and injecting a dummy feature in its place, in case another crate references the optional dependency by name.
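As a sketch of that manifest rewriting, for a hypothetical crate whose optional `serde` dependency was not activated anywhere in the resolved graph:

```toml
# Before: as authored
[dependencies]
serde = { version = "1.0", optional = true }

[features]
serde-support = ["serde", "serde/derive"]
```

becomes

```toml
# After: the unactivated optional dependency is stripped
[features]
serde = []         # dummy feature, in case a dependent enables "serde"
serde-support = [] # activations of the stripped dependency removed
```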