@Nemo157
Last active March 26, 2024 20:46
Publishing crates to IPFS

This was an experiment in seeing how feasible it would be to distribute crates on IPFS using the alternative registries feature combined with a local IPFS web gateway.

There was very little plan for this originally, and I wish I had kept more of the intermediate states as I went through multiple major design changes. My original goal was to publish my CLI utility bs58-cli and its dependency tree.

Given the lack of plan, and the somewhat tunnel-visioned solution that evolved, this is definitely not an optimal way of publishing crates to IPFS. It's very much just an exploration of one possibility, and there should be more explorations of other ways to do so. One idea that is likely much better suited to my original goal is to utilize cargo-vendor somehow: that would allow publishing a single source distribution that could be built from directly, rather than this multi-level thing with registries. Another idea, better suited to actual development, is to utilize IPNS to allow publishing multiple versions of crates into a single registry, either as per-crate (or per-collection-of-crates) registries, or as a more global registry similar to https://crates.io.

Example

Because Cargo only supports the "smart" git-http protocol, you have to tell it to use the git CLI instead by editing your ~/.cargo/config:

[net]
git-fetch-with-cli = true

Then you can install bs58-cli from IPFS distributed sources:

> cargo install bs58-cli --index http://127.0.0.1:8080/ipfs/QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y
    Updating `http://127.0.0.1:8080/ipfs/QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y` index
  Downloaded bs58-cli v0.1.0 (registry `http://127.0.0.1:8080/ipfs/QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y`)
[...]
   Installed package `bs58-cli v0.1.0 (registry `http://127.0.0.1:8080/ipfs/QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y`)` (executable `bs58`)

(I suggest pinning QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y and waiting for the data to be fetched before running cargo install).

(For pre-1.46 Cargo you will have to add the registry into ~/.cargo/config and use --registry instead of --index).
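
For older Cargo, that registry configuration might look something like the following (the registry name ipfs is arbitrary, and the hash is the one from the example above):

```toml
[registries.ipfs]
index = "http://127.0.0.1:8080/ipfs/QmTo7iPFpM4T961H6mqWdHWYyUUMNbfcQop4YwmPXnQx6Y"
```

followed by cargo install bs58-cli --registry ipfs.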

Current design

The current design takes in a single root crate, at a specific version, precomputes the set of dependencies required to build just this crate, then recursively (with caching):

  1. Removes dev-dependencies
  2. Removes references to dependencies not in the precomputed set
  3. Runs itself on remaining dependencies to generate their registry index
  4. Updates the current crate to depend on each dependency via its generated registry index
  5. Publishes the current crate
  6. Links published crate to indexes of each dependency
  7. Generates and publishes a registry index for this crate
  8. Link registry index to the published crate

This results in a single IPFS node that requires pinning to keep the entire tree alive, but internally the objects reference each other via direct web gateway URLs to the subobjects.
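
As a toy model of the recursion-with-caching in the steps above (the DEPS table and crate names are hypothetical, and "publishing" is just an echo):

```shell
#!/usr/bin/env bash
# Toy model of the recursive publish flow: DONE acts as the per-run cache, and
# each crate's dependencies are published before the crate itself.
declare -A DEPS=( [bs58-cli]="bs58 clap" [clap]="textwrap" [bs58]="" [textwrap]="" )
declare -A DONE=()
publish() {
  local crate=$1
  [[ -n ${DONE[$crate]:-} ]] && return  # already published: cache hit
  local dep
  for dep in ${DEPS[$crate]}; do
    publish "$dep"                      # recurse into dependencies first
  done
  DONE[$crate]=1
  echo "published $crate"
}
publish bs58-cli
```

Each crate is printed exactly once, dependencies before dependents; the real script additionally persists the cache across runs keyed by root crate and gateway.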

Potential bugs

When removing references to dependencies not in the precomputed set, the matching is done by name only. This can leave references to versions of a dependency that should not be included. Fixing this would probably require implementing some semver request matching to verify that each dependency request matches one of the allowed versions.
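
A sketch of what such a check might look like, assuming the common case where a bare requirement like "1.2.3" has caret semantics (>=1.2.3, <2.0.0); the helper name is hypothetical, and the special 0.x rules, pre-releases, and operators like "=", ">=", "~" are deliberately ignored:

```shell
# Does concrete version $2 satisfy requirement $1 under caret semantics?
caret_matches() {
  local req=$1 ver=$2
  # the major version must match exactly
  [ "${req%%.*}" = "${ver%%.*}" ] || return 1
  # and the concrete version must not sort below the requirement
  [ "$(printf '%s\n%s\n' "$req" "$ver" | sort -V | head -n1)" = "$req" ]
}
```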

Previous failures

The initial design I attempted to build was to just have a pair of published directories: all of the crates involved in the build, and an index for them.

This ran into a series of issues that eventually transformed the design into the one described here.

Mutual references

The first major issue was that having an index containing crates referencing each other means that the index and the Cargo.toml in the crate archives end up needing to reference each other.

The registry index contains two things: a config.json specifying a template URL for where to download the crate archives from, and a set of metadata files specifying crates available from this registry. Given that I was using the IPFS web gateway to generate download URLs compatible with Cargo I needed to create a single IPFS directory containing all the crates and put that into the download URL template like http://127.0.0.1/<hash>/{name}-{version}.crate (the name and version parameters will be filled in by Cargo when downloading the crate archive).
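
The metadata files live at paths derived from the crate name. That layout (the same one the script below generates) can be sketched as a small shell function; the function name is mine:

```shell
# Path of a crate's metadata file inside a crates.io-style registry index:
# 1-letter names under 1/, 2-letter under 2/, 3-letter under 3/<first char>/,
# everything else under <first two chars>/<next two chars>/.
index_path() {
  local name=$1
  case ${#name} in
    1) echo "1/$name" ;;
    2) echo "2/$name" ;;
    3) echo "3/${name:0:1}/$name" ;;
    *) echo "${name:0:2}/${name:2:2}/$name" ;;
  esac
}
```

For example, index_path bs58 yields bs/58/bs58.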

The crate archives must each contain a Cargo.toml describing the crate inside. This includes all dependencies for the crate, specified via a version and registry. The registry is specified via an (AFAIK) undocumented registry-index property containing the URL of the git repo that contains the index this dependency is in. If this is missing then the dependency is assumed to come from the default crates.io registry. Again, given that I am publishing the git repos via the IPFS web gateway, I need to put a URL like http://127.0.0.1/<hash> into this property.
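
Concretely, a rewritten dependency entry in a published Cargo.toml ends up looking something like this (the crate, version, and hash here are illustrative placeholders):

```toml
[dependencies]
bs58 = { version = "0.3.1", registry-index = "http://127.0.0.1:8080/ipfs/<hash>" }
```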

  1. I want to publish the index.
  2. To generate the hash for the download URL I need to first package all the crate archives and publish them.
  3. To generate the hash for the registry-index of the dependencies I need to first publish the index.
  4. Goto 1

Per-crate registry

My solution here was to publish a single crate per registry. This means I can walk the full dependency DAG publishing each crate's archive, then its index (containing the hash of the archive), then its dependents' archives (containing the hashes of their dependencies' indexes).
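
This DAG walk is just a reverse topological order. With a toy edge list (crate names illustrative) the standard tsort utility shows the idea:

```shell
# Each input line is "dependent dependency". tsort prints each dependent
# before its dependencies; reversing with tac yields a publish order in which
# every crate's dependencies are published before the crate itself.
printf '%s\n' \
  'bs58-cli bs58' \
  'bs58-cli clap' \
  'clap textwrap' \
| tsort | tac
```

The root crate always comes out last, since everything else is (transitively) one of its dependencies.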

Dev-dependency loops

The per-crate registry solution above depends on one thing: having a DAG for the entire set of crates to publish. Unfortunately, for any non-trivial project this is unlikely, because crates commonly mutually depend on each other as dev-dependencies. One of the first places you'll run into this is proc-macro2 + quote: quote is built on top of proc-macro2 and is invaluable when working with it; so invaluable, in fact, that proc-macro2 includes an example of using quote in its documentation, requiring it to pull in quote as a dev-dependency to test that the example is correct.

For Cargo this is OK: you can construct a build DAG containing proc-macro2(examples) -> quote -> proc-macro2 because the example and the library are different build targets. For publishing content-addressed source code this doesn't work: the proc-macro2 examples are part of the same source code as the library, so they need to be published as a single crate.

Nobody builds tests from published crates anyway

So we just strip all dev-dependencies; nobody cares about running tests from published code anyway (*cough* except those that do).

Optional-dependency loops

After getting rid of dev-dependencies we still don't have a DAG of crates. Unfortunately, it's possible for crates to have loops where they optionally depend on each other. Normally this is fine because no single project activates all of the optional deps that would form a loop, but I found one example where activating just a few features gave a loop detected by Cargo:

[dependencies]
clap = "2.33.1"
textwrap = { version = "0.11.0", features = ["hyphenation"] }
num-traits = { version = "0.2.11", features = ["libm"] }
libm = { version = "0.2.1", features = ["rand"] }
rand = { version = "0.6.5", features = ["packed_simd"] }
packed_simd = { version = "0.3.3", features = ["sleef-sys"] }

error: cyclic package dependency: package `atlatl v0.1.2` depends on itself.
Cycle:
package `atlatl v0.1.2`
... which is depended on by `hyphenation v0.7.1`
... which is depended on by `textwrap v0.11.0`
... which is depended on by `clap v2.33.1`
... which is depended on by `bindgen v0.46.0`
... which is depended on by `sleef-sys v0.1.2`
... which is depended on by `packed_simd v0.3.3`
... which is depended on by `rand v0.6.5`
... which is depended on by `libm v0.2.1`
... which is depended on by `num-traits v0.2.11`
... which is depended on by `atlatl v0.1.2`

(Though I think this may still be a false positive: the sleef-sys -> bindgen dependency is a build-dependency, so textwrap@host shouldn't have the hyphenation feature active. Playing around with -Zfeatures I wasn't able to get this to build successfully, though.)

Whether or not this should be a valid set of dependencies for a build, it still makes the references at the source level a cyclic graph that we can't publish.

Restrict dependencies to those required

The solution here is to take the resolved build graph from the top-level crate we're attempting to publish and use that to restrict the allowed dependencies. As we recurse down we strip all optional dependencies that weren't actually activated in the graph (and adjust feature sets by removing all activation of the optional dependencies features, and injecting a dummy feature in case another crate referenced the optional dependency).

#!/usr/bin/env zsh
# This requires a custom version of `rq` that you can install via
#
# cargo install record-query --git https://github.com/Nemo157/rq --branch tables-last
#
# Other than that it requires some utilities you should get from your package
# manager:
#
# - `jq`
# - `ipfs` (and a running daemon)
# - `tar`
# - `git`
set -euo pipefail
export seen=${seen:-$(mktemp /tmp/ipfs-registry.seen.XXXXXX)}
export crate=${1:?crate}
export version=${2:?version}
export IPFS_GATEWAY=${IPFS_GATEWAY:-http://127.0.0.1:8080/}
# Track the root crate we're publishing
if ! test -v root
then
export root=$crate-$version
export chain=$root
export cache_root=cache/${IPFS_GATEWAY//[^0-9a-zA-Z]/_}/$root
mkdir -p $cache_root
mkdir -p crates
echo "Starting publish for $crate-$version and dependencies to $IPFS_GATEWAY" >&2
else
export chain=${chain}:$crate-$version
fi
trap '
if test $? -ne 0
then
echo "Failed during $chain" >&2
fi
' EXIT
debug() {
if test ${DEBUG:-0} -eq 1
then
"$@" >&2
else
"$@" >&/dev/null
fi
}
cache=$cache_root/$crate-$version
# If we've already rewritten this crate, return the cached hash, caching must be
# per-root since the root determines allowed deps, and per-gateway
if test -e $cache.index
then
hash=$(cat $cache.index)
else
# If we've already visited this crate this run, and it was _not_ cached, we have
# a loop
if grep -Fxq $crate-$version $seen
then
echo "Loop detected at $chain" >&2
exit 1
fi
echo $crate-$version >> $seen
echo "Rewriting $crate-$version" >&2
# Check if we've already downloaded this crate to the cache
if ! test -e crates/$crate-$version.tar.gz
then
echo "Downloading $crate-$version" >&2
debug cargo download -o crates/$crate-$version.tar.gz $crate==$version
fi
# Extract sources so we can find all the dependencies
tmp=$(mktemp -d /tmp/ipfs-registry.$crate-$version.XXXXXX)
tar xzf crates/$crate-$version.tar.gz -C $tmp
# Remove dev-dependencies because they always involve a dependency loop
mv $tmp/$crate-$version/Cargo.toml $tmp/$crate-$version/Cargo.toml.orig2
cat $tmp/$crate-$version/Cargo.toml.orig2 \
| rq --input-toml --output-json \
| jq '
. as $root
| ([(.["dev-dependencies"] // {}) | keys | .[]]) as $devdeps
| $root
| .features = (
(.features // {})
| map_values([
.[]
| split("/")[0] as $dep
| . as $feat
| select(
([$devdeps[] | . == $dep] | any) == false
)
])
)
| del(.["dev-dependencies"])
| .target = (
(.target // {})
| map_values(
del(.["dev-dependencies"])
)
)
' \
| rq --input-json --output-toml \
> $tmp/$crate-$version/Cargo.toml
rm -f $tmp/$crate-$version/Cargo.lock
debug cargo update --manifest-path $tmp/$crate-$version/Cargo.toml
# If this is the root crate collect all allowed dependencies
if test $root = $crate-$version
then
export allowed="$(set -e; cargo metadata \
--all-features \
--format-version 1 \
--manifest-path $tmp/$crate-$version/Cargo.toml \
| jq -r '.packages[] | "[\(.name)]"'
)"
fi
# Remove non-allowed dependencies
mv $tmp/$crate-$version/Cargo.toml $tmp/$crate-$version/Cargo.toml.orig2
cat $tmp/$crate-$version/Cargo.toml.orig2 \
| rq --input-toml --output-json \
| jq '
def filter_allowed:
select(.value.package // .key | inside(env.allowed))
;
. as $root
| (.features | keys | map("[\(.)]") | join(" ")) as $features
| .features = (
(.features // {})
| map_values([ .[] | select(
(split("/")[0] | "[\(.)]" | inside($features))
or
(split("/")[0] | "[\(.)]" | inside(env.allowed))
) ])
)
| .features = (
(.features // {}) + (
(.dependencies // {})
| with_entries(
select(.value.package // .key | inside(env.allowed) | not)
| .value = []
)
)
)
| .dependencies = ((.dependencies // {}) | with_entries(filter_allowed))
| .["build-dependencies"] = (
(.["build-dependencies"] // {}) | with_entries(filter_allowed)
)
| .target = (
(.target // {})
| map_values(
.dependencies = (
(.dependencies // {}) | with_entries(filter_allowed)
)
| .["build-dependencies"] = (
(.["build-dependencies"] // {}) | with_entries(filter_allowed)
)
)
)
' \
| rq --input-json --output-toml \
> $tmp/$crate-$version/Cargo.toml
rm -f $tmp/$crate-$version/Cargo.lock
debug cargo update --manifest-path $tmp/$crate-$version/Cargo.toml
uniq=${crate//-/_}_${version//[^0-9]/_}
# Get the dependency tree and recursively rewrite them to get the hashes needed
# to update this crate
while IFS='|' read -r subcrate alias subversion
do
if ! test $crate-$version = $subcrate-$subversion; then
lhash="$(./publish $subcrate $subversion)"
export ${uniq}_hash_${alias//-/_}=$lhash
fi
done < <(set -e; cargo metadata \
--all-features \
--format-version 1 \
--manifest-path $tmp/$crate-$version/Cargo.toml \
| jq -r '
. as $root
| (.packages[] | select(.id == $root.resolve.root)) as $pkg
| .resolve.nodes[]
| select(.id == $root.resolve.root)
| .deps[]
| .pkg as $depid
| ($root.packages[] | select(.id == $depid)) as $subpkg
| ($pkg.dependencies[] | select(.name == $subpkg.name)) as $dep
| {
name: $subpkg.name,
rename: ($dep.rename // $dep.name),
version: $subpkg.version
}
| "\(.name)|\(.rename)|\(.version)"
')
# Add registry entries to all dependencies
mv $tmp/$crate-$version/Cargo.toml $tmp/$crate-$version/Cargo.toml.orig3
cat $tmp/$crate-$version/Cargo.toml.orig3 \
| rq --input-toml --output-json \
| env uniq=$uniq jq '
def add_registry:
($ENV["\(env.uniq)_hash_\(.key | gsub("-"; "_"))"]) as $hash
| .value["registry-index"] = "\(env.IPFS_GATEWAY)/ipfs/\($hash)"
;
.dependencies = ((.dependencies // {}) | with_entries(add_registry))
| .["build-dependencies"] = (
(.["build-dependencies"] // {}) | with_entries(add_registry)
)
| .target = (
(.target // {})
| map_values(
.dependencies = (
(.dependencies // {}) | with_entries(add_registry)
)
| .["build-dependencies"] = (
(.["build-dependencies"] // {}) | with_entries(add_registry)
)
)
)
' \
| rq --input-json --output-toml \
> $tmp/$crate-$version/Cargo.toml
rm -f $tmp/$crate-$version/Cargo.lock
debug cargo update --manifest-path $tmp/$crate-$version/Cargo.toml
# build crate then repackage it for reproducibility
RUSTFLAGS='--cap-lints allow' debug cargo package --no-verify --manifest-path $tmp/$crate-$version/Cargo.toml
mkdir -p $tmp/package
tar xzf $tmp/$crate-$version/target/package/$crate-$version.crate -C $tmp/package
tar czf $tmp/package/$crate-$version.crate --mtime="$(stat --format=%y $tmp/package/$crate-$version/Cargo.toml.orig)" --clamp-mtime --sort=name -C $tmp/package $crate-$version
# Publish crate file to ipfs
export crate_hash=$(sha256sum $tmp/package/$crate-$version.crate | cut -d' ' -f 1)
dl_hash=$(ipfs add --pin=false -wQ $tmp/package/$crate-$version.crate)
# Inject a link from crate back to the dependencies indexes
# Dunno how to programmatically make an empty dir dag node, so here's the hash
# of one
dir=QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn
while IFS='=' read -r subcrate subhash
do
dir=$(ipfs object patch add-link $dir $subcrate $subhash)
done < <(
cat $tmp/$crate-$version/Cargo.toml \
| rq --input-toml --output-json \
| env uniq=$uniq jq -r '
(., ((.target // {}) | .[]))
| ((.dependencies // {}), (.["build-dependencies"] // {}))
| to_entries | .[]
| .key
| "\(.)=\($ENV["\(env.uniq)_hash_\(. | gsub("-"; "_"))"])"
'
)
dl_hash=$(ipfs object patch add-link $dl_hash .deps $dir)
# Generate index
git init $tmp/index >&/dev/null
cat <<END > $tmp/index/config.json
{
"dl": "$IPFS_GATEWAY/ipfs/$dl_hash/{crate}-{version}.crate"
}
END
gen_index() {
cargo metadata \
--all-features \
--format-version 1 \
--manifest-path $tmp/$crate-$version/Cargo.toml \
| jq --compact-output '
. as $root
| .packages[]
| select(.id == $root.resolve.root)
| {
name,
vers: .version,
deps: [
.dependencies[]
| select((.package // .name) | inside(env.allowed))
| {
name: (.rename // .name),
package: (if .rename then .name else null end),
req,
features,
optional,
default_features: .uses_default_features,
target,
kind: (.kind // "normal"),
registry,
}
],
cksum: env.crate_hash,
features,
yanked: false,
links: null
}
'
}
if test ${#crate} -eq 1; then
mkdir -p $tmp/index/1
gen_index > $tmp/index/1/$crate
elif test ${#crate} -eq 2; then
mkdir -p $tmp/index/2
gen_index > $tmp/index/2/$crate
elif test ${#crate} -eq 3; then
mkdir -p $tmp/index/3/${crate:0:1}
gen_index > $tmp/index/3/${crate:0:1}/$crate
else
mkdir -p $tmp/index/${crate:0:2}/${crate:2:2}
gen_index > $tmp/index/${crate:0:2}/${crate:2:2}/$crate
fi
git -C $tmp/index add . >/dev/null
GIT_COMMITTER_DATE="2000-01-01 00:00:00" git -C $tmp/index commit -am "Add $crate-$version" --date="2000-01-01 00:00:00" >&/dev/null
git clone --bare $tmp/index $tmp/index-bare >&/dev/null
git -C $tmp/index-bare remote remove origin >&/dev/null
git -C $tmp/index-bare update-server-info >&/dev/null
hash=$(ipfs add --pin=false -rQ $tmp/index-bare)
rm -rf $tmp
# Inject a link from index back to the crate node
hash=$(ipfs object patch add-link $hash .crates $dl_hash)
printf " %-40s dl => %s\n" "$crate-$version" "$dl_hash" >&2
printf " %-40s index => %s\n" "$crate-$version" "$hash" >&2
echo $dl_hash > $cache.dl
echo $hash > $cache.index
fi
if test $root = $crate-$version
then
ipfs pin add $hash
echo "
$crate-$version published to $hash
To use ensure you have this in your .cargo/config:
[net]
git-fetch-with-cli = true
Then run
cargo install $crate --index $IPFS_GATEWAY/ipfs/$hash
"
else
echo $hash
fi