@z0w0
Last active February 9, 2017 10:27
Haul, a purely functional package manager for Rust

Haul

Features

  • Purely functional
    • Different versions of packages can coexist
    • Each package must have a UUID, so packages with conflicting names can coexist
  • Build logic
    • Each package's metadata and build logic is written in a central Rust source file, package.rs
    • Dependencies are declared in the package file and will be fetched and installed before the package itself is built
    • Packages are built in a separate directory before installing into the package store to ensure that broken packages are not installed
    • Configuration flags can be passed to the package file via the CLI allowing the developer to configure for special opt-in cases
    • Multiple crates (libraries and binaries) to build can be defined in the package file
    • Package files can define flags and configuration options to be passed to crates being compiled
  • Supports symlinking binaries from one version of a package at a time into a directory, allowing you to add that directory to your path

Introduction

Haul is a concept for a revamp of Cargo (the final product would still be named Cargo; Haul is a "codename" of sorts). Whether or not the Mozilla folks want it is another story.

Haul is a purely functional package manager and build system for the Rust programming language, inspired by Nix and Leiningen. Like Leiningen, each package (a package consisting of one or more libraries and binaries, a.k.a. crates) has its build logic written in the host language itself, Rust. The simplest Haul package file (named package.rs in the root directory) looks like this:

#[pkg(uuid = "ad8e8d02-a537-418b-b1f9-6b3d8380e726",
      name = "haul",
      vers = "0.5.2")];

#[pkg_crate("src/haul.rc")];
#[pkg_dep("git://github.com/rapter-jesus/semver", target("0.1.0"))];

This simple configuration figures out how to build and install the package (and its dependencies) from the declarative attributes. If you want custom build logic (such as probing the system for configuration), you can add a build listener:

#[pkg(uuid = "ad8e8d02-a537-418b-b1f9-6b3d8380e726",
      name = "haul",
      vers = "0.5.2")];

#[pkg_dep("git://github.com/rapter-jesus/semver", target("0.1.0"))];

#[pkg_build]
fn build() {
    // Every package.rs automatically has `extern mod rustpkg;` injected
    // at the top of the file, much like how core is injected
    let platform = if os::is_toaster() { ~"toaster" } else { ~"robot" };
    let crate = rustpkg::Crate(~"src/haul.rc").cfg(~"platform=" + platform);

    rustpkg::build(~[crate]);
}

The frontend of the build API follows an FP style (to fit in with Haul's label) and uses core::workcache to only recompile code when it has changed. Generally, using custom build logic is frowned upon unless you really, really need it. One use case is running configuration scripts and Makefiles for native library dependencies, for which the build API provides handy wrappers (around some build systems, mostly automake) that also use workcache to ensure they are only run when they need to be.

#[pkg(uuid = "ea9ae194-eb20-4027-ab77-7835962094b6",
      name = "cairo",
      vers = "1.3.3")];

#[pkg_build]
fn build() {
    use rustpkg::util;

    let cairo_dir = os::getcwd().push_many(~[~"src", ~"cairo"]);
    let crate = rustpkg::Crate(~"src/cairo.rc");

    util::configure(cairo_dir, ~[]); // run configure <args> in src/cairo (only if configure has changed or hasn't been run yet)
    util::make(cairo_dir, ~[]); // run make -C src/cairo <args> (will always run, relies on the makefile to cache itself)
    rustpkg::build(~[crate]);
}

Haul is described as a purely functional package manager. It installs packages (a collection of binaries and libraries) to a unique directory based on the package UUID, name and version (<haul-dir>/store/<name>-<hash>-<version>), allowing packages to coexist. This is very analogous to the way Rust's libraries work by default. When a Rust library is built, it has its name, version and a cryptographic hash tagged into its output filename. This allows multiple versions of the same library to coexist and be linked in.

Haul allows one package to consist of multiple libraries, and those libraries can be named anything (so they could conflict with other packages), which means Rust's default system doesn't work out in some cases. So Haul allocates a unique directory for each package, where its libraries and binaries are installed. When you specify a dependency for a package to be compiled, Haul automatically adds the link flag to search for libraries in that package's library directory (<haul-dir>/store/<name>-<hash>-<version>/lib). Of course, binaries cannot be handled as elegantly.
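To make that concrete, here is a rough sketch, in present-day Rust, of how the store path and the resulting library search flag could be derived. It is an illustration only: the gist specifies the <haul-dir>/store/<name>-<hash>-<version> layout, but the hash inputs and algorithm and the helper names below are assumptions.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

struct PkgId {
    uuid: String,
    name: String,
    vers: String,
}

// <haul-dir>/store/<name>-<hash>-<version>, hashing the identifying metadata
// so same-named packages from different sources land in different directories.
fn store_dir(haul_dir: &Path, pkg: &PkgId) -> PathBuf {
    let mut h = DefaultHasher::new();
    pkg.uuid.hash(&mut h);
    pkg.name.hash(&mut h);
    pkg.vers.hash(&mut h);
    haul_dir
        .join("store")
        .join(format!("{}-{:016x}-{}", pkg.name, h.finish(), pkg.vers))
}

// When compiling a package that depends on `dep`, add dep's library
// directory to the compiler's library search path.
fn link_flags(haul_dir: &Path, dep: &PkgId) -> Vec<String> {
    let lib_dir = store_dir(haul_dir, dep).join("lib");
    vec!["-L".to_string(), lib_dir.display().to_string()]
}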

When you install a package with binaries, they will be installed to <haul-dir>/store/<name>-<hash>-<version>/bin. This doesn't allow you to easily run the binaries unless you add every binary store directory that you want to use to your path, which is simply impractical. Instead of resorting to this nastiness, Haul provides "using" functionality, which symlinks a package's binaries into <haul-dir>/bin, which can then be added to the path. This is of course not purely functional, and only one version of a package can be used at a time, but it is the price to pay for usability.
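A minimal sketch of what the "using" step could look like (present-day Rust, Unix-only; the function name and path handling are assumptions, not Haul's actual API):

use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

fn use_package(store_bin: &Path, haul_bin: &Path) -> io::Result<()> {
    fs::create_dir_all(haul_bin)?;
    for entry in fs::read_dir(store_bin)? {
        let entry = entry?;
        let dest = haul_bin.join(entry.file_name());
        // Only one version can be "used" at a time, so replace any existing
        // symlink for a binary with the same name.
        if fs::symlink_metadata(&dest).is_ok() {
            fs::remove_file(&dest)?;
        }
        symlink(entry.path(), &dest)?;
    }
    Ok(())
}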

There is no central repository. All packages are installed from URLs; HTTP, FTP and Git are supported, in a fashion similar to Go.
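Purely as an illustration (the spec doesn't define this; the enum and prefix checks below are assumptions), dispatching an install URL on its scheme might look like:

#[derive(Debug)]
enum Source {
    Git(String),
    Http(String),
    Ftp(String),
}

fn classify(url: &str) -> Option<Source> {
    if url.starts_with("git://") {
        Some(Source::Git(url.to_string()))
    } else if url.starts_with("http://") || url.starts_with("https://") {
        Some(Source::Http(url.to_string()))
    } else if url.starts_with("ftp://") {
        Some(Source::Ftp(url.to_string()))
    } else {
        None // plain `haul in` with no URL installs from the cwd instead
    }
}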

Usage

Installing

You can install a package with haul in [options] <url> or from the current working directory using only haul in.

Examples

haul in # from the cwd
haul in git://github.com/raptor-jesus/regex
haul in git://github.com/raptor-jesus/regex -t v3.0.1
haul in http://raptor-jesus.me/regex-0.1.0.tar.gz
haul in --cfg waffles=1

Options

  • -c, --cfg - pass a cfg flag to the package.rs file
  • -u, --use - use the package's binaries (see introduction) after installation, asking for confirmation on conflicts
  • -t, --target - if installing via Git, check out this branch/tag before installing. Since there is no central repository, it is standard to tag the Git master for each release so that users can install specific versions

Uninstalling

You can uninstall a package using haul out <name>[@<version>]. This will remove all binaries and libraries installed into the store for a specific version, provided the package is not depended on by another package. If the version is omitted, all versions of that package are removed, except ones which are depended on by other packages. The user is asked to confirm if the package currently has its binaries symlinked/used (see Using).
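The rule above could be sketched roughly like this (present-day Rust; the types and the confirm callback are assumptions for illustration):

struct Installed {
    name: String,
    vers: String,
    depended_on_by: Vec<String>, // packages that require this exact version
    binaries_used: bool,         // currently symlinked into <haul-dir>/bin
}

fn removable(pkg: &Installed, confirm: impl Fn(&str) -> bool) -> bool {
    if !pkg.depended_on_by.is_empty() {
        return false; // another installed package still needs this version
    }
    if pkg.binaries_used {
        // ask before removing a package whose binaries are currently used
        return confirm(&format!("{}@{} is currently used; remove anyway?", pkg.name, pkg.vers));
    }
    true
}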

Using

You can symlink a specific package's binaries to <haul-dir>/bin (see introduction) using haul use <name>@<version>. If version is omitted, it will use the latest installed version of that package.

haul unuse <name> will stop using a used package's binaries (removing the symlinks to that package's binaries from <haul-dir>/bin).

Examples

haul in machine@0.1.2 # install an older version of the machine package (provides machine)
haul in -u machine # install the latest (v0.1.3) machine package and use it
machine -v
v0.1.3
haul use machine@0.1.2
machine -v
v0.1.2
haul unuse machine
machine -v
< no such file >

Building

You can build a Haul package from the current directory using haul build [options]; it is built into <haul-dir>/build/<name>-<hash>-<version>. haul clean will clean the package's build directory (i.e. everything will need to be rebuilt).

Examples

haul build --cfg platform=toaster
haul clean

Options

  • -c, --cfg - pass a cfg flag to the package.rs file

Testing

If you want to run all unit tests in all the source files across a package (i.e. pass --test to all libraries and binaries when building), use haul test [options]. Haul will build the bootstrapped test executables into <haul-dir>/test/<name>-<hash>-<version> and then run them. All output of the tests will be redirected to stdout.
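A rough sketch of that flow (present-day Rust, shelling out to rustc; the output name and error handling are assumptions, not Haul's actual implementation):

use std::path::Path;
use std::process::Command;

fn test_crate(crate_file: &Path, test_dir: &Path) -> std::io::Result<()> {
    let exe = test_dir.join("test-runner");
    // build the crate's bootstrapped test executable into the test directory
    let built = Command::new("rustc")
        .arg("--test")
        .arg(crate_file)
        .arg("-o")
        .arg(&exe)
        .status()?;
    if built.success() {
        // run it; the test harness writes its results straight to stdout
        Command::new(&exe).status()?;
    }
    Ok(())
}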

Options

  • -c, --cfg - pass a cfg flag to the package.rs file

F.A.Q.

Q. Rust has a meta language built in, why not implement the build logic in that and directly extract it from the Rust code?

A. Rust's meta macro language is bloody awesome, but it's not powerful enough for implementing a build process and in my opinion any attempt in that direction will not turn out as elegant as writing it in straight Rust. I'd love to stand corrected, though.

Q. Why did you label this as purely functional if it doesn't strictly follow a purely functional system?

A. Because I'm a dirty rotten liar.

Q. Rust already has a package manager, Cargo. Why are you reinventing the wheel?

A. As of the creation of this project, Cargo wasn't polished enough for general usage and didn't really work well. Also, this is mainly an experiment for fun and might not get anywhere.

@tav

tav commented Jan 12, 2013

Hey Zack,

Thanks for this. We definitely need to improve the build/packaging story in Rust. I hope you don't mind if I offer up a constructive comparison with the go tool for the Go language:

  • The tool is called just go — easy to remember.
  • The sub-commands are similarly easy to remember, e.g.
$ go build
$ go clean
$ go install
$ go test
  • Convention over configuration.
  • You make a package available by just throwing it up on a repository somewhere, e.g.
github.com/agl/panda
bitbucket.org/tebeka/strftime
launchpad.net/goamz/aws
code.google.com/p/cascadia
  • There is no Makefile, package.json, project.clj, setup.py or other crap. All metadata is implicit.
  • You use a package by just referring to it in an import statement, e.g.
import (
    "bitbucket.org/tebeka/strftime"
    "github.com/agl/panda"
    "time"
)

func main() {
    strftime.Format("%Y/%m/%d", time.Now())
    ...
}
  • When building, the go tool automatically figures out, downloads and builds all dependencies by just parsing the source files for import statements. Simples!
  • There are no protocols in the package name, i.e. no http:// or git://. The package names are actually just strings which map to local directories on the $GOPATH. The build mechanism just happens to be clever enough to download it over the internet when building/installing.
  • This also makes it super easy whilst developing. I can refer to and import my unfinished github.com/tav/html5 package whilst developing the github.com/tav/html-sanitiser tool by just putting them as subdirectories of my local $GOPATH/github.com/tav directory. And once I'm happy to release them to the world, I can just push to GitHub without having to change a single line of code or config.
  • Want to cross-compile for a different platform and architecture? No need to repeatedly pass in --cfg params. Just set GOARCH and GOOS, e.g.
export GOOS=freebsd
export GOARCH=386

# Do your thang, e.g.
#     go build ...
// +build !windows
  • Want to install a binary? Just go get it, e.g. go get github.com/nsf/gocode. Compiled binaries are installed to the $GOBIN directory. No need to fudge around with symlinks. Binaries are simply overwritten by newer versions — which, arguably, is what most users want.
  • Want documentation for a tool or library? Just run go doc on the path. Or just append the path to a third-party service, e.g. http://godoc.org/github.com/agl/panda

The success of Go — in that there are thousands of packages available for such a young language — is very much due to the quality of this tooling around it. It makes it effortless for developers to both experiment with the language as well as publish and use code.

Do you see any reason why we can't simply mimic this wholesale for a rust tool? My thought from a previous discussion with @graydon was to simply leverage extern mod statements in a similar manner to Go's import paths, e.g.

extern mod spdy("github.com/tav/spdy");

The rust build logic can then parse source files for these to figure out dependencies. The main addition which seemed worthwhile was to add support for specific versions which could map to branches/tags in version control, e.g.

extern mod spdy("github.com/tav/spdy#0.3");

To this I would also now add your idea of a build listener, i.e.

#[pkg_build]
fn build() {
    ...
}

Whilst I see no need for that in pure-Rust code, it makes sense for code depending on external C libraries, etc. Anyway, that's enough rambling from me for a Saturday morning. I hope this has been useful in some way. What do you think?


All the best, tav

@tav

tav commented Jan 12, 2013

And, oh, @graydon also came up with a cool idea for a rustpkg.org redirection service which we could all use. It would allow for packages to be moved from one service to another without having to update any source files, e.g. code could refer to:

extern mod spdy("rustpkg.org/tav/spdy");

So I could initially have it pointing to github.com/tav/spdy and should I get tired of maintaining it, I could update it to point to github.com/stooge/spdy 😉

To add to this, it might also be useful to have a rust map command which updates a ~/.rustmap.yaml file, e.g.

github.com/tav/spdy: github.com/stooge/spdy
github.com/tav/html/*: internal-server.com/html/*

The rust tool would then rewrite references when building. This would be especially useful in production environments when you want to perhaps refer to local mirrors instead of public sources which may change underneath you or not be accessible due to some outage.

@steveklabnik

URIs are already UUIDs, seems fine to me.

@graydon

graydon commented Jan 15, 2013

A few points (re-transcribed because github ate my comment):

  • Can we move this to a real bug? Either #2238 or #2237 or something?
  • Yes, we'll call the front-end tool rust; see #2238
  • No, we're not replacing --cfg with env vars and build flags hidden in comments. Orthogonal issue anyway.
  • If we drive dependencies off extern mod, it carries implications. I'm (personally) ok with these implications. z0w0 wasn't, which is why he pitched this the way he did. We need to decide this fact. The main implication is that you can only have one crate at a given URL; a package that contains a library and two binaries (say) cannot be denoted by a single URL. It's 3 URLs. If we're going to mandate that, we need to be sure it suits all our needs.
  • Version ranges are also required in dependency specification. There's surely a way to mangle them into URLs.
  • Removing the protocol from the URL to permit mapping to filesystem paths is good and we should do that.
  • Blindly overwriting existing versions is not ok. Each version number has to get turned into a dir prefix. Simultaneous installation of multiple versions of a package is necessary.
  • An aside: the tone of the comments distracts from digesting them; editorializing about the inadequacy of our existing tools requires parsing around the insults to find the content.

In general I'm in broad agreement that "we can do simpler and probably should". I just want to make sure we don't wind up cutting off our nose to spite our face. Let's work out the minimum required to meet our needs, and do it.

@z0w0
Author

z0w0 commented Jan 16, 2013

Sorry for the delayed response; I didn't get notified. Weird.

@tav although I agree that bare URLs are really elegant, please see what @graydon wrote, because he states exactly why Haul works the way it does (the original Haul looked different but he made really good points and I edited it, so this specification is almost co-inspired by him). Coexisting packages is one of the most important features I think a Rust package manager can have. Implicitly overwriting packages isn't good enough.

There are reasons why there needs to be a unique ID for each package explicitly marked in its metadata. Firstly, like @graydon mentions, forcing a package to be exactly the same as a crate (i.e. one crate per package, with the UUID and name inferred from the crate's link metadata) is way too restrictive. You should really be able to install both libraries and binaries with one package. With that, the package must have some unique ID (and name) in order to allow packages with the same name to be installed from multiple sources. Having the user be able to "prefer" binaries in a package is crucial to the purely functional concept and hence is a must, so there must be some sort of unique ID to fall back on in the user interface if packages with the same name are installed. This also makes the removal interface easier (which I believe is also a must; a package manager without the ability to remove packages seems dirty to me). Asking the user to remember what URL they downloaded a package from in order to remove or prefer it isn't very friendly, in my opinion.

As mentioned, packages can have multiple crates, so declaring dependencies via the extern mod call is kind of ambiguous. Please note that I originally added this same functionality to Cargo, so this isn't coming from someone who hates the idea. I am also of the opinion that Haul's way of doing custom build logic is an improvement. Servo is going to need imperative custom build logic, and doing it the way you suggest, with build logic allowed in all crates, would mean parsing and compiling just the build logic in every crate and then parsing and building it again. Not a fan. At the end of the day, the way Go works is not that different from how this works. There's always room for experimentation; there'll be plenty of time to muck around with it and get it perfect.

@graydon I think this idea is incredibly simple. What do you think could be simpler?
