@nrdxp
Created October 24, 2024 15:28

Well, my approach is a fundamental shift in how we would run a Nix evaluation. It comes from 10 years of being annoyed at how costly evaluation can sometimes be, and from the experience I gained over that time with Nix and software engineering in general.

So far it includes two components. The first is a novel Nix module system that provides sensible boundaries for Nix code at very little cost (it is insanely fast compared to the NixOS module system). This piece makes the Nix code bounded and trivially statically analyzable. It purposefully bucks the trend of depending on nixpkgs and being super complex and heavy. The core of it took me two hours to write, and I have kept it purposefully small since. Besides performance and predictability, another nice feature is trivial tracing (unlike the monstrous traces produced by the NixOS module system). There is still some work to do to integrate it more cleanly with my CLI, and I am almost at that stage.

As for the Rust-based CLI, there are a few core features planned, but right now all I really have (although it's pretty solid at this point) is a publish subcommand. This CLI is not meant to be just another Nix wrapper but a higher-level frontend to a Nix-like build service (I envision a Guix integration at some point in the future as well). Publishing is somewhat abstracted to allow different storage backends, but the inaugural implementation, and probably the source of truth for all other backends, is written in pure git.

I use gitoxide to do some non-standard git things in a very efficient and secure manner. Essentially, I detect my format (based on a unique file extension) and create a detached (orphaned) history containing just that "atom", as I call it. Similar to a crate, it contains a manifest and some source files (in a self-contained directory). Since some use cases in Nix may be aided by shared static config, the source dir is optional.

These are then versioned (a semver is required in the manifest) and stored in unique git refs under a custom prefix. The contents are simply new git trees (references to already existing blobs). Nowhere does the code write new blobs, which essentially ensures that there is no way to "corrupt" the files during publishing. It also makes the format extremely light on git's store: since tree objects are akin to mere references, they are cheap to create, store and, crucially, fetch.

I store some additional, non-standard metadata in the commit header, such as the commit from which the atom originates, and I create the commit with constant timestamp and author information, making the atom commit fully reproducible. It is not implemented yet, but I also plan to allow optionally signing the atom commit with a tag object. This way one can short-circuit manual verification if they trust the key, but even so, manually verifying the contents of an atom is trivial.
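To illustrate why constant author and timestamp fields make the commit reproducible, here is a minimal sketch of how git derives a commit's object id from its content. It is not the actual publishing code: the author string, the timestamp, and the extra `origin` header name are hypothetical placeholders, and it assumes a SHA-1 repository.

```python
import hashlib

# Hypothetical placeholders, not the tool's real values.
FIXED_AUTHOR = "atom <atom@localhost>"
FIXED_TIME = "0 +0000"  # constant timestamp: Unix epoch, UTC


def commit_id(tree_hex: str, source_commit_hex: str, message: str) -> str:
    """Return the SHA-1 object id git would assign to this commit object."""
    body = (
        f"tree {tree_hex}\n"
        f"author {FIXED_AUTHOR} {FIXED_TIME}\n"
        f"committer {FIXED_AUTHOR} {FIXED_TIME}\n"
        # Extra headers sit after the committer line, before the blank line.
        # The header name "origin" is an assumption made for this sketch.
        f"origin {source_commit_hex}\n"
        f"\n{message}\n"
    ).encode("utf-8")
    # A git object id is the hash of "<type> <size>\0" followed by the body.
    header = b"commit %d\0" % len(body)
    return hashlib.sha1(header + body).hexdigest()
```

Because every field that feeds the hash is either fixed or derived from the source, two people publishing the same atom from the same source commit would arrive at the same commit id.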

A ref pointing to the original source commit is also made in the atom's ref prefix, so that as long as an atom exists, so too will the source it came from. If you are ever in doubt, verifying is as trivial as pulling it and checking the object ids of the source tree and manifest blob. You might be wondering, "why bother?" I was trying to think of a system that could be incrementally adopted in a repo as large as nixpkgs, and which would also make it more efficient to pull different packages from various points in history (different versions) without having to grab the entire nixpkgs tree n times, a persistent and growing burden that flakes actually made far worse.

I am close to ready to start working on a full-blown version resolver that resolves version constraints for these atoms across repositories and produces a truly minimal set of dependencies (much unlike our friend the flake), directly from source, while also remaining extremely cheap to fetch and totally self-contained in git. The versions themselves are contained in the ref names, so resolving the atoms a repository has does not require fetching any of its contents, which is why I spent a lot of time ensuring I do not break this property when writing the publishing code.
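As a rough sketch of what "resolving without fetching contents" could look like, the snippet below works purely on a ref listing in the `<oid>\t<refname>` shape that `git ls-remote` prints. The `refs/atoms/<id>/<version>` layout is an assumed, hypothetical naming scheme, and the constraint rule is a simplified caret match (same major, at least the minimum) that ignores pre-release tags and 0.x special cases. The planned resolver will certainly be more involved.

```python
from typing import Optional

PREFIX = "refs/atoms/"  # hypothetical ref prefix, assumed for this sketch

Version = tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string; raises ValueError otherwise."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def available(ls_remote_output: str) -> dict[str, list[Version]]:
    """Map each atom id to the versions advertised in a ref listing."""
    atoms: dict[str, list[Version]] = {}
    for line in ls_remote_output.splitlines():
        _oid, _, ref = line.partition("\t")
        if not ref.startswith(PREFIX):
            continue
        atom_id, _, version_text = ref[len(PREFIX):].rpartition("/")
        if not atom_id:
            continue
        try:
            version = parse_version(version_text)
        except ValueError:
            continue  # not a version ref (e.g. a ref to the source commit)
        atoms.setdefault(atom_id, []).append(version)
    return atoms


def resolve_caret(versions: list[Version], minimum: Version) -> Optional[Version]:
    """Pick the highest version with the same major that is >= minimum."""
    matching = [v for v in versions if v[0] == minimum[0] and v >= minimum]
    return max(matching, default=None)
```

Nothing here touches trees or blobs: the only input is the list of ref names, which a remote can advertise without sending any objects.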

I envision a third critical piece, which we have only been brainstorming and only have a skeleton repo for so far. It would essentially serve as a much more efficient backend for Nix-like evaluators and decouple evaluation from the caller (our CLI binary). This is where my atom id concept (implemented during publishing) comes in to bolster caching. An atom contains a Unicode id in the manifest. However, in order to make its identity fully unambiguous and avoid an annoying global namespace (like Rust crates suffer from), I also include the concept of a "root", which is used as the key input for a BLAKE3 hash over the atom's Unicode id.

This "root" key, in the git implementation I just described, is the oldest, parentless commit in the repository's history (the origin of the repository). I decided on this as I was trying to think of a way to unambiguously identify a git repo without relying on ephemeral information like remote names (since those can change while you still have the same code and history underneath).
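The idea of keying the hash by the root can be sketched as follows. Note the substitution: Python's standard library has no BLAKE3, so this uses keyed BLAKE2b from `hashlib` as a stand-in. BLAKE3's keyed mode expects a 32-byte key, so how the real tool turns the root commit into a key may well differ. The function name and parameters are hypothetical.

```python
import hashlib


def atom_id(root_commit_hex: str, unicode_id: str) -> str:
    """Derive a namespaced id: keyed hash of the atom's Unicode id.

    Stand-in sketch: keyed BLAKE2b replaces BLAKE3 here, and the raw
    root commit id is used directly as the key (an assumption).
    """
    key = bytes.fromhex(root_commit_hex)  # 20 bytes for a SHA-1 commit id
    digest = hashlib.blake2b(unicode_id.encode("utf-8"), key=key, digest_size=32)
    return digest.hexdigest()
```

Two repositories with different root commits can both publish an atom named `foo` and still get distinct ids, while every clone or mirror of the same repository computes the same one.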

If the very root of your history changes, you very clearly no longer have quite the same codebase, so to me it seemed like the perfect identifier. This also makes the atom format decentralized, just like git, since you can publish atoms to as many remotes as you like. In any case, this ID will be used extensively to track information in the backend we are still designing. We plan to use Cap'n Proto as the exchange format to make it extremely efficient to offload builds and evaluations to multiple instances.

Everything this backend builds or evaluates will be tracked by atom id, and then also by the derivation information that already exists, creating a trivial mapping to the final artifact (if it exists). Once fully implemented, this is what will allow a user to simply call for a package and instantly start downloading it, avoiding evaluation entirely (if it was already built).
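Since the backend is still being designed, the following is only a toy model of the two-step mapping described above, not its real data model. The class and method names are hypothetical, as is the choice to key evaluations by atom id plus version.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BuildIndex:
    """Toy in-memory index: atom -> derivation -> artifact."""

    # (atom id, version) -> derivation path, recorded after an evaluation
    derivations: dict[tuple[str, str], str] = field(default_factory=dict)
    # derivation path -> artifact location, recorded after a build
    artifacts: dict[str, str] = field(default_factory=dict)

    def record_eval(self, atom_id: str, version: str, drv: str) -> None:
        self.derivations[(atom_id, version)] = drv

    def record_build(self, drv: str, artifact: str) -> None:
        self.artifacts[drv] = artifact

    def lookup(self, atom_id: str, version: str) -> Optional[str]:
        """Return the artifact if this atom was already evaluated and built."""
        drv = self.derivations.get((atom_id, version))
        if drv is None:
            return None  # never evaluated: an evaluation is needed
        return self.artifacts.get(drv)  # None if evaluated but not yet built
```

When `lookup` returns a location, the caller can start downloading right away. Evaluation only happens on a miss.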

I arrived at all this not to try to replace or even compete with Nix, but through my observation after all these years that Nix is actually quite good as a low-level build tool. It is not a high-level piece, and we shouldn't try to make it one (flakes), as it fundamentally works at a different level of abstraction. Users simply don't care about derivations, as useful as they are. Everybody already knows what a version is, though, and it is an abstraction we have sort of lost in Nix by pinning everything exactly. This model essentially tries to reintroduce it in a principled manner, while retaining the benefits of Nix and mitigating many of its pains.

So again, this is a backend-agnostic system envisioned to provide a proper user (and developer) level abstraction that makes working with the low-level concept of derivations a breeze. It does so by abstracting away the nuisance, creating clean boundaries, and reducing reliance on Nix code outright (I envision a plugin system for the CLI to generate inputs), so that Nix is used strictly as the DSL it was designed to be, and not as the monstrous and growing beast it has become, a la the module system, etc.

None of the pieces I have so far are tied to Nix in any way; if they were, that would indicate a boundary violation. It should remain abstract enough to be useful for any Nix-like tool (and perhaps future use cases I haven't envisioned). In any case, we are going to be going public soon.

If I could give a high-level principle that has motivated this whole thing, it would be something like "nothing is cheaper than static". If you keep as much as possible about the build statically knowable (the manifest, the refs, the atom ids in the backend, the evaluation and build history), you don't have to worry about some eval taking an indeterminate amount of time in between you and the thing you want to use, at least not more than once (for the initial build).

Sorry for the novel, but it's hard to explain the plan without all the pieces.
