@dramforever
Last active November 6, 2024 18:30

This is a description of the mirroring script at tuna/tunasync-scripts#49 (and other related things)

Links:

Overview of the mirroring procedure

  • For each channel found from https://nixos.org/channels
    • If the channel is too old (last updated before 2018-12-01), skip it. There were quite a few channels, and even a format change, before that date.
    • Follow the channel URL and find the release version from redirected URL (like nixos-19.09.1840.f7d050ed4e3)
    • If the channel is already at this version, skip
    • If there is a .${channel}.update symlink, mark this channel for binary cache update and skip the rest of this loop
    • Create a directory releases/${version} and download into it all files that do not end in .ova or .iso (images go to a separate mirror), checking their hashes. This includes the most important file, nixexprs.tar.xz. Also replace binary-cache-url with the mirror URL, saving the original.
    • Write a releases/${version}/.released-time file containing the time of release in %Y-%m-%d %H:%M format.
    • If all hashes are fine, create a .${channel}.update symlink pointing to releases/${version} and mark this channel for binary cache update.
  • For each channel that needs a binary cache update
    • Find the original binary cache URL (avoiding hard-coding here)
    • Download (only once for each run) nix-cache-info from the binary cache.
    • Download all paths from store-paths.xz using xargs nix copy into the binary cache (default location: store).
    • If successful, move .${channel}.update to ${channel}, possibly replacing the original
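The staging and publishing steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code from the actual script: the function names (`version_from_url`, `stage_release`, `promote`), the `mirror_root` parameter, and the use of a relative symlink target are all hypothetical choices made for this example.

```python
import os
from datetime import datetime
from urllib.parse import urlparse


def version_from_url(redirected_url):
    """Take the last path component of the redirected channel URL as the version."""
    return urlparse(redirected_url).path.rstrip("/").rsplit("/", 1)[-1]


def stage_release(mirror_root, channel, version, released):
    """Record the release time and create the pending .<channel>.update symlink."""
    release_dir = os.path.join(mirror_root, "releases", version)
    os.makedirs(release_dir, exist_ok=True)
    with open(os.path.join(release_dir, ".released-time"), "w") as f:
        f.write(released.strftime("%Y-%m-%d %H:%M"))
    pending = os.path.join(mirror_root, f".{channel}.update")
    if os.path.lexists(pending):
        os.remove(pending)
    # Assumption: a relative target, so the mirror tree can be relocated as a whole.
    os.symlink(os.path.join("releases", version), pending)
    return pending


def promote(mirror_root, channel):
    """Publish the channel by renaming the pending symlink over the visible one.

    Meant to be called only after the binary cache update has succeeded.
    """
    pending = os.path.join(mirror_root, f".{channel}.update")
    os.replace(pending, os.path.join(mirror_root, channel))
```

Renaming with `os.replace` is used here because a rename within one filesystem is atomic on POSIX, so a reader sees either the old channel target or the new one, never a missing link.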

Current design decisions

  • The script depends on Nix, Python 3 with pyquery and requests.
    • Nix is used for nix copy, which is probably the official way to mirror. Reimplementing these will require reimplementing at least: downloading and parsing metadata, skipping existing packages, recursively finding dependencies, retrying...
    • pyquery and requests are used to crawl HTML and they were chosen because other scripts on TUNA mirror already use them. (See below for installing Nix by just downloading and copying a few files)
  • The script is hopefully relatively straightforward.
  • The 'channel needs updating' marker and the symlink juggling exist because we want the binary cache to be updated before users can see the new channel release.
  • The channels are symlinked rather than redirected because TUNA cannot easily provide dynamically changing redirections.
  • Miscellaneous data files are stored as hidden (dot-prefixed) files in the mirror. For example, XDG_CACHE_HOME is set to <mirror_path>.cache
  • https://github.com/NixIPFS/nixipfs-scripts is not used because it predates nix copy and therefore reimplements a lot of it, which is unnecessary in the context of Nix 2.0.

Garbage collection

Currently we do not do any garbage collection for paths referred to from old releases, but do retain the information to do so.

Proposed changes:

By @dramforever

  • Track in a database the latest known time a path is referred to (timestamp for short).
    • Each time a channel has its paths cloned, each path should have its timestamp updated to the release time of the channel if either the path has not been seen before or its timestamp is older.
  • When old releases need to be freed, sort store paths by timestamp and delete the oldest ones.
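A minimal sketch of this timestamp bookkeeping, assuming the "database" is just an in-memory dict from store path to the latest release time (a real implementation would presumably persist it, for example in SQLite). The names `record_release` and `oldest_paths` are hypothetical.

```python
from datetime import datetime


def record_release(last_seen, store_paths, release_time):
    """Raise each path's timestamp to release_time if the path is new or its timestamp is older."""
    for path in store_paths:
        if path not in last_seen or last_seen[path] < release_time:
            last_seen[path] = release_time


def oldest_paths(last_seen, count):
    """Return up to `count` store paths, oldest timestamp first (ties broken by path)."""
    return sorted(last_seen, key=lambda p: (last_seen[p], p))[:count]
```

Because a timestamp is only ever raised, processing releases out of order does not make a path look older than its most recent reference.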

By @yuchangyuan

https://gist.github.com/dramforever/d2ff99318c70f44149db6070a87da5a0#gistcomment-3136152

Some thoughts about garbage collection:

  1. After each nix copy, run xzcat release/${version}/store-paths.xz | xargs nix path-info -r LOCAL_STORE_URL | sort | uniq (or an equivalent command) to generate the list of full paths, and save it as release/${version}/full-paths.txt.
  2. For each GC run, target a release/${version} instead. Compare the full-paths.txt of this release/${version} with the same file in every other release/${version} to find the store paths that are referenced only by the release/${version} being deleted.
  3. For each store path to delete, first delete the corresponding nar.xz file, then the narinfo file. Once all nar.xz and narinfo files are deleted, we can safely delete release/${version}.
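Step 2 amounts to a set difference over the per-release path lists. A rough sketch, assuming each full-paths.txt holds one store path per line; the helper names `read_paths` and `paths_only_in` are made up for this example.

```python
def read_paths(lines):
    """Parse the lines of a full-paths.txt file into a set of store paths."""
    return {line.strip() for line in lines if line.strip()}


def paths_only_in(target, others):
    """Store paths referenced by the target release and by no other release."""
    still_referenced = set().union(*others)
    return target - still_referenced
```

For a release being collected, `read_paths(open(...))` would be applied to its full-paths.txt and to that of every remaining release, and only the paths returned by `paths_only_in` would have their nar.xz and narinfo files deleted.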

https://gist.github.com/dramforever/d2ff99318c70f44149db6070a87da5a0#gistcomment-3136166

  1. For each store path to delete, first delete corresponding nar.xz file, then delete narinfo file, when all nar.xz and narinfo files are deleted, we can safely delete releases/${version}

This should probably be reversed. We should first delete releases/${version} and then delete the binary cache files, so that all available releases have binary cache files. Also, do we need to delete releases/${version} at all? Maybe we can ask users to always keep https://cache.nixos.org as a backup cache for files we deleted. That way users can still pin nixpkgs to a mirrored version.

Dockerfile

(Should work for any alpine, just replace ustcmirror/base:alpine)

FROM ustcmirror/base:alpine as fetcher

RUN wget https://nixos.org/releases/nix/nix-2.3.2/nix-2.3.2-x86_64-linux.tar.xz -O /tmp/nix.tar.xz && \
    mkdir /tmp/nix.unpack && \
    tar xpf /tmp/nix.tar.xz -C /tmp/nix.unpack && \
    mkdir /nix && \
    cp -dpr /tmp/nix.unpack/*/store /nix/store

FROM ustcmirror/base:alpine

COPY --from=fetcher /nix /nix
RUN ln -s /nix/store/*-nix-*/bin/* /usr/local/bin

# Required for Nix
RUN apk add ca-certificates

RUN apk add python3 py3-requests py3-pip py3-lxml
RUN pip3 install pyquery

Download size information (@yuchangyuan)

Found from https://channels.nix.gsc.io/nixos-19.09/history-url.

| Timestamp | Version | Paths changed | Delta NarSize / GiB | Delta FileSize / GiB |
| --- | --- | --- | --- | --- |
| 2019-10-09 06:45 | 19.09.711.25757b66e18 | 44149 | 310.84 | 78.82 |
| 2019-10-09 10:50 | 19.09.714.2a5bfda3f43 | 1393 | 18.85 | 4.19 |
| 2019-10-09 16:35 | 19.09.716.88bbb3c8096 | 265 | 0.40 | 0.09 |
| 2019-10-10 06:55 | 19.09.735.8d0dc8d737c | 2214 | 26.68 | 8.06 |
| 2019-10-10 12:05 | 19.09.736.9bbad4c6254 | 189 | 0.09 | 0.01 |
| 2019-10-11 01:30 | 19.09.741.dbad7c7d59f | 195 | 0.10 | 0.01 |
| 2019-10-13 01:22 | 19.09.766.222004e52e8 | 1910 | 34.93 | 12.00 |
| 2019-10-13 07:55 | 19.09.789.7952807791d | 1386 | 11.48 | 4.15 |
| 2019-10-14 19:05 | 19.09.794.28d2548a03f | 43168 | 303.28 | 75.29 |
| 2019-10-15 04:50 | 19.09.809.5000b1478a1 | 793 | 8.32 | 2.16 |
| 2019-10-16 06:05 | 19.09.840.8bf142e001b | 1148 | 8.14 | 3.63 |
| 2019-10-21 19:35 | 19.09.891.80b42e630b2 | 25399 | 231.13 | 60.18 |
| 2019-10-22 23:35 | 19.09.907.f6dac808387 | 43095 | 302.80 | 75.16 |
| 2019-10-26 10:13 | 19.09.941.27a5ddcf747 | 391 | 2.05 | 0.44 |
| 2019-10-28 15:35 | 19.09.976.c75de8bc12c | 16923 | 206.30 | 61.45 |
| 2019-11-01 09:50 | 19.09.1019.c5aabb0d603 | 5784 | 98.47 | 35.20 |
| 2019-11-07 13:45 | 19.09.1098.821c7ed030b | 42962 | 301.78 | 74.93 |
| 2019-11-08 07:25 | 19.09.1125.d628521d0b7 | 1089 | 8.19 | 2.71 |
| 2019-11-08 22:50 | 19.09.1134.d9a83d34c8d | 695 | 9.55 | 3.26 |
| 2019-11-09 02:50 | 19.09.1149.107e2b7b29f | 249 | 0.77 | 0.22 |
| 2019-11-09 14:35 | 19.09.1155.bae4d7daa01 | 192 | 0.09 | 0.01 |
| 2019-11-10 08:05 | 19.09.1160.a22b0189002 | 341 | 2.38 | 0.75 |
| 2019-11-10 19:15 | 19.09.1172.2d896998dc9 | 30254 | 265.15 | 66.64 |
| 2019-11-12 06:50 | 19.09.1197.d493b97b265 | 781 | 5.41 | 1.90 |
| 2019-11-12 12:50 | 19.09.1208.ef8c34c4721 | 189 | 0.09 | 0.01 |
| 2019-11-13 12:55 | 19.09.1221.e6a37ef446f | 753 | 5.05 | 1.55 |
| 2019-11-13 13:55 | 19.09.1223.cb2cdab7136 | 198 | 0.44 | 0.07 |
| 2019-11-15 12:50 | 19.09.1232.133d836dafa | 238 | 0.94 | 0.24 |
| 2019-11-15 13:45 | 19.09.1241.259a67ca221 | 244 | 1.59 | 0.30 |
| 2019-11-15 16:45 | 19.09.1247.851d5bdfb04 | 193 | 0.21 | 0.04 |
| 2019-11-16 05:20 | 19.09.1254.9104be2ee08 | 245 | 0.35 | 0.06 |
| 2019-11-16 18:05 | 19.09.1258.07e66484e67 | 215 | 0.20 | 0.03 |
| 2019-11-19 17:55 | 19.09.1292.e1843646b04 | 1078 | 14.10 | 5.00 |
| 2019-12-09 12:37 | 19.09.1529.808d3c6d123 | 14951 | 201.92 | 60.46 |
| 2019-12-09 15:40 | 19.09.1548.3a1861fcabc | 715 | 3.84 | 1.51 |
| 2019-12-11 01:15 | 19.09.1549.45ea6092203 | 191 | 0.09 | 0.01 |
| 2019-12-14 12:15 | 19.09.1584.7351aa52acd | 41502 | 296.12 | 73.88 |
| 2019-12-14 20:35 | 19.09.1589.57b7b019812 | 192 | 0.09 | 0.01 |
| 2019-12-15 19:50 | 19.09.1590.d85e435b7bd | 191 | 0.09 | 0.01 |
| 2019-12-17 03:20 | 19.09.1594.fbe321e6669 | 226 | 0.62 | 0.18 |
| 2019-12-17 23:00 | 19.09.1618.c2ef0cee28a | 204 | 1.00 | 0.22 |
| 2019-12-18 00:00 | 19.09.1619.c337a7423bc | 284 | 0.71 | 0.22 |
| 2019-12-18 02:05 | 19.09.1620.d40f024a3ba | 190 | 0.09 | 0.01 |
| 2019-12-18 08:20 | 19.09.1625.0dc46b0e1c8 | 210 | 0.62 | 0.12 |
| 2019-12-19 00:50 | 19.09.1629.ce54d9601ea | 584 | 3.37 | 1.37 |
| 2019-12-19 15:45 | 19.09.1638.6655a13a56f | 288 | 0.77 | 0.22 |
| 2019-12-19 22:40 | 19.09.1647.2e73f72c87e | 202 | 0.33 | 0.09 |
| 2019-12-20 15:35 | 19.09.1654.dd26550fda5 | 240 | 0.64 | 0.24 |
| 2019-12-21 03:35 | 19.09.1662.8e4c9d15456 | 201 | 0.67 | 0.20 |
| 2019-12-21 18:40 | 19.09.1664.968381812b4 | 192 | 0.20 | 0.03 |
| 2019-12-22 05:15 | 19.09.1670.36aa728f2cd | 409 | 1.04 | 0.22 |
| 2019-12-22 19:35 | 19.09.1673.9bcf1148144 | 191 | 0.09 | 0.01 |
| 2019-12-23 21:15 | 19.09.1682.bfdae0860e4 | 714 | 4.10 | 1.58 |
| 2019-12-24 18:10 | 19.09.1685.e9ef090eb54 | 190 | 0.09 | 0.01 |
| 2019-12-26 08:05 | 19.09.1686.69ed29f5f41 | 240 | 0.29 | 0.04 |
| 2019-12-29 00:30 | 19.09.1687.c5d5561f772 | 196 | 0.40 | 0.10 |
| 2019-12-29 08:05 | 19.09.1690.0d9055a2ac2 | 189 | 0.09 | 0.01 |
| 2019-12-30 03:40 | 19.09.1693.eab4ee0c27c | 191 | 0.14 | 0.02 |
| 2020-01-03 03:40 | 19.09.1748.ad1e1af5ad3 | 11363 | 165.70 | 53.65 |
| 2020-01-04 10:10 | 19.09.1764.2d9454702e5 | 356 | 1.21 | 0.32 |
| 2020-01-04 22:40 | 19.09.1772.54c9e1f53a7 | 588 | 4.89 | 1.03 |
| 2020-01-05 08:35 | 19.09.1774.a3070689aef | 191 | 0.09 | 0.01 |
| 2020-01-06 01:40 | 19.09.1776.b926503738c | 190 | 0.18 | 0.02 |
| 2020-01-06 18:55 | 19.09.1778.db3e8325a9b | 528 | 3.05 | 1.26 |
| 2020-01-07 07:55 | 19.09.1781.d245ff1bb9b | 200 | 0.09 | 0.01 |
| 2020-01-07 15:50 | 19.09.1784.fd4ccdbe3a6 | 200 | 0.16 | 0.05 |
| 2020-01-08 20:15 | 19.09.1791.ac218438bdb | 1802 | 15.89 | 4.96 |
| 2020-01-09 04:40 | 19.09.1803.db5273ce2ab | 341 | 0.76 | 0.21 |
| 2020-01-09 09:50 | 19.09.1806.b047b7315d8 | 192 | 0.13 | 0.02 |
| 2020-01-10 04:30 | 19.09.1815.caad1a78c47 | 200 | 0.42 | 0.10 |
| 2020-01-11 11:35 | 19.09.1821.9f453eb97ff | 586 | 3.30 | 1.35 |
| 2020-01-12 10:05 | 19.09.1840.f7d050ed4e3 | 948 | 35.08 | 16.79 |
| 2020-01-13 12:15 | 19.09.1850.5dc4d071ffe | 686 | 4.33 | 1.57 |
| 2020-01-14 03:00 | 19.09.1861.eb65d1dae62 | 701 | 4.01 | 1.59 |

@yuchangyuan

Some thoughts about garbage collection:

  1. After each nix copy, run xzcat release/${version}/store-paths.xz | xargs nix path-info -r LOCAL_STORE_URL | sort | uniq (or an equivalent command) to generate the list of full paths, and save it as release/${version}/full-paths.txt.
  2. For each GC run, target a release/${version} instead. Compare the full-paths.txt of this release/${version} with the same file in every other release/${version} to find the store paths that are referenced only by the release/${version} being deleted.
  3. For each store path to delete, first delete the corresponding nar.xz file, then the narinfo file. Once all nar.xz and narinfo files are deleted, we can safely delete release/${version}.

@dramforever
Author

  1. For each store path to delete, first delete corresponding nar.xz file, then delete narinfo file, when all nar.xz and narinfo files are deleted, we can safely delete releases/${version}

This should probably be reversed. We should first delete releases/${version} and then delete the binary cache files, so that all available releases have binary cache files. Also, do we need to delete releases/${version} at all? Maybe we can ask users to always keep https://cache.nixos.org as a backup cache for files we deleted. That way users can still pin nixpkgs to a mirrored version.

@yuchangyuan

If we only serve the latest version of each channel, then we can delete old ones at any time, except for full-paths.txt, which is needed for GC. So what I actually mean is to delete the files in the binary cache before deleting full-paths.txt. But if we need to serve both old and new channels, we should first delete the old channel data except full-paths.txt, then the binary cache data, and finally full-paths.txt.
