idea
- proposed by bradfitz for accelerating the CI of the Go project
- x/build: speed up large container start-up times without pre-pulling containers into VMs (CRFS): golang/go#30829
-
- motivation: Our current situation (building a container, pushing to gcr.io, then automating the creation of COS-like VM images that have the image pre-pulled) is pretty gross and tedious.
-
- initial: https://github.com/google/crfs (read-only FUSE filesystem that lets you mount a container image, served directly from a container registry)
crfs
- https://github.com/google/crfs
-
- a pull operation reads the entire container image from the registry and writes the entire image to the local machine's disk. It's pretty silly (and wasteful) that a read operation becomes a write operation.
-
- Go: For isolation and other reasons, we run all our containers in single-use fresh VMs. We've automated the creation of VM images where our heavy containers are pre-pulled. This is all a silly workaround. It'd be much better if we could just read the bytes over the network.
-
- tar files are unindexed, and gzip streams are not seekable
-
- Stargz: Seekable tar.gz - make a tar file where each tar entry is its own gzip stream (indexed and seekable)
-
- traditional:
Gzip(TarF(file1) + TarF(file2) + TarF(file3) + TarFooter)
-
- stargz:
Gzip(TarF(file1)) + Gzip(TarF(file2)) + Gzip(TarF(file3_chunk1)) + Gzip(F(file3_chunk2)) + Gzip(F(index of earlier files in magic file), TarFooter)
- a few percent larger - it's plenty acceptable
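A minimal Go sketch of the layout above (not the real stargzify code): each file's tar bytes go into their own gzip member, and the offsets the index records are simply where each member starts in the compressed output. Chunking of large files, the stargz.index.json entry, and the footer are omitted.

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// countWriter counts compressed bytes written so far, i.e. the offset at
// which the next gzip member will start.
type countWriter struct {
	w io.Writer
	n int64
}

func (c *countWriter) Write(p []byte) (int, error) {
	n, err := c.w.Write(p)
	c.n += int64(n)
	return n, err
}

// swapWriter lets one continuous tar stream be written through a gzip
// member that is restarted at every file boundary.
type swapWriter struct{ w io.Writer }

func (s *swapWriter) Write(p []byte) (int, error) { return s.w.Write(p) }

func main() {
	f, err := os.Create("out.tar.gz") // still a normal tar.gz, just multi-member
	if err != nil {
		panic(err)
	}
	defer f.Close()

	out := &countWriter{w: f}
	sw := &swapWriter{}
	tw := tar.NewWriter(sw)

	for _, name := range os.Args[1:] {
		offset := out.n // compressed offset of this file's gzip member
		gz := gzip.NewWriter(out)
		sw.w = gz

		src, err := os.Open(name)
		if err != nil {
			panic(err)
		}
		fi, _ := src.Stat()
		hdr, _ := tar.FileInfoHeader(fi, "")
		hdr.Name = name
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := io.Copy(tw, src); err != nil {
			panic(err)
		}
		src.Close()
		tw.Flush() // pad the entry so the member ends on a tar block boundary
		gz.Close() // finish this file's gzip member

		fmt.Printf("%s -> gzip member at offset %d\n", name, offset)
	}

	// Final member: tar end-of-archive blocks (a real stargz also writes the
	// stargz.index.json entry and the footer here).
	gz := gzip.NewWriter(out)
	sw.w = gz
	tw.Close()
	gz.Close()
}
```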
-
- operation: HTTP Range requests read just the stargz index out of the end of each layer from the registry. The index is stored much like the ZIP format's TOC, with a pointer to the index at the very end of the file. The index contains the offset of each file's GZIP(TAR(file data)) range, and large files get multiple stargz index entries so that a small amount of data can be read from them efficiently (see the Go sketch below).
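A sketch of that read path using only net/http (registry auth omitted): an io.ReaderAt whose ReadAt issues HTTP Range requests against the layer blob URL. CRFS wraps a reader like this in an io.SectionReader and hands it to its stargz package, so only the footer, the TOC, and the file ranges actually read ever cross the network. The URL, layer size, and 47-byte footer size below are placeholders/assumptions.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// httpReaderAt reads arbitrary byte ranges of a registry blob on demand.
type httpReaderAt struct {
	url    string
	client *http.Client
}

func (r *httpReaderAt) ReadAt(p []byte, off int64) (int, error) {
	req, err := http.NewRequest("GET", r.url, nil)
	if err != nil {
		return 0, err
	}
	// Ask the registry for just the bytes we need.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+int64(len(p))-1))
	resp, err := r.client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return 0, fmt.Errorf("range request not honored: %s", resp.Status)
	}
	return io.ReadFull(resp.Body, p)
}

func main() {
	// Hypothetical blob URL; a real one needs the registry token flow.
	ra := &httpReaderAt{
		url:    "https://registry.example/v2/library/busybox/blobs/sha256:deadbeef",
		client: http.DefaultClient,
	}

	// Compressed layer size comes from the image manifest; placeholder here.
	var layerSize int64 = 4_000_000

	// Read the small fixed-size footer at the end of the blob; its gzip extra
	// field stores the TOC offset (47 bytes per the CRFS stargz package --
	// treat the exact size as an assumption and check the source).
	footer := make([]byte, 47)
	if _, err := ra.ReadAt(footer, layerSize-int64(len(footer))); err != nil {
		fmt.Println("footer read failed:", err)
		return
	}
	fmt.Println("fetched footer; next step is parsing the TOC offset out of it")
}
```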
-
- adopted: https://github.com/containerd/stargz-snapshotter (implemented by Kohei Tokunaga as a containerd plugin) and https://github.com/giuseppe/crfs-plugin (by Giuseppe Scrivano, for Podman, implemented as https://github.com/containers/fuse-overlayfs, idea golang/go#30829 (comment) - zstd: starzstd?, non-chunked, fast decompression by zstd)
import "bazil.org/fuse"
- TOCEntry: https://github.com/google/crfs/blob/71d77da419c90be7b05d12e59945ac7a8c94a543/stargz/stargz.go#L108-L191
- index: stargz.index.json stores the version and the list of TOCEntry
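A trimmed sketch of those index types (field set abridged; see the linked stargz.go for the full definitions). stargz.index.json is just the JSON encoding of this TOC.

```go
package stargz

// TOCEntry describes one file, chunk, directory, or link in the stargz index
// (abridged; the real struct has more fields).
type TOCEntry struct {
	Name        string `json:"name"`                  // tar entry name
	Type        string `json:"type"`                  // "reg", "dir", "symlink", "chunk", ...
	Size        int64  `json:"size,omitempty"`        // uncompressed size
	Offset      int64  `json:"offset,omitempty"`      // where this entry's gzip member starts in the blob
	ChunkOffset int64  `json:"chunkOffset,omitempty"` // for chunks of large files
	ChunkSize   int64  `json:"chunkSize,omitempty"`
	LinkName    string `json:"linkName,omitempty"`
	Mode        int64  `json:"mode,omitempty"`
}

// jtoc is what stargz.index.json decodes into.
type jtoc struct {
	Version int         `json:"version"`
	Entries []*TOCEntry `json:"entries"`
}
```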
DEMO
stargzify
$ go install github.com/google/crfs/stargz/stargzify@latest
- usage: https://github.com/google/crfs/blob/71d77da419c90be7b05d12e59945ac7a8c94a543/stargz/stargzify/stargzify.go#L43-L60
$ skopeo copy --override-os linux docker://busybox:latest oci:busybox
$ file busybox/blobs/sha256/f5b7ce95afea5d39690afc4c206ee1bf3e3e956dcc8d1ccd05c6613a39c4e4f8
busybox/blobs/sha256/f5b7ce95afea5d39690afc4c206ee1bf3e3e956dcc8d1ccd05c6613a39c4e4f8: gzip compressed data, original size modulo 2^32 1459200
$ stargzify file:./busybox/blobs/sha256/f5b7ce95afea5d39690afc4c206ee1bf3e3e956dcc8d1ccd05c6613a39c4e4f8 file:output.stargz
$ exiftool output.stargz
ExifTool Version Number : 12.27
File Name : output.stargz
File Permissions : -rw-r--r--
File Type : GZIP
File Type Extension : gz
MIME Type : application/x-gzip
$ mkdir out
$ tar -xf output.stargz -C out
$ cd out/
$ chmod 777 stargz.index.json
$ cat stargz.index.json | jq
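A small Go follow-up to the demo (assumes the trimmed TOC types sketched in the crfs section above): decode the extracted out/stargz.index.json and list each entry with its offset, i.e. the same data the jq call prints as raw JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type tocEntry struct {
	Name   string `json:"name"`
	Type   string `json:"type"`
	Size   int64  `json:"size,omitempty"`
	Offset int64  `json:"offset,omitempty"`
}

type toc struct {
	Version int        `json:"version"`
	Entries []tocEntry `json:"entries"`
}

func main() {
	// Path relative to where the demo extracted the layer.
	f, err := os.Open("out/stargz.index.json")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var t toc
	if err := json.NewDecoder(f).Decode(&t); err != nil {
		panic(err)
	}
	fmt.Println("TOC version:", t.Version)
	for _, e := range t.Entries {
		fmt.Printf("%-8s %-40s size=%d offset=%d\n", e.Type, e.Name, e.Size, e.Offset)
	}
}
```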
- if you want to optimize your stargz into estargz, use the ctr-remote optimize command.
slacker
- pulling packages accounts for 76% of container start time, but only 6.4% of that data is read.
- lazy fetching speeds up the median container development cycle by 20x and the deployment cycle by 5x
- utilizes modifications we make to the Linux kernel in order to improve cache sharing
- image pushes become 153x faster and pulls become 72x faster
- benchmark: https://github.com/Tintri/hello-bench
teleport
- https://github.com/Azure/acr/blob/main/docs/teleport/README.md
- azure
- client: Orca
- Highly Factored Registry Protocol
- requesting Azure Premium File mount points for each layer ID (only the content read by the container is pulled across the network, speeding container start time)
- SMB mounting each layer as pre-expanded content
- NOT FREE!
filegrain
ipcs
- https://github.com/hinshun/ipcs
- IPFS (P2P CAS)
- not OCI-compatible
- containerd implementation
containerd
- initial discussion: containerd/containerd#2943
- stargz-snapshotter: https://github.com/containerd/stargz-snapshotter
- gRPC plugin
- FUSE mount per image layer
- indexed files per image layer
- uses overlay storage driver
- estargz:
-
- arbitrary files can be marked as prioritized
-
- pre-fetch on demand using HTTP range requests (optimize image with user-specified workload)
-
- workload-based performance optimization and content verification
-
- fetch files/chunks (registry) -> (lazy pull) mounting layers as FUSE (CRI) -> using layers for rootfs (container)
-
- supported by: kaniko, nerdctl, crane, ko
- stargz to estargz converter (thanks to @ktock)
-
- prioritized means all files accessed while running the image's entrypoint (or user-specified commands): https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/docs/ctr-remote.md
-
- the ctr-remote optimize command runs the analyze function: https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/cmd/ctr-remote/commands/optimize.go#L198
-
- first, the list of prioritized files in the image is created by analyzer.Analyze(): https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/analyzer/analyzer.go#L54
-
- analyzer.Analyze() returns a list of prioritized files (a list of recorder.Entry), which is encoded as JSON and stored in the containerd content store: https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/recorder/recorder.go#L26-L30
-
- what is run to detect prioritized files: the analyzer runs the image in a container and records all file accesses using fanotify: https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/analyzer/fanotify/fanotify.go
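A minimal, Linux-only sketch of that detection mechanism (not the snapshotter's actual analyzer code; needs root and golang.org/x/sys/unix): mark a mount with fanotify and log the path of every file opened or read under it. The mountpoint below is a placeholder; the analyzer watches the container's rootfs mount.

```go
package main

import (
	"fmt"
	"os"
	"unsafe"

	"golang.org/x/sys/unix"
)

func main() {
	mountpoint := "/mnt/rootfs" // placeholder: the container rootfs to watch

	fd, err := unix.FanotifyInit(unix.FAN_CLASS_NOTIF, unix.O_RDONLY|unix.O_LARGEFILE)
	if err != nil {
		panic(err)
	}
	// Report open/read events for everything on the mount.
	if err := unix.FanotifyMark(fd, unix.FAN_MARK_ADD|unix.FAN_MARK_MOUNT,
		unix.FAN_OPEN|unix.FAN_ACCESS, unix.AT_FDCWD, mountpoint); err != nil {
		panic(err)
	}

	buf := make([]byte, 4096)
	for {
		n, err := unix.Read(fd, buf)
		if err != nil {
			panic(err)
		}
		// The buffer holds a sequence of fanotify_event_metadata records.
		for off := 0; off < n; {
			meta := (*unix.FanotifyEventMetadata)(unsafe.Pointer(&buf[off]))
			if meta.Fd >= 0 {
				// Each event carries an open fd on the accessed file; recover
				// its path through /proc/self/fd.
				path, _ := os.Readlink(fmt.Sprintf("/proc/self/fd/%d", meta.Fd))
				fmt.Println("accessed:", path)
				unix.Close(int(meta.Fd))
			}
			off += int(meta.Event_len)
		}
	}
}
```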
-
- This function returns a digest of the JSON data so the caller can query it from the content store by that digest and decode it to a list of recorder.Entry structs: https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/cmd/ctr-remote/commands/optimize.go#L269-L274
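A hedged sketch of that consumer side: decode the recorded JSON back into entries. For simplicity it reads from a local file instead of querying containerd's content store by digest, and the Entry shape (just a path) is an assumption; see the linked recorder.go and optimize.go for the real types and the content-store lookup.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// entry is a simplified stand-in for recorder.Entry (assumed shape).
type entry struct {
	Path string `json:"path"`
}

func main() {
	// Placeholder: in the real flow this blob is fetched from the containerd
	// content store using the digest returned by the optimize command.
	f, err := os.Open("record.json")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Assuming the record is a stream of JSON objects, one per accessed file;
	// if it is a single JSON array, decode into a []entry instead.
	dec := json.NewDecoder(f)
	for {
		var e entry
		if err := dec.Decode(&e); err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		fmt.Println("prioritized:", e.Path)
	}
}
```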
refs
- https://medium.com/nttlabs/startup-containers-in-lightning-speed-with-lazy-image-distribution-on-containerd-243d94522361
- https://docs.google.com/presentation/d/1DJlRV9a445567EyRa265uemWv5zoDQ4o1CK-ZszpFLE/edit#slide=id.gc6f73a04f_0_0 (cvmfs - https://github.com/cvmfs/cvmfs)
- https://github.com/cvmfs/cvmfs
- https://www.usenix.org/system/files/conference/fast16/fast16-papers-harter.pdf
- https://stevelasker.blog/2019/10/29/azure-container-registry-teleportation/
- https://www.youtube.com/watch?v=aRXIsT56A08 (FILEgrain by Akihiro Suda)
- https://www.youtube.com/watch?v=j4eIgdDkI9I (Speeding Up Analysis Pipelines with Remote Container Images)
- https://www.youtube.com/watch?v=r981cUwoD7o (Starting up Containers Super Fast With Lazy Pulling of Images)
- https://www.slideshare.net/KoheiTokunaga/fosdem-2021-build-and-run-containers-with-lazy-pulling-adoption-status-of-containerd-stargz-snapshotter-and-estargz
- https://github.com/google/crfs
- https://github.com/akihirosuda/filegrain