idea
- proposed by bradfitz for accelerating the CI of the Go project
- x/build: speed up large container start-up times without pre-pulling containers into VMs (CRFS): golang/go#30829
-
- motivation: Our current situation (building a container, pushing to gcr.io, then automating the creation of COS-like VM images that have the image pre-pulled) is pretty gross and tedious.
-
- initial: https://github.com/google/crfs (read-only FUSE filesystem that lets you mount a container image, served directly from a container registry)
crfs
- https://github.com/google/crfs
-
- a pull operation reads the entire container image from the registry and writes the entire image to the local machine's disk. It's pretty silly (and wasteful) that a read operation becomes a write operation.
-
- Go: For isolation and other reasons, we run all our containers in single-use fresh VMs. We've automated the creation of VM images where our heavy containers are pre-pulled. This is all a silly workaround. It'd be much better if we could just read the bytes over the network.
-
- tar files are unindexed, and gzip streams are not seekable
-
- Stargz: Seekable tar.gz - make a tar file where each tar entry is its own gzip stream (indexed and seekable)
-
- traditional:
Gzip(TarF(file1) + TarF(file2) + TarF(file3) + TarFooter)
-
- stargz:
Gzip(TarF(file1)) + Gzip(TarF(file2)) + Gzip(TarF(file3_chunk1)) + Gzip(F(file3_chunk2)) + Gzip(F(index of earlier files in magic file), TarFooter)
- a few percent larger - it's plenty acceptable
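A minimal Go sketch of the layout above (not the real stargzify code): each file's tar bytes go into their own gzip member, and the offsets the index records are simply where each member starts in the compressed output. Chunking of large files, the stargz.index.json entry, and the footer are omitted.

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// countWriter counts compressed bytes written so far, i.e. the offset at
// which the next gzip member will start.
type countWriter struct {
	w io.Writer
	n int64
}

func (c *countWriter) Write(p []byte) (int, error) {
	n, err := c.w.Write(p)
	c.n += int64(n)
	return n, err
}

// swapWriter lets one continuous tar stream be written through a gzip
// member that is restarted at every file boundary.
type swapWriter struct{ w io.Writer }

func (s *swapWriter) Write(p []byte) (int, error) { return s.w.Write(p) }

func main() {
	f, err := os.Create("out.tar.gz") // still a normal tar.gz, just multi-member
	if err != nil {
		panic(err)
	}
	defer f.Close()

	out := &countWriter{w: f}
	sw := &swapWriter{}
	tw := tar.NewWriter(sw)

	for _, name := range os.Args[1:] {
		offset := out.n // compressed offset of this file's gzip member
		gz := gzip.NewWriter(out)
		sw.w = gz

		src, err := os.Open(name)
		if err != nil {
			panic(err)
		}
		fi, _ := src.Stat()
		hdr, _ := tar.FileInfoHeader(fi, "")
		hdr.Name = name
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := io.Copy(tw, src); err != nil {
			panic(err)
		}
		src.Close()
		tw.Flush() // pad the entry so the member ends on a tar block boundary
		gz.Close() // finish this file's gzip member

		fmt.Printf("%s -> gzip member at offset %d\n", name, offset)
	}

	// Final member: tar end-of-archive blocks (a real stargz also writes the
	// stargz.index.json entry and the footer here).
	gz := gzip.NewWriter(out)
	sw.w = gz
	tw.Close()
	gz.Close()
}
```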
-
- operation: HTTP Range requests read just the stargz index out of the end of each layer from the registry. The index is stored much like the ZIP format's TOC, with a pointer to the index at the very end of the file. The index contains the offset of each file's GZIP(TAR(file data)) range, and large files get multiple stargz index entries so that a small amount of data can be read from them efficiently (see the Go sketch below).
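A sketch of that read path using only net/http (registry auth omitted): an io.ReaderAt whose ReadAt issues HTTP Range requests against the layer blob URL. CRFS wraps a reader like this in an io.SectionReader and hands it to its stargz package, so only the footer, the TOC, and the file ranges actually read ever cross the network. The URL, layer size, and 47-byte footer size below are placeholders/assumptions.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// httpReaderAt reads arbitrary byte ranges of a registry blob on demand.
type httpReaderAt struct {
	url    string
	client *http.Client
}

func (r *httpReaderAt) ReadAt(p []byte, off int64) (int, error) {
	req, err := http.NewRequest("GET", r.url, nil)
	if err != nil {
		return 0, err
	}
	// Ask the registry for just the bytes we need.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+int64(len(p))-1))
	resp, err := r.client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return 0, fmt.Errorf("range request not honored: %s", resp.Status)
	}
	return io.ReadFull(resp.Body, p)
}

func main() {
	// Hypothetical blob URL; a real one needs the registry token flow.
	ra := &httpReaderAt{
		url:    "https://registry.example/v2/library/busybox/blobs/sha256:deadbeef",
		client: http.DefaultClient,
	}

	// Compressed layer size comes from the image manifest; placeholder here.
	var layerSize int64 = 4_000_000

	// Read the small fixed-size footer at the end of the blob; its gzip extra
	// field stores the TOC offset (47 bytes per the CRFS stargz package --
	// treat the exact size as an assumption and check the source).
	footer := make([]byte, 47)
	if _, err := ra.ReadAt(footer, layerSize-int64(len(footer))); err != nil {
		fmt.Println("footer read failed:", err)
		return
	}
	fmt.Println("fetched footer; next step is parsing the TOC offset out of it")
}
```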
-
- adopted: https://github.com/containerd/stargz-snapshotter (implemented by Kohei Tokunaga as a containerd plugin) and https://github.com/giuseppe/crfs-plugin (by Giuseppe Scrivano, for Podman, implemented as https://github.com/containers/fuse-overlayfs, idea golang/go#30829 (comment) - zstd: starzstd?, non-chunked, fast decompression by zstd)
import "bazil.org/fuse"
- TOCEntry: https://github.com/google/crfs/blob/71d77da419c90be7b05d12e59945ac7a8c94a543/stargz/stargz.go#L108-L191
- index: stargz.index.json stores the version and the list of TOCEntry
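A trimmed sketch of those index types (field set abridged; see the linked stargz.go for the full definitions). stargz.index.json is just the JSON encoding of this TOC.

```go
package stargz

// TOCEntry describes one file, chunk, directory, or link in the stargz index
// (abridged; the real struct has more fields).
type TOCEntry struct {
	Name        string `json:"name"`                  // tar entry name
	Type        string `json:"type"`                  // "reg", "dir", "symlink", "chunk", ...
	Size        int64  `json:"size,omitempty"`        // uncompressed size
	Offset      int64  `json:"offset,omitempty"`      // where this entry's gzip member starts in the blob
	ChunkOffset int64  `json:"chunkOffset,omitempty"` // for chunks of large files
	ChunkSize   int64  `json:"chunkSize,omitempty"`
	LinkName    string `json:"linkName,omitempty"`
	Mode        int64  `json:"mode,omitempty"`
}

// jtoc is what stargz.index.json decodes into.
type jtoc struct {
	Version int         `json:"version"`
	Entries []*TOCEntry `json:"entries"`
}
```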
DEMO
stargzify
$ go install github.com/google/crfs/stargz/stargzify@latest
- usage: https://github.com/google/crfs/blob/71d77da419c90be7b05d12e59945ac7a8c94a543/stargz/stargzify/stargzify.go#L43-L60
$ skopeo copy --override-os linux docker://busybox:latest oci:busybox
$ file busybox/blobs/sha256/f5b7ce95afea5d39690afc4c206ee1bf3e3e956dcc8d1ccd05c6613a39c4e4f8
busybox/blobs/sha256/f5b7ce95afea5d39690afc4c206ee1bf3e3e956dcc8d1ccd05c6613a39c4e4f8: gzip compressed data, original size modulo 2^32 1459200
$ stargzify file:./busybox/blobs/sha256/f5b7ce95afea5d39690afc4c206ee1bf3e3e956dcc8d1ccd05c6613a39c4e4f8 file:output.stargz
$ exiftool output.stargz
ExifTool Version Number : 12.27
File Name : output.stargz
File Permissions : -rw-r--r--
File Type : GZIP
File Type Extension : gz
MIME Type : application/x-gzip
$ mkdir out
$ tar -xf output.stargz -C out
$ cd out/
$ chmod 777 stargz.index.json
$ cat stargz.index.json | jq
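A small Go follow-up to the demo (assumes the trimmed TOC types sketched in the crfs section above): decode the extracted out/stargz.index.json and list each entry with its offset, i.e. the same data the jq call prints as raw JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type tocEntry struct {
	Name   string `json:"name"`
	Type   string `json:"type"`
	Size   int64  `json:"size,omitempty"`
	Offset int64  `json:"offset,omitempty"`
}

type toc struct {
	Version int        `json:"version"`
	Entries []tocEntry `json:"entries"`
}

func main() {
	// Path relative to where the demo extracted the layer.
	f, err := os.Open("out/stargz.index.json")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var t toc
	if err := json.NewDecoder(f).Decode(&t); err != nil {
		panic(err)
	}
	fmt.Println("TOC version:", t.Version)
	for _, e := range t.Entries {
		fmt.Printf("%-8s %-40s size=%d offset=%d\n", e.Type, e.Name, e.Size, e.Offset)
	}
}
```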
- if you want to optimize your stargz into estargz, use the ctr-remote optimize command.
slacker
- pulling packages accounts for 76% of container start time, but only 6.4% of that data is read.
- lazy fetching speeds up the median container development cycle by 20x and the deployment cycle by 5x
- utilizes modifications we make to the Linux kernel in order to improve cache sharing
- image pushes become 153x faster and pulls become 72x faster
- benchmark: https://github.com/Tintri/hello-bench
teleport
- https://github.com/Azure/acr/blob/main/docs/teleport/README.md
- azure
- client: Orca
- Highly Factored Registry Protocol
- requesting Azure Premium File mount points for each layer ID (only the content read by the container is pulled across the network, speeding container start time)
- SMB mounting each layer as pre-expanded content
- NOT FREE!
filegrain
ipcs
- https://github.com/hinshun/ipcs
- IPFS (P2P CAS)
- not OCI-compatible
- containerd implementation
containerd
- initial discussion: containerd/containerd#2943
- stargz-snapshotter: https://github.com/containerd/stargz-snapshotter
- gRPC plugin
- FUSE mount per image layer
- indexed files per image layer
- uses overlay storage driver
- estargz:
-
- arbitrary files can be marked as prioritized
-
- pre-fetch on demand using HTTP range requests (optimize image with user-specified workload)
-
- workload-based performance optimization and content verification
-
- fetch files/chunks (registry) -> (lazy pull) mounting layers as FUSE (CRI) -> using layers for rootfs (container)
-
- supported by: kaniko, nerdctl, crane, ko
- stargz to estargz converter (thanks to @ktock)
-
- prioritized means all files accessed while running the image's entrypoint (or user-specified commands): https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/docs/ctr-remote.md
-
- the ctr-remote optimize command runs the analyze function: https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/cmd/ctr-remote/commands/optimize.go#L198
-
- first, the list of prioritized files in the image is created by analyzer.Analyze(): https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/analyzer/analyzer.go#L54
-
- analyzer.Analyze() returns a list of prioritized files (a list of recorder.Entry), which is encoded as JSON and stored in the containerd content store: https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/recorder/recorder.go#L26-L30
-
- what is run to detect prioritized files: the analyzer runs the image in a container and records all file accesses using fanotify: https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/analyzer/fanotify/fanotify.go
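A minimal, Linux-only sketch of that detection mechanism (not the snapshotter's actual analyzer code; needs root and golang.org/x/sys/unix): mark a mount with fanotify and log the path of every file opened or read under it. The mountpoint below is a placeholder; the analyzer watches the container's rootfs mount.

```go
package main

import (
	"fmt"
	"os"
	"unsafe"

	"golang.org/x/sys/unix"
)

func main() {
	mountpoint := "/mnt/rootfs" // placeholder: the container rootfs to watch

	fd, err := unix.FanotifyInit(unix.FAN_CLASS_NOTIF, unix.O_RDONLY|unix.O_LARGEFILE)
	if err != nil {
		panic(err)
	}
	// Report open/read events for everything on the mount.
	if err := unix.FanotifyMark(fd, unix.FAN_MARK_ADD|unix.FAN_MARK_MOUNT,
		unix.FAN_OPEN|unix.FAN_ACCESS, unix.AT_FDCWD, mountpoint); err != nil {
		panic(err)
	}

	buf := make([]byte, 4096)
	for {
		n, err := unix.Read(fd, buf)
		if err != nil {
			panic(err)
		}
		// The buffer holds a sequence of fanotify_event_metadata records.
		for off := 0; off < n; {
			meta := (*unix.FanotifyEventMetadata)(unsafe.Pointer(&buf[off]))
			if meta.Fd >= 0 {
				// Each event carries an open fd on the accessed file; recover
				// its path through /proc/self/fd.
				path, _ := os.Readlink(fmt.Sprintf("/proc/self/fd/%d", meta.Fd))
				fmt.Println("accessed:", path)
				unix.Close(int(meta.Fd))
			}
			off += int(meta.Event_len)
		}
	}
}
```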
-
- This function returns a digest of the JSON data so the caller can query it from the content store by that digest and decode it to a list of recorder.Entry structs: https://github.com/containerd/stargz-snapshotter/blob/735678f0eec4a6304588e0e358d1aad26465d747/cmd/ctr-remote/commands/optimize.go#L269-L274
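A hedged sketch of that consumer side: decode the recorded JSON back into entries. For simplicity it reads from a local file instead of querying containerd's content store by digest, and the Entry shape (just a path) is an assumption; see the linked recorder.go and optimize.go for the real types and the content-store lookup.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// entry is a simplified stand-in for recorder.Entry (assumed shape).
type entry struct {
	Path string `json:"path"`
}

func main() {
	// Placeholder: in the real flow this blob is fetched from the containerd
	// content store using the digest returned by the optimize command.
	f, err := os.Open("record.json")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Assuming the record is a stream of JSON objects, one per accessed file;
	// if it is a single JSON array, decode into a []entry instead.
	dec := json.NewDecoder(f)
	for {
		var e entry
		if err := dec.Decode(&e); err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		fmt.Println("prioritized:", e.Path)
	}
}
```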
refs
- https://medium.com/nttlabs/startup-containers-in-lightning-speed-with-lazy-image-distribution-on-containerd-243d94522361
- https://docs.google.com/presentation/d/1DJlRV9a445567EyRa265uemWv5zoDQ4o1CK-ZszpFLE/edit#slide=id.gc6f73a04f_0_0 (cvmfs - https://github.com/cvmfs/cvmfs)
- https://github.com/cvmfs/cvmfs
- https://www.usenix.org/system/files/conference/fast16/fast16-papers-harter.pdf
- https://stevelasker.blog/2019/10/29/azure-container-registry-teleportation/
- https://www.youtube.com/watch?v=aRXIsT56A08 (FILEgrain by Akihiro Suda)
- https://www.youtube.com/watch?v=j4eIgdDkI9I (Speeding Up Analysis Pipelines with Remote Container Images)
- https://www.youtube.com/watch?v=r981cUwoD7o (Starting up Containers Super Fast With Lazy Pulling of Images)
- https://www.slideshare.net/KoheiTokunaga/fosdem-2021-build-and-run-containers-with-lazy-pulling-adoption-status-of-containerd-stargz-snapshotter-and-estargz
- https://github.com/google/crfs
- https://github.com/akihirosuda/filegrain