This is very important for reproducible builds, which are a foundation of secure delivery.
- Reproducible builds: Archives has a hint to use the -X option.
- https://fekir.info/post/reproducible-zip-archives/ is a more detailed article on how to make a deterministic ZIP in Python and with a shell command.
- https://www.npmjs.com/package/deterministic-zip is a Node.js package for zipping reproducibly.
- https://stackoverflow.com/questions/62524167/zip-non-deterministic-result-in-linux has some hints.
- https://wiki.debian.org/ReproducibleBuilds/TimestampsInZip: Debian detects when non-deterministic zip archives are used and reports this as an error. Some workarounds are proposed.
An archive preserves the owning user and group (usually only their numeric ids, uid/gid) and the time of last modification (mtime).
The time is almost never important, so you can set it to a standard static reproducible date: 1 Feb 1980.
This date is used by many tools, such as Maven and Gradle.
Alternatively, you can use the SOURCE_DATE_EPOCH environment variable and take the date from git log.
The owner uid/gid can simply be set to zero.
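As a minimal sketch of the SOURCE_DATE_EPOCH approach: the helper below keeps a value that the build environment already exported, otherwise asks git for the committer time of the latest commit, and falls back to 1 Feb 1980 UTC (318211200 seconds since the epoch) when neither is available. The function name resolve_source_date_epoch is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a timestamp in seconds since the epoch.
resolve_source_date_epoch() {
  local ts
  if [ -n "${SOURCE_DATE_EPOCH:-}" ]; then
    # Respect a value provided by the build environment
    printf '%s\n' "$SOURCE_DATE_EPOCH"
  elif ts=$(git log -1 --format=%ct 2>/dev/null) && [ -n "$ts" ]; then
    # Committer time of the latest commit
    printf '%s\n' "$ts"
  else
    # Fallback: 1980-02-01 00:00:00 UTC
    printf '%s\n' 318211200
  fi
}

SOURCE_DATE_EPOCH=$(resolve_source_date_epoch)
export SOURCE_DATE_EPOCH
```

GNU tar accepts such a value as --mtime="@$SOURCE_DATE_EPOCH" in place of a fixed date.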
reproducible_tar() {
  src_folder=$1
  # GNU tar is required for --sort, --mtime and --pax-option
  tar \
    --remove-files \
    --sort=name \
    --mtime='UTC 1980-02-01' \
    --owner=0 --group=0 --numeric-owner \
    --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
    -z \
    -cf "$src_folder.tar.gz" "$src_folder/"
  # give the archive itself the same fixed timestamp
  TZ=UTC touch -a -m -t 198002010000.00 "$src_folder.tar.gz"
}
reproducible_tar test_targz
Here:
- --remove-files removes the source folder once it has been archived successfully.
- --sort=name sorts the files so that their order in the tar is always the same.
- --mtime='UTC 1980-02-01' sets the modification time to the standard static reproducible date, 1 Feb 1980 in UTC.
- --owner=0 --group=0 --numeric-owner sets the owner uid and gid to 0.
- --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime removes the headers with the access time (atime) and the status change time (ctime).
- -z compresses the tar in gzip format. You may use --use-compress-program 'gzip -9' instead to set gzip options such as maximum compression.
It is also better to set the mtime of the resulting archive itself. The touch -a -m command sets both the access and modification times.
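One way to check that the options really give a deterministic result is to archive the same folder twice and compare checksums. The sketch below does that with the same tar options as reproducible_tar, minus --remove-files so the source survives for the second run. The helper names make_tar, same_checksum and check_tar_is_reproducible are hypothetical. Two deliberate differences are assumptions of this sketch: --format=posix is stated explicitly, and the output is piped through gzip -n so the gzip header carries no file name or timestamp. It requires GNU tar, gzip and sha256sum.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Archive folder $1 into file $2 with fixed order, time and owner.
make_tar() {
  local src=$1 out=$2
  (
    # Fail if either tar or gzip fails
    set -o pipefail
    tar \
      --format=posix \
      --sort=name \
      --mtime='UTC 1980-02-01' \
      --owner=0 --group=0 --numeric-owner \
      --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
      -cf - "$src/" | gzip -n > "$out"
  )
}

# Succeeds when both files have the same SHA-256 digest.
same_checksum() {
  local a b
  a=$(sha256sum < "$1") || return 1
  b=$(sha256sum < "$2") || return 1
  [ "$a" = "$b" ]
}

# Archives folder $1 twice into a temporary directory and compares the results.
check_tar_is_reproducible() {
  local src=$1 tmp rc
  tmp=$(mktemp -d) || return 1
  make_tar "$src" "$tmp/first.tar.gz" &&
    make_tar "$src" "$tmp/second.tar.gz" &&
    same_checksum "$tmp/first.tar.gz" "$tmp/second.tar.gz"
  rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```

A call such as check_tar_is_reproducible test_targz returns 0 when both archives are byte-for-byte identical.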
To extract the archive, use:
pushd ./folder_with_archive || exit 1
# stop here if extraction failed, so the archive is not deleted
tar -xf "$src_folder.tar.gz" || exit 1
rm -f "$src_folder.tar.gz"
tar doesn't have an option to delete the archive after extraction.
In gzip this is the default behaviour, and to keep the archive you have to add the -k or --keep option.
I'm not sure why tar doesn't work like that.
So the last command removes the archive manually with rm.
If you have a big archive, removing its parts while extracting could sometimes save space, for example on a router with a small NAND flash. But extraction may fail, so this would be dangerous anyway.
The archive should be deleted only after a successful extraction.
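The rule of deleting only after success can be sketched as a small wrapper. The function name extract_and_remove and its optional second argument (the destination folder, defaulting to the current one) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract a tar archive and delete it only if extraction succeeded.
extract_and_remove() {
  local archive=$1 dest=${2:-.}
  if tar -xf "$archive" -C "$dest"; then
    rm -f -- "$archive"
  else
    # Keep the archive so the extraction can be retried
    echo "extraction of $archive failed, archive kept" >&2
    return 1
  fi
}
```

For example, extract_and_remove test_targz.tar.gz ./folder_with_archive leaves the archive in place when tar reports an error.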
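reproducible_zip() {
  src_folder=$1
  # zip cannot override mtime, so set it on every file, folder and symlink first
  TZ=UTC find "$src_folder" -exec touch --no-dereference -a -m -t 198002010000.00 {} +
  TZ=UTC zip -q --move --recurse-paths --symlinks -X "$src_folder.zip" "$src_folder"
  # give the archive itself the same fixed timestamp
  TZ=UTC touch -a -m -t 198002010000.00 "$src_folder.zip"
}
reproducible_zip test_zip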
zip doesn't have an option to set the mtime, so we have to change the mtime of all files and symlinks in the folder and only then zip it.
Zip command options:
- -q is quiet mode.
- --move or -m removes the files once zipping is complete.
- --recurse-paths compresses all subfolders.
- --symlinks stores symlinks as symlinks; otherwise zip stores the files they point to instead.
- -X or --no-extra (the long form is not supported for some reason) removes the uid/gid fields.
It is also better to set the mtime of the resulting archive itself. The touch -a -m command sets both the access and modification times.
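Unlike the tar command, the zip command has no counterpart of --sort=name, so the order of entries may depend on the order in which the filesystem lists the files. A possible variation, shown only as a sketch, passes an explicitly sorted list of names through zip's -@ option, which reads the names from standard input. The function name sorted_zip is hypothetical. The sketch assumes file names without newlines, removes an existing archive of the same name first, and does not use --move, so the source folder stays in place.

```shell
#!/usr/bin/env bash
# Hypothetical variation of reproducible_zip with an explicit entry order.
sorted_zip() {
  local src=$1
  # zip would otherwise update an existing archive instead of creating a new one
  rm -f -- "$src.zip"
  # zip cannot override mtime, so set it on every file, folder and symlink first
  TZ=UTC find "$src" -exec touch --no-dereference -a -m -t 198002010000.00 {} +
  # Feed the names to zip in a locale-independent sorted order
  find "$src" -print | LC_ALL=C sort | TZ=UTC zip -q --symlinks -X "$src.zip" -@
}
```

Running sorted_zip test_zip twice and comparing the checksums of test_zip.zip is a quick way to confirm the result is stable on your system.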
To extract the zip you also have to specify the UTC timezone, otherwise the file times will be set in the local timezone.
If you have symlinks, you additionally have to touch them to set their mtime.
pushd ./folder_with_archive || exit 1
TZ=UTC unzip -q "$src_folder.zip"
# unzip doesn't restore mtime of symlinks (bug?), so update it manually
TZ=UTC find . -exec touch --no-dereference -a -m -t 198002010000.00 {} +
rm -f "$src_folder.zip"
unzip doesn't have an option to delete the archive after extraction either, so the last command removes the archive manually with rm.