@nmoinvaz
Last active February 26, 2026 20:28

zlib-ng vs zlib-rs Benchmark Comparison (ARM64, Apple M3)

Machine Specs

  • CPU: Apple M3 (8 cores)
  • RAM: 24 GB
  • OS: Darwin 24.6.0 arm64 (macOS Sequoia)
  • Compiler: Apple clang 17.0.0 (clang-1700.6.3.2)
  • Rust: rustc 1.93.1 (01f6ddf75 2026-02-11)

Versions Tested

  • zlib-ng: 54352daf (develop branch) — "Make extra length/distance bits computation branchless using bit masking"
  • zlib-rs: bb25b662 (main branch, v0.6.2) — "fix compilation errors for 'cargo test --all-features'"

Results

Benchmarks were run with 5 repetitions each; the median CPU time values are shown below.

Compress (compress())

Uses the one-shot compress() API at the default compression level (6).

| Input Size | zlib-ng (ns) | zlib-rs (ns) | Difference |
|-----------:|-------------:|-------------:|-----------:|
| 1 B | 1,880 | 4,649 | +147% |
| 16 B | 2,179 | 4,971 | +128% |
| 48 B | 2,556 | 5,499 | +115% |
| 256 B | 3,126 | 6,166 | +97% |
| 1 KB | 4,567 | 8,345 | +83% |
| 4 KB | 15,107 | 21,746 | +44% |
| 16 KB | 52,346 | 69,908 | +34% |
| 64 KB | 144,795 | 196,281 | +36% |
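The one-shot API measured above can be illustrated with Python's built-in `zlib` module, which binds the same underlying C stream API (a sketch of the semantics only, not the benchmark harness itself):

```python
import zlib

# One-shot compression at the default level (6), analogous to the C
# compress() call benchmarked above.
data = b"example payload " * 64  # exactly 1 KiB of input
compressed = zlib.compress(data, 6)

# The zlib format adds a 2-byte header and a 4-byte adler32 trailer,
# so tiny inputs carry fixed framing plus per-call setup cost.
assert zlib.decompress(compressed) == data
print(len(data), "->", len(compressed))
```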

Deflate Streaming with Checksum (deflateInit2 + deflate, zlib format)

Uses the streaming deflate() API with the zlib wrapper (which includes an adler32 checksum). Parameterized by input size and compression level. deflateReset() is called between iterations to avoid measuring init/teardown.

| Input Size | Level | zlib-ng (ns) | zlib-rs (ns) | Difference |
|-----------:|------:|-------------:|-------------:|-----------:|
| 1 KB | 1 | 1,448 | 2,205 | +52% |
| 1 KB | 3 | 3,895 | 6,463 | +66% |
| 1 KB | 6 | 4,096 | 6,666 | +63% |
| 1 KB | 9 | 5,120 | 7,997 | +56% |
| 16 KB | 1 | 8,328 | 9,426 | +13% |
| 16 KB | 3 | 25,325 | 30,692 | +21% |
| 16 KB | 6 | 52,660 | 65,257 | +24% |
| 16 KB | 9 | 87,626 | 102,599 | +17% |
| 128 KB | 1 | 62,493 | 74,768 | +20% |
| 128 KB | 3 | 128,075 | 160,128 | +25% |
| 128 KB | 6 | 262,047 | 350,534 | +34% |
| 128 KB | 9 | 765,375 | 889,292 | +16% |
| 1 MB | 1 | 536,404 | 638,547 | +19% |
| 1 MB | 3 | 958,295 | 1,191,189 | +24% |
| 1 MB | 6 | 1,980,172 | 2,642,689 | +33% |
| 1 MB | 9 | 6,206,186 | 7,370,042 | +19% |
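The streaming, zlib-wrapped mode corresponds to a positive window-bits value in the C API. A minimal Python sketch of the same semantics (note that Python does not expose a deflateReset() equivalent, so the reset-between-iterations trick from the benchmark has no direct analog here):

```python
import zlib

data = b"streaming example " * 512

# Streaming deflate with the zlib wrapper: wbits = 15 selects the zlib
# format (header + adler32 trailer), matching deflateInit2 + deflate()
# in the benchmark above.
co = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=15)
out = co.compress(data) + co.flush()

# The output is a complete zlib stream, decodable by the one-shot API.
assert zlib.decompress(out) == data
```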

Raw Deflate, No Checksum (deflateInit2 with -MAX_WBITS)

Same as above, but using raw deflate (window bits = -15), which skips the adler32 checksum computation.

| Input Size | Level | zlib-ng (ns) | zlib-rs (ns) | Difference |
|-----------:|------:|-------------:|-------------:|-----------:|
| 1 KB | 1 | 1,522 | 2,192 | +44% |
| 1 KB | 3 | 3,993 | 6,345 | +59% |
| 1 KB | 6 | 4,127 | 6,619 | +60% |
| 1 KB | 9 | 5,273 | 7,918 | +50% |
| 16 KB | 1 | 7,956 | 8,933 | +12% |
| 16 KB | 3 | 24,192 | 30,128 | +25% |
| 16 KB | 6 | 52,242 | 65,732 | +26% |
| 16 KB | 9 | 89,010 | 104,638 | +18% |
| 128 KB | 1 | 62,799 | 72,036 | +15% |
| 128 KB | 3 | 127,949 | 156,085 | +22% |
| 128 KB | 6 | 260,205 | 345,498 | +33% |
| 128 KB | 9 | 771,221 | 911,661 | +18% |
| 1 MB | 1 | 530,487 | 612,449 | +15% |
| 1 MB | 3 | 958,855 | 1,171,367 | +22% |
| 1 MB | 6 | 1,973,975 | 2,611,918 | +32% |
| 1 MB | 9 | 6,164,965 | 7,333,845 | +19% |
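The raw-deflate variant maps to a negative window-bits value, which Python's `zlib` module also accepts. A sketch of what `deflateInit2(..., -MAX_WBITS)` selects:

```python
import zlib

data = b"raw deflate example " * 256

# wbits = -15 selects raw deflate: no zlib header and no adler32
# trailer, matching deflateInit2 with -MAX_WBITS above.
co = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15)
raw = co.compress(data) + co.flush()

# A raw stream must be inflated with matching negative window bits.
do = zlib.decompressobj(wbits=-15)
assert do.decompress(raw) + do.flush() == data
```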

Inflate (raw deflate, no checksum)

Uses the streaming inflate() API with raw deflate (window bits = -15, no adler32). Data is pre-compressed at level 9. inflateReset() is called between iterations.

| Input Size | zlib-ng (ns) | zlib-rs (ns) | Difference |
|-----------:|-------------:|-------------:|-----------:|
| 1 B | 19.1 | 26.3 | +38% |
| 64 B | 135 | 148 | +10% |
| 1 KB | 290 | 377 | +30% |
| 16 KB | 3,862 | 4,932 | +28% |
| 128 KB | 15,087 | 19,507 | +29% |
| 1 MB | 106,000 | 136,069 | +28% |

Uncompress (uncompress())

Uses the one-shot uncompress() API. Data is pre-compressed at level 9.

| Input Size | zlib-ng (ns) | zlib-rs (ns) | Difference |
|-----------:|-------------:|-------------:|-----------:|
| 1 B | 45.3 | 286 | +532% |
| 64 B | 160 | 387 | +142% |
| 1 KB | 343 | 626 | +83% |
| 16 KB | 4,322 | 5,471 | +27% |
| 128 KB | 18,793 | 23,836 | +27% |
| 1 MB | 138,435 | 168,531 | +22% |
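The difference between the two decompression benchmarks is visible in Python's `zlib` as well: a reusable streaming decompressor versus the one-shot call, which sets up and tears down an inflate stream internally on every invocation (the fixed cost the small-input rows expose):

```python
import zlib

data = b"payload " * 128
blob = zlib.compress(data, 9)  # pre-compress at level 9, as in the benchmark

# Streaming inflate via a decompressor object vs. one-shot
# zlib.decompress(), which pays stream init/teardown per call.
do = zlib.decompressobj()
streamed = do.decompress(blob) + do.flush()
oneshot = zlib.decompress(blob)
assert streamed == oneshot == data
```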

Key Takeaways

  • Compression (all levels): zlib-ng is consistently faster across all compression levels and sizes.
  • Level 6 (default) shows the widest gap at ~33% for large inputs — this is the most commonly used compression level.
  • Level 9 (best compression) has the smallest gap at ~17-19% for large inputs — the exhaustive match search dominates and both implementations do similar work.
  • Level 1 (fastest) gap is ~15-20% for large inputs — the simpler fast path leaves less room for optimization differences.
  • Checksum overhead is negligible: Comparing deflate_level vs deflate_nocrc shows almost no difference for either library at large sizes — adler32 cost is tiny relative to deflate work.
  • Small input overhead: zlib-rs shows 44-147% overhead at small sizes (1 B - 1 KB), indicating higher per-call initialization cost.
  • Inflate: zlib-ng is 10-38% faster, settling at a consistent ~28% advantage for inputs >= 1 KB.
  • Uncompress: zlib-ng is 22-532% faster. The extreme gap at small sizes (1 B = 45 ns vs 286 ns) indicates significant fixed overhead in zlib-rs's uncompress() wrapper (inflate stream init/teardown).
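The "checksum overhead is negligible" point can be sanity-checked directly: in the zlib format, the last 4 bytes of the stream are the big-endian adler32 of the uncompressed data, and computing that checksum is a small fraction of the deflate work. An illustrative check using Python's `zlib`:

```python
import struct
import zlib

data = b"checksum example " * 64
out = zlib.compress(data, 6)

# The zlib trailer is the big-endian adler32 of the original data;
# this is the only extra work the "with checksum" tables include.
trailer = struct.unpack(">I", out[-4:])[0]
assert trailer == zlib.adler32(data) & 0xFFFFFFFF
```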

Raw compare.py Output — compress/inflate/uncompress

Comparing zlibng_bench.json to zlibrs_bench.json
Benchmark                                                          Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
compress_bench/compress_bench/1_median                          +1.4670         +1.4722          1889          4659          1880          4649
compress_bench/compress_bench/16_median                         +1.2723         +1.2813          2192          4981          2179          4971
compress_bench/compress_bench/48_median                         +1.1670         +1.1518          2561          5550          2556          5499
compress_bench/compress_bench/256_median                        +0.9736         +0.9723          3135          6186          3126          6166
compress_bench/compress_bench/1024_median                       +0.8322         +0.8274          4581          8394          4567          8345
compress_bench/compress_bench/4096_median                       +0.4387         +0.4394         15138         21779         15107         21746
compress_bench/compress_bench/16384_median                      +0.3357         +0.3355         52454         70061         52346         69908
compress_bench/compress_bench/65536_median                      +0.3525         +0.3556        145404        196663        144795        196281
inflate_bench/inflate_nocrc/1_median                            +0.3943         +0.3814            19            27            19            26
inflate_bench/inflate_nocrc/64_median                           +0.0980         +0.0984           135           148           135           148
inflate_bench/inflate_nocrc/1024_median                         +0.3180         +0.2996           290           383           290           377
inflate_bench/inflate_nocrc/16384_median                        +0.2779         +0.2771          3872          4947          3862          4932
inflate_bench/inflate_nocrc/131072_median                       +0.2895         +0.2929         15153         19540         15087         19507
inflate_bench/inflate_nocrc/1048576_median                      +0.2848         +0.2837        106209        136460        106000        136069
uncompress_bench/uncompress_bench/1_median                      +5.3163         +5.3233            46           290            45           286
uncompress_bench/uncompress_bench/64_median                     +1.4147         +1.4154           161           388           160           387
uncompress_bench/uncompress_bench/1024_median                   +0.8048         +0.8226           347           627           343           626
uncompress_bench/uncompress_bench/16384_median                  +0.2491         +0.2659          4392          5486          4322          5471
uncompress_bench/uncompress_bench/131072_median                 +0.2905         +0.2684         18877         24362         18793         23836
uncompress_bench/uncompress_bench/1048576_median                +0.2162         +0.2174        138830        168842        138435        168531

Raw compare.py Output — deflate parameterized

Comparing zlibng_deflate.json to zlibrs_deflate.json
Benchmark                                                          Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
deflate_bench/deflate_level/1024/1_median                       +0.5194         +0.5230          1455          2210          1448          2205
deflate_bench/deflate_level/1024/3_median                       +0.6578         +0.6592          3906          6476          3895          6463
deflate_bench/deflate_level/1024/6_median                       +0.6277         +0.6273          4104          6681          4096          6666
deflate_bench/deflate_level/1024/9_median                       +0.5608         +0.5618          5135          8014          5120          7997
deflate_bench/deflate_level/16384/1_median                      +0.1115         +0.1319          8500          9447          8328          9426
deflate_bench/deflate_level/16384/3_median                      +0.1758         +0.2119         26153         30751         25325         30692
deflate_bench/deflate_level/16384/6_median                      +0.2273         +0.2392         53558         65732         52660         65257
deflate_bench/deflate_level/16384/9_median                      +0.1598         +0.1709         88647        102814         87626        102599
deflate_bench/deflate_level/131072/1_median                     +0.1961         +0.1964         62627         74908         62493         74768
deflate_bench/deflate_level/131072/3_median                     +0.2513         +0.2503        128281        160521        128075        160128
deflate_bench/deflate_level/131072/6_median                     +0.3381         +0.3377        262497        351250        262047        350534
deflate_bench/deflate_level/131072/9_median                     +0.1619         +0.1619        766597        890746        765375        889292
deflate_bench/deflate_level/1048576/1_median                    +0.1914         +0.1904        537273        640116        536404        638547
deflate_bench/deflate_level/1048576/3_median                    +0.2439         +0.2430        959589       1193676        958295       1191189
deflate_bench/deflate_level/1048576/6_median                    +0.3095         +0.3346       2025331       2652264       1980172       2642689
deflate_bench/deflate_level/1048576/9_median                    +0.1876         +0.1875       6217519       7383755       6206186       7370042
deflate_bench/deflate_nocrc/1024/1_median                       +0.4406         +0.4398          1525          2197          1522          2192
deflate_bench/deflate_nocrc/1024/3_median                       +0.5421         +0.5892          4123          6359          3993          6345
deflate_bench/deflate_nocrc/1024/6_median                       +0.6023         +0.6040          4140          6633          4127          6619
deflate_bench/deflate_nocrc/1024/9_median                       +0.5022         +0.5014          5282          7935          5273          7918
deflate_bench/deflate_nocrc/16384/1_median                      +0.1319         +0.1228          7980          9033          7956          8933
deflate_bench/deflate_nocrc/16384/3_median                      +0.2458         +0.2453         24232         30188         24192         30128
deflate_bench/deflate_nocrc/16384/6_median                      +0.2576         +0.2582         52380         65872         52242         65732
deflate_bench/deflate_nocrc/16384/9_median                      +0.1745         +0.1756         89276        104851         89010        104638
deflate_bench/deflate_nocrc/131072/1_median                     +0.1422         +0.1471         63215         72202         62799         72036
deflate_bench/deflate_nocrc/131072/3_median                     +0.2156         +0.2199        128640        156375        127949        156085
deflate_bench/deflate_nocrc/131072/6_median                     +0.3260         +0.3278        260967        346045        260205        345498
deflate_bench/deflate_nocrc/131072/9_median                     +0.1862         +0.1821        772465        916270        771221        911661
deflate_bench/deflate_nocrc/1048576/1_median                    +0.1542         +0.1545        531526        613469        530487        612449
deflate_bench/deflate_nocrc/1048576/3_median                    +0.2202         +0.2216        961992       1173863        958855       1171367
deflate_bench/deflate_nocrc/1048576/6_median                    +0.3172         +0.3232       1991909       2623659       1973975       2611918
deflate_bench/deflate_nocrc/1048576/9_median                    +0.1893         +0.1896       6179062       7348699       6164965       7333845

Reproduction Steps

Strategy

Both libraries export a C-compatible zlib API. The approach is to use zlib-ng's own Google Benchmark harness (which benchmarks compress(), uncompress(), inflate(), and deflate() via the standard zlib API) and link it against each library separately:

  1. zlib-ng benchmarks: Build zlib-ng with ZLIB_COMPAT=ON so it exports standard zlib symbols (compress, uncompress, inflate, etc.). The benchmark binary links statically against libz-ng-static.a.

  2. zlib-rs benchmarks: Build zlib-rs as a static C library (libz_rs.a) via the libz-rs-sys-cdylib crate. Then build a subset of zlib-ng's benchmarks (only the public API tests: compress, uncompress, inflate, deflate) linked against libz_rs.a instead. A small CMake addition creates a benchmark_zlib_rs target for this. The BUILD_ALT=1 define skips zlib-ng's CPU feature detection in benchmark_main.cc.

  3. Run sequentially: Never run benchmarks concurrently — run one, then the other.

  4. Compare: Use Google Benchmark's compare.py tool to produce a side-by-side comparison.

Step-by-Step

1. Clone both repositories

git clone https://github.com/zlib-ng/zlib-ng.git
cd zlib-ng

# Clone zlib-rs alongside
git clone https://github.com/trifectatechfoundation/zlib-rs.git ../zlib-rs

2. Install Rust (if not already installed)

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"

3. Build zlib-rs static library

cd ../zlib-rs/libz-rs-sys-cdylib
cargo build --release
# Produces: target/release/libz_rs.a
cd ../../zlib-ng

4. Patch zlib-ng CMake to add zlib-rs benchmark target

Add the following to test/benchmarks/CMakeLists.txt, before the if(WITH_BENCHMARK_APPS) line:

if(DEFINED ZLIB_RS_LIB)
    add_executable(benchmark_zlib_rs
        benchmark_compress.cc
        benchmark_deflate.cc
        benchmark_inflate.cc
        benchmark_uncompress.cc
        benchmark_main.cc
    )
    target_compile_definitions(benchmark_zlib_rs PRIVATE -DBENCHMARK_STATIC_DEFINE BUILD_ALT=1 ZLIB_COMPAT)
    target_include_directories(benchmark_zlib_rs PRIVATE
        ${PROJECT_SOURCE_DIR}
        ${PROJECT_BINARY_DIR}
        ${benchmark_SOURCE_DIR}/benchmark/include)
    target_link_libraries(benchmark_zlib_rs ${ZLIB_RS_LIB} benchmark::benchmark)
endif()

5. Build zlib-ng with benchmarks

cmake -S . -B build-bench-zlibng \
    -DZLIB_COMPAT=ON \
    -DBUILD_SHARED_LIBS=OFF \
    -DBUILD_TESTING=ON \
    -DWITH_BENCHMARKS=ON \
    -DCMAKE_BUILD_TYPE=Release

cmake --build build-bench-zlibng -j$(sysctl -n hw.ncpu)   # macOS has no nproc by default

6. Build zlib-rs benchmark target

ZLIB_RS_PATH=$(realpath ../zlib-rs/libz-rs-sys-cdylib/target/release/libz_rs.a)

cmake -S . -B build-bench-zlibrs \
    -DZLIB_COMPAT=ON \
    -DBUILD_SHARED_LIBS=OFF \
    -DBUILD_TESTING=ON \
    -DWITH_BENCHMARKS=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DZLIB_RS_LIB="$ZLIB_RS_PATH"

cmake --build build-bench-zlibrs --target benchmark_zlib_rs -j$(sysctl -n hw.ncpu)

7. Run benchmarks sequentially

# Run zlib-ng first (public API benchmarks only)
build-bench-zlibng/test/benchmarks/benchmark_zlib \
    --benchmark_filter="compress_bench|inflate_bench|uncompress_bench|deflate_bench" \
    --benchmark_out=/tmp/zlibng_bench.json \
    --benchmark_out_format=json \
    --benchmark_repetitions=5 \
    --benchmark_report_aggregates_only=true

# Then run zlib-rs (do NOT run concurrently)
build-bench-zlibrs/test/benchmarks/benchmark_zlib_rs \
    --benchmark_out=/tmp/zlibrs_bench.json \
    --benchmark_out_format=json \
    --benchmark_repetitions=5 \
    --benchmark_report_aggregates_only=true

8. Compare results

# Clone Google Benchmark for the comparison tool
git clone https://github.com/google/benchmark.git .benchmark
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r .benchmark/tools/requirements.txt

# Run comparison
python3 .benchmark/tools/compare.py benchmarks \
    /tmp/zlibng_bench.json \
    /tmp/zlibrs_bench.json
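The `Time`/`CPU` columns that compare.py prints are relative differences, (new − old) / old; the "+NN%" columns in the tables above are the same quantity rounded to whole percent. For example, the 128 KB level-6 deflate row:

```python
# Relative difference as compare.py computes it, using the median CPU
# times from the 128 KB / level 6 deflate_level row above.
old, new = 262_047, 350_534  # zlib-ng, zlib-rs (ns)
diff = (new - old) / old
assert round(diff, 4) == 0.3377  # matches the +0.3377 line in the raw output
print(f"+{diff:.0%}")            # the table's rounded form
```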

Notes

  • The zlib-ng benchmark harness also includes micro-benchmarks for internal functions (adler32, crc32, compare256, slide_hash, insert_string) that test architecture-specific SIMD variants. These are not comparable with zlib-rs and are excluded from this comparison.
  • The BUILD_ALT=1 compile definition in the zlib-rs target disables zlib-ng's runtime CPU feature detection in benchmark_main.cc, which is not needed when linking against zlib-rs.
  • Both libraries are built with release/optimized settings.
  • The deflate benchmarks use deflateReset() between iterations to measure steady-state compression without init/teardown overhead.