@nmoinvaz
Last active February 26, 2026 20:28

zlib-ng vs zlib-rs Benchmark Comparison (ARM64, Apple M3)

Machine Specs

  • CPU: Apple M3 (8 cores)
  • RAM: 24 GB
  • OS: Darwin 24.6.0 arm64 (macOS Sequoia)
  • Compiler: Apple clang 17.0.0 (clang-1700.6.3.2)
  • Rust: rustc 1.93.1 (01f6ddf75 2026-02-11)

Versions Tested

  • zlib-ng: 54352daf (develop branch) — "Make extra length/distance bits computation branchless using bit masking"
  • zlib-rs: bb25b662 (main branch, v0.6.2) — "fix compilation errors for 'cargo test --all-features'"

Results

Benchmarks were run with 5 repetitions each; the median CPU time values are shown below.

Compress (compress())

Uses the one-shot compress() API at the default compression level (6).

| Input Size | zlib-ng (ns) | zlib-rs (ns) | Difference |
|-----------:|-------------:|-------------:|-----------:|
| 1 B | 1,880 | 4,649 | +147% |
| 16 B | 2,179 | 4,971 | +128% |
| 48 B | 2,556 | 5,499 | +115% |
| 256 B | 3,126 | 6,166 | +97% |
| 1 KB | 4,567 | 8,345 | +83% |
| 4 KB | 15,107 | 21,746 | +44% |
| 16 KB | 52,346 | 69,908 | +34% |
| 64 KB | 144,795 | 196,281 | +36% |
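The one-shot API measured above can be illustrated with Python's built-in `zlib` module, which binds the same underlying C stream API (a sketch of the semantics only, not the benchmark harness itself):

```python
import zlib

# One-shot compression at the default level (6), analogous to the C
# compress() call benchmarked above.
data = b"example payload " * 64  # exactly 1 KiB of input
compressed = zlib.compress(data, 6)

# The zlib format adds a 2-byte header and a 4-byte adler32 trailer,
# so tiny inputs carry fixed framing plus per-call setup cost.
assert zlib.decompress(compressed) == data
print(len(data), "->", len(compressed))
```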

Deflate Streaming with Checksum (deflateInit2 + deflate, zlib format)

Uses the streaming deflate() API with the zlib wrapper (which includes an adler32 checksum). Parameterized by input size and compression level. deflateReset() is called between iterations to avoid measuring init/teardown.

| Input Size | Level | zlib-ng (ns) | zlib-rs (ns) | Difference |
|-----------:|------:|-------------:|-------------:|-----------:|
| 1 KB | 1 | 1,448 | 2,205 | +52% |
| 1 KB | 3 | 3,895 | 6,463 | +66% |
| 1 KB | 6 | 4,096 | 6,666 | +63% |
| 1 KB | 9 | 5,120 | 7,997 | +56% |
| 16 KB | 1 | 8,328 | 9,426 | +13% |
| 16 KB | 3 | 25,325 | 30,692 | +21% |
| 16 KB | 6 | 52,660 | 65,257 | +24% |
| 16 KB | 9 | 87,626 | 102,599 | +17% |
| 128 KB | 1 | 62,493 | 74,768 | +20% |
| 128 KB | 3 | 128,075 | 160,128 | +25% |
| 128 KB | 6 | 262,047 | 350,534 | +34% |
| 128 KB | 9 | 765,375 | 889,292 | +16% |
| 1 MB | 1 | 536,404 | 638,547 | +19% |
| 1 MB | 3 | 958,295 | 1,191,189 | +24% |
| 1 MB | 6 | 1,980,172 | 2,642,689 | +33% |
| 1 MB | 9 | 6,206,186 | 7,370,042 | +19% |
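The streaming, zlib-wrapped mode corresponds to a positive window-bits value in the C API. A minimal Python sketch of the same semantics (note that Python does not expose a deflateReset() equivalent, so the reset-between-iterations trick from the benchmark has no direct analog here):

```python
import zlib

data = b"streaming example " * 512

# Streaming deflate with the zlib wrapper: wbits = 15 selects the zlib
# format (header + adler32 trailer), matching deflateInit2 + deflate()
# in the benchmark above.
co = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=15)
out = co.compress(data) + co.flush()

# The output is a complete zlib stream, decodable by the one-shot API.
assert zlib.decompress(out) == data
```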

Raw Deflate, No Checksum (deflateInit2 with -MAX_WBITS)

Same as above, but using raw deflate (window bits = -15), which skips the adler32 checksum computation.

| Input Size | Level | zlib-ng (ns) | zlib-rs (ns) | Difference |
|-----------:|------:|-------------:|-------------:|-----------:|
| 1 KB | 1 | 1,522 | 2,192 | +44% |
| 1 KB | 3 | 3,993 | 6,345 | +59% |
| 1 KB | 6 | 4,127 | 6,619 | +60% |
| 1 KB | 9 | 5,273 | 7,918 | +50% |
| 16 KB | 1 | 7,956 | 8,933 | +12% |
| 16 KB | 3 | 24,192 | 30,128 | +25% |
| 16 KB | 6 | 52,242 | 65,732 | +26% |
| 16 KB | 9 | 89,010 | 104,638 | +18% |
| 128 KB | 1 | 62,799 | 72,036 | +15% |
| 128 KB | 3 | 127,949 | 156,085 | +22% |
| 128 KB | 6 | 260,205 | 345,498 | +33% |
| 128 KB | 9 | 771,221 | 911,661 | +18% |
| 1 MB | 1 | 530,487 | 612,449 | +15% |
| 1 MB | 3 | 958,855 | 1,171,367 | +22% |
| 1 MB | 6 | 1,973,975 | 2,611,918 | +32% |
| 1 MB | 9 | 6,164,965 | 7,333,845 | +19% |
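The raw-deflate variant maps to a negative window-bits value, which Python's `zlib` module also accepts. A sketch of what `deflateInit2(..., -MAX_WBITS)` selects:

```python
import zlib

data = b"raw deflate example " * 256

# wbits = -15 selects raw deflate: no zlib header and no adler32
# trailer, matching deflateInit2 with -MAX_WBITS above.
co = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15)
raw = co.compress(data) + co.flush()

# A raw stream must be inflated with matching negative window bits.
do = zlib.decompressobj(wbits=-15)
assert do.decompress(raw) + do.flush() == data
```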

Inflate (raw deflate, no checksum)

Uses the streaming inflate() API with raw deflate (window bits = -15, no adler32). Data is pre-compressed at level 9. inflateReset() is called between iterations.

| Input Size | zlib-ng (ns) | zlib-rs (ns) | Difference |
|-----------:|-------------:|-------------:|-----------:|
| 1 B | 19.1 | 26.3 | +38% |
| 64 B | 135 | 148 | +10% |
| 1 KB | 290 | 377 | +30% |
| 16 KB | 3,862 | 4,932 | +28% |
| 128 KB | 15,087 | 19,507 | +29% |
| 1 MB | 106,000 | 136,069 | +28% |

Uncompress (uncompress())

Uses the one-shot uncompress() API. Data is pre-compressed at level 9.

| Input Size | zlib-ng (ns) | zlib-rs (ns) | Difference |
|-----------:|-------------:|-------------:|-----------:|
| 1 B | 45.3 | 286 | +532% |
| 64 B | 160 | 387 | +142% |
| 1 KB | 343 | 626 | +83% |
| 16 KB | 4,322 | 5,471 | +27% |
| 128 KB | 18,793 | 23,836 | +27% |
| 1 MB | 138,435 | 168,531 | +22% |
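The difference between the two decompression benchmarks is visible in Python's `zlib` as well: a reusable streaming decompressor versus the one-shot call, which sets up and tears down an inflate stream internally on every invocation (the fixed cost the small-input rows expose):

```python
import zlib

data = b"payload " * 128
blob = zlib.compress(data, 9)  # pre-compress at level 9, as in the benchmark

# Streaming inflate via a decompressor object vs. one-shot
# zlib.decompress(), which pays stream init/teardown per call.
do = zlib.decompressobj()
streamed = do.decompress(blob) + do.flush()
oneshot = zlib.decompress(blob)
assert streamed == oneshot == data
```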

Key Takeaways

  • Compression (all levels): zlib-ng is consistently faster across all compression levels and sizes.
  • Level 6 (default) shows the widest gap at ~33% for large inputs — this is the most commonly used compression level.
  • Level 9 (best compression) has the smallest gap at ~17-19% for large inputs — the exhaustive match search dominates and both implementations do similar work.
  • Level 1 (fastest) gap is ~15-20% for large inputs — the simpler fast path leaves less room for optimization differences.
  • Checksum overhead is negligible: Comparing deflate_level vs deflate_nocrc shows almost no difference for either library at large sizes — adler32 cost is tiny relative to deflate work.
  • Small input overhead: zlib-rs shows 44-147% overhead at small sizes (1 B - 1 KB), indicating higher per-call initialization cost.
  • Inflate: zlib-ng is 10-38% faster, settling at a consistent ~28% advantage for inputs >= 1 KB.
  • Uncompress: zlib-ng is 22-532% faster. The extreme gap at small sizes (1 B = 45 ns vs 286 ns) indicates significant fixed overhead in zlib-rs's uncompress() wrapper (inflate stream init/teardown).
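The "checksum overhead is negligible" point can be sanity-checked directly: in the zlib format, the last 4 bytes of the stream are the big-endian adler32 of the uncompressed data, and computing that checksum is a small fraction of the deflate work. An illustrative check using Python's `zlib`:

```python
import struct
import zlib

data = b"checksum example " * 64
out = zlib.compress(data, 6)

# The zlib trailer is the big-endian adler32 of the original data;
# this is the only extra work the "with checksum" tables include.
trailer = struct.unpack(">I", out[-4:])[0]
assert trailer == zlib.adler32(data) & 0xFFFFFFFF
```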

Raw compare.py Output — compress/inflate/uncompress

Comparing zlibng_bench.json to zlibrs_bench.json
Benchmark                                                          Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
compress_bench/compress_bench/1_median                          +1.4670         +1.4722          1889          4659          1880          4649
compress_bench/compress_bench/16_median                         +1.2723         +1.2813          2192          4981          2179          4971
compress_bench/compress_bench/48_median                         +1.1670         +1.1518          2561          5550          2556          5499
compress_bench/compress_bench/256_median                        +0.9736         +0.9723          3135          6186          3126          6166
compress_bench/compress_bench/1024_median                       +0.8322         +0.8274          4581          8394          4567          8345
compress_bench/compress_bench/4096_median                       +0.4387         +0.4394         15138         21779         15107         21746
compress_bench/compress_bench/16384_median                      +0.3357         +0.3355         52454         70061         52346         69908
compress_bench/compress_bench/65536_median                      +0.3525         +0.3556        145404        196663        144795        196281
inflate_bench/inflate_nocrc/1_median                            +0.3943         +0.3814            19            27            19            26
inflate_bench/inflate_nocrc/64_median                           +0.0980         +0.0984           135           148           135           148
inflate_bench/inflate_nocrc/1024_median                         +0.3180         +0.2996           290           383           290           377
inflate_bench/inflate_nocrc/16384_median                        +0.2779         +0.2771          3872          4947          3862          4932
inflate_bench/inflate_nocrc/131072_median                       +0.2895         +0.2929         15153         19540         15087         19507
inflate_bench/inflate_nocrc/1048576_median                      +0.2848         +0.2837        106209        136460        106000        136069
uncompress_bench/uncompress_bench/1_median                      +5.3163         +5.3233            46           290            45           286
uncompress_bench/uncompress_bench/64_median                     +1.4147         +1.4154           161           388           160           387
uncompress_bench/uncompress_bench/1024_median                   +0.8048         +0.8226           347           627           343           626
uncompress_bench/uncompress_bench/16384_median                  +0.2491         +0.2659          4392          5486          4322          5471
uncompress_bench/uncompress_bench/131072_median                 +0.2905         +0.2684         18877         24362         18793         23836
uncompress_bench/uncompress_bench/1048576_median                +0.2162         +0.2174        138830        168842        138435        168531

Raw compare.py Output — deflate parameterized

Comparing zlibng_deflate.json to zlibrs_deflate.json
Benchmark                                                          Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
deflate_bench/deflate_level/1024/1_median                       +0.5194         +0.5230          1455          2210          1448          2205
deflate_bench/deflate_level/1024/3_median                       +0.6578         +0.6592          3906          6476          3895          6463
deflate_bench/deflate_level/1024/6_median                       +0.6277         +0.6273          4104          6681          4096          6666
deflate_bench/deflate_level/1024/9_median                       +0.5608         +0.5618          5135          8014          5120          7997
deflate_bench/deflate_level/16384/1_median                      +0.1115         +0.1319          8500          9447          8328          9426
deflate_bench/deflate_level/16384/3_median                      +0.1758         +0.2119         26153         30751         25325         30692
deflate_bench/deflate_level/16384/6_median                      +0.2273         +0.2392         53558         65732         52660         65257
deflate_bench/deflate_level/16384/9_median                      +0.1598         +0.1709         88647        102814         87626        102599
deflate_bench/deflate_level/131072/1_median                     +0.1961         +0.1964         62627         74908         62493         74768
deflate_bench/deflate_level/131072/3_median                     +0.2513         +0.2503        128281        160521        128075        160128
deflate_bench/deflate_level/131072/6_median                     +0.3381         +0.3377        262497        351250        262047        350534
deflate_bench/deflate_level/131072/9_median                     +0.1619         +0.1619        766597        890746        765375        889292
deflate_bench/deflate_level/1048576/1_median                    +0.1914         +0.1904        537273        640116        536404        638547
deflate_bench/deflate_level/1048576/3_median                    +0.2439         +0.2430        959589       1193676        958295       1191189
deflate_bench/deflate_level/1048576/6_median                    +0.3095         +0.3346       2025331       2652264       1980172       2642689
deflate_bench/deflate_level/1048576/9_median                    +0.1876         +0.1875       6217519       7383755       6206186       7370042
deflate_bench/deflate_nocrc/1024/1_median                       +0.4406         +0.4398          1525          2197          1522          2192
deflate_bench/deflate_nocrc/1024/3_median                       +0.5421         +0.5892          4123          6359          3993          6345
deflate_bench/deflate_nocrc/1024/6_median                       +0.6023         +0.6040          4140          6633          4127          6619
deflate_bench/deflate_nocrc/1024/9_median                       +0.5022         +0.5014          5282          7935          5273          7918
deflate_bench/deflate_nocrc/16384/1_median                      +0.1319         +0.1228          7980          9033          7956          8933
deflate_bench/deflate_nocrc/16384/3_median                      +0.2458         +0.2453         24232         30188         24192         30128
deflate_bench/deflate_nocrc/16384/6_median                      +0.2576         +0.2582         52380         65872         52242         65732
deflate_bench/deflate_nocrc/16384/9_median                      +0.1745         +0.1756         89276        104851         89010        104638
deflate_bench/deflate_nocrc/131072/1_median                     +0.1422         +0.1471         63215         72202         62799         72036
deflate_bench/deflate_nocrc/131072/3_median                     +0.2156         +0.2199        128640        156375        127949        156085
deflate_bench/deflate_nocrc/131072/6_median                     +0.3260         +0.3278        260967        346045        260205        345498
deflate_bench/deflate_nocrc/131072/9_median                     +0.1862         +0.1821        772465        916270        771221        911661
deflate_bench/deflate_nocrc/1048576/1_median                    +0.1542         +0.1545        531526        613469        530487        612449
deflate_bench/deflate_nocrc/1048576/3_median                    +0.2202         +0.2216        961992       1173863        958855       1171367
deflate_bench/deflate_nocrc/1048576/6_median                    +0.3172         +0.3232       1991909       2623659       1973975       2611918
deflate_bench/deflate_nocrc/1048576/9_median                    +0.1893         +0.1896       6179062       7348699       6164965       7333845

Reproduction Steps

Strategy

Both libraries export a C-compatible zlib API. The approach is to use zlib-ng's own Google Benchmark harness (which benchmarks compress(), uncompress(), inflate(), and deflate() via the standard zlib API) and link it against each library separately:

  1. zlib-ng benchmarks: Build zlib-ng with ZLIB_COMPAT=ON so it exports standard zlib symbols (compress, uncompress, inflate, etc.). The benchmark binary links statically against libz-ng-static.a.

  2. zlib-rs benchmarks: Build zlib-rs as a static C library (libz_rs.a) via the libz-rs-sys-cdylib crate. Then build a subset of zlib-ng's benchmarks (only the public API tests: compress, uncompress, inflate, deflate) linked against libz_rs.a instead. A small CMake addition creates a benchmark_zlib_rs target for this. The BUILD_ALT=1 define skips zlib-ng's CPU feature detection in benchmark_main.cc.

  3. Run sequentially: Never run benchmarks concurrently — run one, then the other.

  4. Compare: Use Google Benchmark's compare.py tool to produce a side-by-side comparison.

Step-by-Step

1. Clone both repositories

git clone https://github.com/zlib-ng/zlib-ng.git
cd zlib-ng

# Clone zlib-rs alongside
git clone https://github.com/trifectatechfoundation/zlib-rs.git ../zlib-rs

2. Install Rust (if not already installed)

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"

3. Build zlib-rs static library

cd ../zlib-rs/libz-rs-sys-cdylib
cargo build --release
# Produces: target/release/libz_rs.a
cd ../../zlib-ng

4. Patch zlib-ng CMake to add zlib-rs benchmark target

Add the following to test/benchmarks/CMakeLists.txt, before the if(WITH_BENCHMARK_APPS) line:

if(DEFINED ZLIB_RS_LIB)
    add_executable(benchmark_zlib_rs
        benchmark_compress.cc
        benchmark_deflate.cc
        benchmark_inflate.cc
        benchmark_uncompress.cc
        benchmark_main.cc
    )
    target_compile_definitions(benchmark_zlib_rs PRIVATE -DBENCHMARK_STATIC_DEFINE BUILD_ALT=1 ZLIB_COMPAT)
    target_include_directories(benchmark_zlib_rs PRIVATE
        ${PROJECT_SOURCE_DIR}
        ${PROJECT_BINARY_DIR}
        ${benchmark_SOURCE_DIR}/benchmark/include)
    target_link_libraries(benchmark_zlib_rs ${ZLIB_RS_LIB} benchmark::benchmark)
endif()

5. Build zlib-ng with benchmarks

cmake -S . -B build-bench-zlibng \
    -DZLIB_COMPAT=ON \
    -DBUILD_SHARED_LIBS=OFF \
    -DBUILD_TESTING=ON \
    -DWITH_BENCHMARKS=ON \
    -DCMAKE_BUILD_TYPE=Release

cmake --build build-bench-zlibng -j$(sysctl -n hw.ncpu)   # macOS has no nproc by default

6. Build zlib-rs benchmark target

ZLIB_RS_PATH=$(realpath ../zlib-rs/libz-rs-sys-cdylib/target/release/libz_rs.a)

cmake -S . -B build-bench-zlibrs \
    -DZLIB_COMPAT=ON \
    -DBUILD_SHARED_LIBS=OFF \
    -DBUILD_TESTING=ON \
    -DWITH_BENCHMARKS=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DZLIB_RS_LIB="$ZLIB_RS_PATH"

cmake --build build-bench-zlibrs --target benchmark_zlib_rs -j$(sysctl -n hw.ncpu)

7. Run benchmarks sequentially

# Run zlib-ng first (public API benchmarks only)
build-bench-zlibng/test/benchmarks/benchmark_zlib \
    --benchmark_filter="compress_bench|inflate_bench|uncompress_bench|deflate_bench" \
    --benchmark_out=/tmp/zlibng_bench.json \
    --benchmark_out_format=json \
    --benchmark_repetitions=5 \
    --benchmark_report_aggregates_only=true

# Then run zlib-rs (do NOT run concurrently)
build-bench-zlibrs/test/benchmarks/benchmark_zlib_rs \
    --benchmark_out=/tmp/zlibrs_bench.json \
    --benchmark_out_format=json \
    --benchmark_repetitions=5 \
    --benchmark_report_aggregates_only=true

8. Compare results

# Clone Google Benchmark for the comparison tool
git clone https://github.com/google/benchmark.git .benchmark
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r .benchmark/tools/requirements.txt

# Run comparison
python3 .benchmark/tools/compare.py benchmarks \
    /tmp/zlibng_bench.json \
    /tmp/zlibrs_bench.json
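The `Time`/`CPU` columns that compare.py prints are relative differences, (new − old) / old; the "+NN%" columns in the tables above are the same quantity rounded to whole percent. For example, the 128 KB level-6 deflate row:

```python
# Relative difference as compare.py computes it, using the median CPU
# times from the 128 KB / level 6 deflate_level row above.
old, new = 262_047, 350_534  # zlib-ng, zlib-rs (ns)
diff = (new - old) / old
assert round(diff, 4) == 0.3377  # matches the +0.3377 line in the raw output
print(f"+{diff:.0%}")            # the table's rounded form
```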

Notes

  • The zlib-ng benchmark harness also includes micro-benchmarks for internal functions (adler32, crc32, compare256, slide_hash, insert_string) that test architecture-specific SIMD variants. These are not comparable with zlib-rs and are excluded from this comparison.
  • The BUILD_ALT=1 compile definition in the zlib-rs target disables zlib-ng's runtime CPU feature detection in benchmark_main.cc, which is not needed when linking against zlib-rs.
  • Both libraries are built with release/optimized settings.
  • The deflate benchmarks use deflateReset() between iterations to measure steady-state compression without init/teardown overhead.