- CPU: Apple M3 (8 cores)
- RAM: 24 GB
- OS: Darwin 24.6.0 arm64 (macOS Sequoia)
- Compiler: Apple clang 17.0.0 (clang-1700.6.3.2)
- Rust: rustc 1.93.1 (01f6ddf75 2026-02-11)
- zlib-ng: `54352daf` (develop branch), "Make extra length/distance bits computation branchless using bit masking"
- zlib-rs: `bb25b662` (main branch, v0.6.2), "fix compilation errors for 'cargo test --all-features'"
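Each benchmark was run with 5 repetitions; the tables below show median CPU time. In every table, Difference = (zlib-rs - zlib-ng) / zlib-ng, so a positive value means zlib-rs is slower.
Uses the compress() one-shot API at the default compression level (6).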
| Input Size | zlib-ng (ns) | zlib-rs (ns) | Difference |
|---|---|---|---|
| 1 B | 1,880 | 4,649 | +147% |
| 16 B | 2,179 | 4,971 | +128% |
| 48 B | 2,556 | 5,499 | +115% |
| 256 B | 3,126 | 6,166 | +97% |
| 1 KB | 4,567 | 8,345 | +83% |
| 4 KB | 15,107 | 21,746 | +44% |
| 16 KB | 52,346 | 69,908 | +34% |
| 64 KB | 144,795 | 196,281 | +36% |
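The Difference column can be reproduced from the two time columns. A minimal sketch, using a hypothetical helper `pct_difference` and three rows copied from the compress() table:

```python
def pct_difference(zlib_ng_ns: float, zlib_rs_ns: float) -> float:
    """Extra time zlib-rs takes relative to zlib-ng, in percent.

    Positive values mean zlib-rs is slower; 0 means parity.
    """
    return (zlib_rs_ns - zlib_ng_ns) / zlib_ng_ns * 100.0


# (input size, zlib-ng ns, zlib-rs ns), copied from the compress() table
rows = [
    ("1 B", 1_880, 4_649),
    ("1 KB", 4_567, 8_345),
    ("64 KB", 144_795, 196_281),
]

for size, ng, rs in rows:
    print(f"{size:>6}: {pct_difference(ng, rs):+.0f}%")
```

Read this way, +147% means zlib-rs takes about 2.47× as long as zlib-ng for that input, not that zlib-ng is "147% faster".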
Uses streaming deflate() API with zlib wrapping (includes adler32 checksum). Parameterized by input size and compression level. Uses deflateReset() between iterations to avoid measuring init/teardown.
| Input Size | Level | zlib-ng (ns) | zlib-rs (ns) | Difference |
|---|---|---|---|---|
| 1 KB | 1 | 1,448 | 2,205 | +52% |
| 1 KB | 3 | 3,895 | 6,463 | +66% |
| 1 KB | 6 | 4,096 | 6,666 | +63% |
| 1 KB | 9 | 5,120 | 7,997 | +56% |
| 16 KB | 1 | 8,328 | 9,426 | +13% |
| 16 KB | 3 | 25,325 | 30,692 | +21% |
| 16 KB | 6 | 52,660 | 65,257 | +24% |
| 16 KB | 9 | 87,626 | 102,599 | +17% |
| 128 KB | 1 | 62,493 | 74,768 | +20% |
| 128 KB | 3 | 128,075 | 160,128 | +25% |
| 128 KB | 6 | 262,047 | 350,534 | +34% |
| 128 KB | 9 | 765,375 | 889,292 | +16% |
| 1 MB | 1 | 536,404 | 638,547 | +19% |
| 1 MB | 3 | 958,295 | 1,191,189 | +24% |
| 1 MB | 6 | 1,980,172 | 2,642,689 | +33% |
| 1 MB | 9 | 6,206,186 | 7,370,042 | +19% |
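For readers less familiar with the streaming API, here is a rough Python analogue of what this benchmark exercises: a zlib-wrapped deflate stream fed in chunks at a chosen level. It uses Python's `zlib` module (backed by whatever zlib Python was built against, not necessarily either library benchmarked here), and the payload is an arbitrary stand-in, since the benchmark's input data is not described in this document. Python does not expose `deflateReset()`, so unlike the C harness each call pays stream init/teardown.

```python
import zlib


def deflate_stream(data: bytes, level: int, chunk_size: int = 16 * 1024) -> bytes:
    """Compress `data` with the streaming deflate API and a zlib wrapper.

    wbits=15 selects the zlib format (2-byte header + Adler-32 trailer).
    Python's zlib module does not expose deflateReset(), so every call
    creates and frees a fresh stream; the C benchmark instead resets one.
    """
    comp = zlib.compressobj(level, zlib.DEFLATED, 15)
    out = []
    # Feed the input in fixed-size chunks, as a streaming caller would.
    for start in range(0, len(data), chunk_size):
        out.append(comp.compress(data[start:start + chunk_size]))
    out.append(comp.flush(zlib.Z_FINISH))
    return b"".join(out)


# Arbitrary stand-in input: 64 KiB of repetitive text.
payload = b"example payload " * 4096
for level in (1, 3, 6, 9):
    compressed = deflate_stream(payload, level)
    assert zlib.decompress(compressed) == payload
    print(f"level {level}: {len(payload)} -> {len(compressed)} bytes")
```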
Same as above but using raw deflate (window bits = -15), which skips adler32 checksum computation.
| Input Size | Level | zlib-ng (ns) | zlib-rs (ns) | Difference |
|---|---|---|---|---|
| 1 KB | 1 | 1,522 | 2,192 | +44% |
| 1 KB | 3 | 3,993 | 6,345 | +59% |
| 1 KB | 6 | 4,127 | 6,619 | +60% |
| 1 KB | 9 | 5,273 | 7,918 | +50% |
| 16 KB | 1 | 7,956 | 8,933 | +12% |
| 16 KB | 3 | 24,192 | 30,128 | +25% |
| 16 KB | 6 | 52,242 | 65,732 | +26% |
| 16 KB | 9 | 89,010 | 104,638 | +18% |
| 128 KB | 1 | 62,799 | 72,036 | +15% |
| 128 KB | 3 | 127,949 | 156,085 | +22% |
| 128 KB | 6 | 260,205 | 345,498 | +33% |
| 128 KB | 9 | 771,221 | 911,661 | +18% |
| 1 MB | 1 | 530,487 | 612,449 | +15% |
| 1 MB | 3 | 958,855 | 1,171,367 | +22% |
| 1 MB | 6 | 1,973,975 | 2,611,918 | +32% |
| 1 MB | 9 | 6,164,965 | 7,333,845 | +19% |
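The zlib format (RFC 1950) is just a 2-byte header, the raw deflate stream, and a 4-byte big-endian Adler-32 of the uncompressed data, which is why the raw-deflate rows differ from the zlib-wrapped rows only by framing and checksum work. A small sketch with Python's `zlib` module that checks this relationship (the helper `compress_with` and the sample text are illustrative):

```python
import zlib


def compress_with(data: bytes, level: int, wbits: int) -> bytes:
    """Compress in one pass; wbits=15 gives zlib format, wbits=-15 gives raw deflate."""
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()


data = b"The quick brown fox jumps over the lazy dog. " * 200
wrapped = compress_with(data, 6, 15)   # zlib format
raw = compress_with(data, 6, -15)      # raw deflate: no header, no trailer

# Split the zlib stream into its three parts.
header, body, trailer = wrapped[:2], wrapped[2:-4], wrapped[-4:]

assert body == raw                                          # same deflate data
assert trailer == zlib.adler32(data).to_bytes(4, "big")     # Adler-32 trailer
print(f"framing overhead: {len(wrapped) - len(raw)} bytes")
```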
Uses streaming inflate() API with raw deflate (window bits = -15, no adler32). Data pre-compressed at level 9. Uses inflateReset() between iterations.
| Input Size | zlib-ng (ns) | zlib-rs (ns) | Difference |
|---|---|---|---|
| 1 B | 19.1 | 26.3 | +38% |
| 64 B | 135 | 148 | +10% |
| 1 KB | 290 | 377 | +30% |
| 16 KB | 3,862 | 4,932 | +28% |
| 128 KB | 15,087 | 19,507 | +29% |
| 1 MB | 106,000 | 136,069 | +28% |
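A Python sketch of the same shape of work: pre-compress at level 9 as raw deflate, then inflate it through a streaming decompressor with `wbits=-15`, so there is no header to parse and no Adler-32 to verify. The helper names and chunk size are illustrative assumptions, not taken from the zlib-ng harness.

```python
import zlib


def raw_deflate(data: bytes, level: int = 9) -> bytes:
    """Pre-compress as raw deflate (no zlib header or trailer)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def inflate_raw(compressed: bytes, chunk_size: int = 4096) -> bytes:
    """Streaming inflate of a raw deflate stream, fed in chunks."""
    decomp = zlib.decompressobj(-15)  # raw: no header, no Adler-32 check
    out = []
    for start in range(0, len(compressed), chunk_size):
        out.append(decomp.decompress(compressed[start:start + chunk_size]))
    out.append(decomp.flush())
    if not decomp.eof:
        raise ValueError("raw deflate stream is truncated")
    return b"".join(out)


data = bytes(range(256)) * 4096  # 1 MiB
assert inflate_raw(raw_deflate(data)) == data
```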
Uses uncompress() one-shot API. Data pre-compressed at level 9.
| Input Size | zlib-ng (ns) | zlib-rs (ns) | Difference |
|---|---|---|---|
| 1 B | 45.3 | 286 | +532% |
| 64 B | 160 | 387 | +142% |
| 1 KB | 343 | 626 | +83% |
| 16 KB | 4,322 | 5,471 | +27% |
| 128 KB | 18,793 | 23,836 | +27% |
| 1 MB | 138,435 | 168,531 | +22% |
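A one-shot call like uncompress() sets up, runs, and tears down a whole inflate stream on every call, so its fixed cost is paid even for a 1-byte payload. Python's `zlib.decompress()` follows the same one-shot pattern and serves as a rough analogue here; the helper `uncompress_like` is hypothetical.

```python
import zlib


def uncompress_like(compressed: bytes, expected_len: int) -> bytes:
    """One-shot decompression of zlib-wrapped data, analogous to C uncompress().

    Each call initializes, runs, and frees a complete inflate stream, so the
    fixed setup cost is paid on every call regardless of input size.
    """
    # bufsize is only the initial output buffer size; Python grows it if needed,
    # whereas C uncompress() returns Z_BUF_ERROR if *destLen is too small.
    return zlib.decompress(compressed, bufsize=max(expected_len, 1))


for size in (1, 64, 1024, 16 * 1024):
    original = bytes(i % 251 for i in range(size))
    compressed = zlib.compress(original, 9)  # zlib format, level 9
    assert uncompress_like(compressed, len(original)) == original
```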
- Compression (all levels): zlib-ng is faster at every compression level and input size tested.
- Level 6 (the default, and the most commonly used level) shows the widest gap for large inputs, about 33% at 128 KB and 1 MB.
- Levels 1 and 9 show the smallest gaps for large inputs, roughly 15-20%. At level 9 this is plausibly because the exhaustive match search dominates and both implementations do similar work; at level 1 the simpler fast path may leave less room for implementation differences.
- Checksum overhead is negligible: comparing `deflate_level` with `deflate_nocrc` shows almost no difference for either library at large sizes, so adler32 cost is tiny relative to the deflate work.
- Small input overhead: zlib-rs takes 44-147% longer at small sizes (1 B to 1 KB), which suggests a higher fixed per-call cost.
- Inflate: zlib-rs takes 10-38% longer, settling at a consistent ~28-30% gap for inputs >= 1 KB.
- Uncompress: zlib-rs takes 22-532% longer. The extreme gap at small sizes (45 ns vs 286 ns at 1 B) suggests significant fixed overhead in zlib-rs's `uncompress()` wrapper (inflate stream init/teardown).
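A back-of-envelope check of the fixed-overhead reading, under a simplifying assumption of ours (not a measured breakdown): treat the absolute gap at 1 B as a fixed per-call cost and see how much of the gap at each size it could account for. Values are copied from the uncompress() table.

```python
# Median CPU times (ns) from the uncompress() table: (size in bytes, zlib-ng, zlib-rs)
UNCOMPRESS = [
    (1, 45.3, 286),
    (64, 160, 387),
    (1024, 343, 626),
    (16_384, 4_322, 5_471),
    (131_072, 18_793, 23_836),
    (1_048_576, 138_435, 168_531),
]


def absolute_gaps(rows):
    """Absolute extra time (ns) zlib-rs takes at each input size."""
    return [(size, rs - ng) for size, ng, rs in rows]


# Assumption: the 1 B gap approximates a fixed per-call overhead.
fixed = UNCOMPRESS[0][2] - UNCOMPRESS[0][1]

for size, gap in absolute_gaps(UNCOMPRESS):
    share = min(fixed / gap, 1.0)  # fraction of the gap a fixed cost could explain
    print(f"{size:>9} B: gap {gap:>8.1f} ns, fixed-cost share {share:.0%}")
```

Under this model the gap stays within roughly 230-280 ns up to 1 KB and then grows with input size, consistent with a fixed per-call cost dominating small inputs and a per-byte difference of roughly 22-29% dominating large ones.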
Comparing zlibng_bench.json to zlibrs_bench.json
Benchmark Time CPU Time Old Time New CPU Old CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
compress_bench/compress_bench/1_median +1.4670 +1.4722 1889 4659 1880 4649
compress_bench/compress_bench/16_median +1.2723 +1.2813 2192 4981 2179 4971
compress_bench/compress_bench/48_median +1.1670 +1.1518 2561 5550 2556 5499
compress_bench/compress_bench/256_median +0.9736 +0.9723 3135 6186 3126 6166
compress_bench/compress_bench/1024_median +0.8322 +0.8274 4581 8394 4567 8345
compress_bench/compress_bench/4096_median +0.4387 +0.4394 15138 21779 15107 21746
compress_bench/compress_bench/16384_median +0.3357 +0.3355 52454 70061 52346 69908
compress_bench/compress_bench/65536_median +0.3525 +0.3556 145404 196663 144795 196281
inflate_bench/inflate_nocrc/1_median +0.3943 +0.3814 19 27 19 26
inflate_bench/inflate_nocrc/64_median +0.0980 +0.0984 135 148 135 148
inflate_bench/inflate_nocrc/1024_median +0.3180 +0.2996 290 383 290 377
inflate_bench/inflate_nocrc/16384_median +0.2779 +0.2771 3872 4947 3862 4932
inflate_bench/inflate_nocrc/131072_median +0.2895 +0.2929 15153 19540 15087 19507
inflate_bench/inflate_nocrc/1048576_median +0.2848 +0.2837 106209 136460 106000 136069
uncompress_bench/uncompress_bench/1_median +5.3163 +5.3233 46 290 45 286
uncompress_bench/uncompress_bench/64_median +1.4147 +1.4154 161 388 160 387
uncompress_bench/uncompress_bench/1024_median +0.8048 +0.8226 347 627 343 626
uncompress_bench/uncompress_bench/16384_median +0.2491 +0.2659 4392 5486 4322 5471
uncompress_bench/uncompress_bench/131072_median +0.2905 +0.2684 18877 24362 18793 23836
uncompress_bench/uncompress_bench/1048576_median +0.2162 +0.2174 138830 168842 138435 168531
Comparing zlibng_deflate.json to zlibrs_deflate.json
Benchmark Time CPU Time Old Time New CPU Old CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
deflate_bench/deflate_level/1024/1_median +0.5194 +0.5230 1455 2210 1448 2205
deflate_bench/deflate_level/1024/3_median +0.6578 +0.6592 3906 6476 3895 6463
deflate_bench/deflate_level/1024/6_median +0.6277 +0.6273 4104 6681 4096 6666
deflate_bench/deflate_level/1024/9_median +0.5608 +0.5618 5135 8014 5120 7997
deflate_bench/deflate_level/16384/1_median +0.1115 +0.1319 8500 9447 8328 9426
deflate_bench/deflate_level/16384/3_median +0.1758 +0.2119 26153 30751 25325 30692
deflate_bench/deflate_level/16384/6_median +0.2273 +0.2392 53558 65732 52660 65257
deflate_bench/deflate_level/16384/9_median +0.1598 +0.1709 88647 102814 87626 102599
deflate_bench/deflate_level/131072/1_median +0.1961 +0.1964 62627 74908 62493 74768
deflate_bench/deflate_level/131072/3_median +0.2513 +0.2503 128281 160521 128075 160128
deflate_bench/deflate_level/131072/6_median +0.3381 +0.3377 262497 351250 262047 350534
deflate_bench/deflate_level/131072/9_median +0.1619 +0.1619 766597 890746 765375 889292
deflate_bench/deflate_level/1048576/1_median +0.1914 +0.1904 537273 640116 536404 638547
deflate_bench/deflate_level/1048576/3_median +0.2439 +0.2430 959589 1193676 958295 1191189
deflate_bench/deflate_level/1048576/6_median +0.3095 +0.3346 2025331 2652264 1980172 2642689
deflate_bench/deflate_level/1048576/9_median +0.1876 +0.1875 6217519 7383755 6206186 7370042
deflate_bench/deflate_nocrc/1024/1_median +0.4406 +0.4398 1525 2197 1522 2192
deflate_bench/deflate_nocrc/1024/3_median +0.5421 +0.5892 4123 6359 3993 6345
deflate_bench/deflate_nocrc/1024/6_median +0.6023 +0.6040 4140 6633 4127 6619
deflate_bench/deflate_nocrc/1024/9_median +0.5022 +0.5014 5282 7935 5273 7918
deflate_bench/deflate_nocrc/16384/1_median +0.1319 +0.1228 7980 9033 7956 8933
deflate_bench/deflate_nocrc/16384/3_median +0.2458 +0.2453 24232 30188 24192 30128
deflate_bench/deflate_nocrc/16384/6_median +0.2576 +0.2582 52380 65872 52242 65732
deflate_bench/deflate_nocrc/16384/9_median +0.1745 +0.1756 89276 104851 89010 104638
deflate_bench/deflate_nocrc/131072/1_median +0.1422 +0.1471 63215 72202 62799 72036
deflate_bench/deflate_nocrc/131072/3_median +0.2156 +0.2199 128640 156375 127949 156085
deflate_bench/deflate_nocrc/131072/6_median +0.3260 +0.3278 260967 346045 260205 345498
deflate_bench/deflate_nocrc/131072/9_median +0.1862 +0.1821 772465 916270 771221 911661
deflate_bench/deflate_nocrc/1048576/1_median +0.1542 +0.1545 531526 613469 530487 612449
deflate_bench/deflate_nocrc/1048576/3_median +0.2202 +0.2216 961992 1173863 958855 1171367
deflate_bench/deflate_nocrc/1048576/6_median +0.3172 +0.3232 1991909 2623659 1973975 2611918
deflate_bench/deflate_nocrc/1048576/9_median +0.1893 +0.1896 6179062 7348699 6164965 7333845
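The median rows above can also be rebuilt directly from the two JSON files. A simplified sketch, assuming the standard Google Benchmark JSON fields (`benchmarks`, `run_type`, `aggregate_name`, `run_name`, `cpu_time`) and that both files use the same time unit; it mirrors the relative CPU-time column of `compare.py` up to rounding, but is not a replacement for it (no statistical tests). The function names are hypothetical.

```python
import json
import sys
from typing import Dict


def median_cpu_times(report: dict) -> Dict[str, float]:
    """Map run_name -> median cpu_time from a Google Benchmark JSON report."""
    times = {}
    for bench in report.get("benchmarks", []):
        if bench.get("run_type") == "aggregate" and bench.get("aggregate_name") == "median":
            times[bench["run_name"]] = bench["cpu_time"]
    return times


def compare(baseline: dict, contender: dict) -> Dict[str, float]:
    """Relative CPU-time change (contender / baseline - 1) for benchmarks in both files."""
    old, new = median_cpu_times(baseline), median_cpu_times(contender)
    return {name: new[name] / old[name] - 1.0 for name in old if name in new}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 medians.py /tmp/zlibng_bench.json /tmp/zlibrs_bench.json
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        result = compare(json.load(f_old), json.load(f_new))
    for name, delta in sorted(result.items()):
        print(f"{name:<50} {delta:+.4f}")
```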
Both libraries export a C-compatible zlib API. The approach is to use zlib-ng's own Google Benchmark harness (which benchmarks compress(), uncompress(), inflate(), and deflate() via the standard zlib API) and link it against each library separately:
- zlib-ng benchmarks: Build zlib-ng with `ZLIB_COMPAT=ON` so it exports the standard zlib symbols (`compress`, `uncompress`, `inflate`, etc.). The benchmark binary links statically against `libz-ng-static.a`.
- zlib-rs benchmarks: Build zlib-rs as a static C library (`libz_rs.a`) via the `libz-rs-sys-cdylib` crate. Then build a subset of zlib-ng's benchmarks (only the public API tests: compress, uncompress, inflate, deflate) linked against `libz_rs.a` instead. A small CMake addition creates a `benchmark_zlib_rs` target for this. The `BUILD_ALT=1` define skips zlib-ng's CPU feature detection in `benchmark_main.cc`.
- Run sequentially: Never run the benchmarks concurrently; run one, then the other.
- Compare: Use Google Benchmark's `compare.py` tool to produce a side-by-side comparison.
git clone https://github.com/zlib-ng/zlib-ng.git
cd zlib-ng
# Clone zlib-rs alongside
git clone https://github.com/trifectatechfoundation/zlib-rs.git ../zlib-rs
# Install Rust (skip if already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"
# Build zlib-rs as a static C library
cd ../zlib-rs/libz-rs-sys-cdylib
cargo build --release
# Produces: target/release/libz_rs.a
cd ../../zlib-ng
Add the following to `test/benchmarks/CMakeLists.txt`, before the `if(WITH_BENCHMARK_APPS)` line:
if(DEFINED ZLIB_RS_LIB)
add_executable(benchmark_zlib_rs
benchmark_compress.cc
benchmark_deflate.cc
benchmark_inflate.cc
benchmark_uncompress.cc
benchmark_main.cc
)
target_compile_definitions(benchmark_zlib_rs PRIVATE BENCHMARK_STATIC_DEFINE BUILD_ALT=1 ZLIB_COMPAT)
target_include_directories(benchmark_zlib_rs PRIVATE
${PROJECT_SOURCE_DIR}
${PROJECT_BINARY_DIR}
${benchmark_SOURCE_DIR}/benchmark/include)
target_link_libraries(benchmark_zlib_rs ${ZLIB_RS_LIB} benchmark::benchmark)
endif()
cmake -S . -B build-bench-zlibng \
-DZLIB_COMPAT=ON \
-DBUILD_SHARED_LIBS=OFF \
-DBUILD_TESTING=ON \
-DWITH_BENCHMARKS=ON \
-DCMAKE_BUILD_TYPE=Release
cmake --build build-bench-zlibng -j"$(getconf _NPROCESSORS_ONLN)"
ZLIB_RS_PATH=$(realpath ../zlib-rs/libz-rs-sys-cdylib/target/release/libz_rs.a)
cmake -S . -B build-bench-zlibrs \
-DZLIB_COMPAT=ON \
-DBUILD_SHARED_LIBS=OFF \
-DBUILD_TESTING=ON \
-DWITH_BENCHMARKS=ON \
-DCMAKE_BUILD_TYPE=Release \
-DZLIB_RS_LIB="$ZLIB_RS_PATH"
cmake --build build-bench-zlibrs --target benchmark_zlib_rs -j"$(getconf _NPROCESSORS_ONLN)"
# Run zlib-ng first (public API benchmarks only)
build-bench-zlibng/test/benchmarks/benchmark_zlib \
--benchmark_filter="compress_bench|inflate_bench|uncompress_bench|deflate_bench" \
--benchmark_out=/tmp/zlibng_bench.json \
--benchmark_out_format=json \
--benchmark_repetitions=5 \
--benchmark_report_aggregates_only=true
# Then run zlib-rs (do NOT run concurrently)
build-bench-zlibrs/test/benchmarks/benchmark_zlib_rs \
--benchmark_out=/tmp/zlibrs_bench.json \
--benchmark_out_format=json \
--benchmark_repetitions=5 \
--benchmark_report_aggregates_only=true
# Clone Google Benchmark for the comparison tool
git clone https://github.com/google/benchmark.git .benchmark
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r .benchmark/tools/requirements.txt
# Run comparison
python3 .benchmark/tools/compare.py benchmarks \
/tmp/zlibng_bench.json \
/tmp/zlibrs_bench.json
- The zlib-ng benchmark harness also includes micro-benchmarks for internal functions (adler32, crc32, compare256, slide_hash, insert_string) that test architecture-specific SIMD variants. These are not comparable with zlib-rs and are excluded from this comparison.
- The `BUILD_ALT=1` compile definition in the zlib-rs target disables zlib-ng's runtime CPU feature detection in `benchmark_main.cc`, which is not needed when linking against zlib-rs.
- Both libraries are built with release/optimized settings.
- The deflate benchmarks use `deflateReset()` between iterations to measure steady-state compression without init/teardown overhead.