- Use the CMake build system.
- Always check the commits for `HEAD` and `BASE` or other branch names, as they can change often.
- To build for architectures other than the current one, use `llvm-clang` unless `gcc` is specified.
- `arch/` - Architecture-specific optimizations
- `test/` - Unit tests written using Google Test Framework (gtest_zlib project)
- `test/benchmarks` - Performance benchmark testing using Google Benchmark Framework (benchmark_zlib project)
- To enable gtest_zlib use `-D BUILD_TESTING=ON -D WITH_GTEST=ON`.
- `gtest_zlib` can be found in the build directory's `test` directory.
- To enable benchmark_zlib use `-D BUILD_TESTING=ON -D WITH_BENCHMARKS=ON`.
- Always configure CMake with `-D BUILD_SHARED_LIBS=OFF` to avoid linking overhead.
- Isolate benchmark runs by configuring and compiling them in separate build directories.
- `benchmark_zlib` can be found in the build directory's `test/benchmarks` directory.
- Run `benchmark_zlib --benchmark_list_tests=true` to list all benchmarks.
- When running `benchmark_zlib` with `--benchmark_repetitions`, also use `--benchmark_report_aggregates_only=true`.
- Run benchmark processes sequentially; running them concurrently can cause contention and unreliable results.
- Use `git worktree` to check out the contender branch, then configure and build each to separate directories.
- Run benchmarks with `--benchmark_out=<file>.json --benchmark_out_format=json` to produce input files for comparison.
- Clone https://github.com/google/benchmark into the `.benchmark` directory.
- Create a `.venv` virtual environment and install the requirements with `pip3 install -r requirements.txt`.
- Use `tools/compare.py benchmarks <benchmark_baseline> <benchmark_contender> [benchmark options]`
- Create a new GitHub gist with the summary of the results.
- Start the title of the gist and the filename of the gist with the name of the project.
- Always include the machine specs in the summary.
- Object files can be found in the build directory's `CMakeFiles/zlib-ng.dir` subdirectory.
- Use `objdump` or `dumpbin` to disassemble, or `/arriba:extract-asm` to extract a specific function from `.o`, `.obj`, or `.s` files.
- Always compare assembly before/after to verify an optimization has the intended effect.
When reviewing extracted assembly, check:
- Instruction count: total instructions, compare before/after.
- Memory operations: count loads vs stores; flag unnecessary spills to stack.
- Register pressure: identify stack spills (`[sp, #...]` on AArch64, `(%rsp)`/`(%rbp)` on x86) that indicate the compiler ran out of registers.
- Branch density: count conditional branches in hot loops; fewer branches = better pipelining.
- SIMD utilization: check for vector instructions (`stp q`, `movi v` on AArch64; `vmov`, `vpadd`, `vpshuf` on x86) vs scalar fallbacks.
- Call overhead: external calls (`bl`, `call`) in hot paths force register saves; prefer inlined operations.
- Loop structure: identify the back-edge branch and count instructions per iteration.
- Constant materialization: `mov` immediates or `adrp`/`ldr` from constant pools; repeated materialization of the same constant suggests missed CSE.
Source-level techniques — always verify results in assembly for the architecture of interest or at least x86-64 and AArch64:
- Prefer branchless computation using bit masking when the zero case is a no-op.
- Look for ways to optimize using bit tricks.
- Reduce unnecessary casts by looking at where the data is coming from and how it is being used.
- Keep hot variables in registers across inline function boundaries using locals and pass-by-pointer.
- Minimize live variables in hot loops to reduce register pressure and avoid stack spills.
- Audit multi-way branches for unreachable paths.
- Use fixed-width integer types from `stdint.h` when possible.
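As an illustration of two of the techniques above (branchless masking when the zero case is a no-op, and keeping hot variables in locals behind a pass-by-pointer interface), here is a minimal sketch. The names `add_if`, `hot_state`, and `sum_high_bytes` are hypothetical and are not part of zlib-ng; whether the compiler actually emits branch-free, spill-free code still has to be confirmed in the disassembly for each target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add `delta` to `acc` only when `cond` is nonzero.
 * The mask is all-zeros or all-ones, so the "false" case adds 0 (a no-op)
 * and no conditional branch is required at the source level. */
static inline uint32_t add_if(uint32_t acc, uint32_t delta, uint32_t cond) {
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0); /* 0x00000000 or 0xFFFFFFFF */
    return acc + (delta & mask);
}

/* Hypothetical state passed by pointer across an inline function boundary. */
typedef struct {
    uint32_t sum;   /* running sum of bytes >= 0x80 */
    uint32_t count; /* number of bytes >= 0x80 */
} hot_state;

/* Accumulate bytes with the high bit set. The state is loaded into locals
 * once, updated in the loop, and written back once, so the compiler is free
 * to keep `sum` and `count` in registers instead of reloading them through
 * the pointer on every iteration. */
static inline void sum_high_bytes(hot_state *st, const uint8_t *buf, size_t len) {
    assert(st != NULL);
    uint32_t sum = st->sum;
    uint32_t count = st->count;

    for (size_t i = 0; i < len; i++) {
        uint32_t b = buf[i];
        uint32_t is_high = (uint32_t)(b >= 0x80u); /* 0 or 1 */
        sum = add_if(sum, b, is_high);
        count += is_high;
    }

    st->sum = sum;
    st->count = count;
}
```

A caller would initialize a `hot_state`, call `sum_high_bytes(&st, buf, len)`, and read back `st.sum` and `st.count`. Compilers may already convert the equivalent `if` into a conditional move or select, so the source-level mask is only worth keeping if the before/after assembly shows an improvement.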