@nmoinvaz
Last active February 28, 2026 01:17

zlib-ng CLAUDE.md

Project Basics

  • Use CMake build system.
  • Always check the commits for HEAD and BASE or other branch names as they can change often.
  • To build for an architecture other than the host, use llvm-clang unless gcc is specified.

Key Directories

  • arch/ - Architecture specific optimizations
  • test/ - Unit tests written using Google Test Framework (gtest_zlib project)
  • test/benchmarks - Performance benchmark testing using Google Benchmark Framework (benchmark_zlib project)

Testing

  • To enable gtest_zlib use -D BUILD_TESTING=ON -D WITH_GTEST=ON.
  • gtest_zlib can be found in the build directory's test directory.

Benchmarking

  • To enable benchmark_zlib use -D BUILD_TESTING=ON -D WITH_BENCHMARKS=ON.
  • Always configure CMake with -D BUILD_SHARED_LIBS=OFF to reduce linking time.
  • Isolate benchmark runs by configuring and compiling them in separate build directories.
  • benchmark_zlib can be found in the build directory's test/benchmarks directory.
  • Run benchmark_zlib --benchmark_list_tests=true to list all benchmarks.
  • When running benchmark_zlib with --benchmark_repetitions, also use --benchmark_report_aggregates_only=true.
  • Run benchmark processes sequentially; running them concurrently can cause contention and unreliable results.

Comparing Branches

  • Use git worktree to check out the contender branch, then configure and build each to separate directories.
  • Run benchmarks with --benchmark_out=<file>.json --benchmark_out_format=json to produce input files for comparison.

Comparing Results

  • Clone https://github.com/google/benchmark into a .benchmark directory.
  • Create a .venv virtual environment and install the requirements with pip3 install -r requirements.txt.
  • Run tools/compare.py benchmarks <benchmark_baseline> <benchmark_contender> [benchmark options] to compare the two result files.

Publishing Results

  • Create a new GitHub gist with the summary of the results.
  • Start the title of the gist and the filename of the gist with the name of the project.
  • Always include the machine specs in the summary.

Assembly Analysis

  • Object files can be found in the build directory's CMakeFiles/zlib-ng.dir subdirectory.
  • Use objdump or dumpbin to disassemble, or /arriba:extract-asm to extract a specific function from .o, .obj, or .s files.
  • Always compare assembly before/after to verify an optimization has the intended effect.

When reviewing extracted assembly, check:

  • Instruction count — total instructions, compare before/after.
  • Memory operations — count loads vs stores; flag unnecessary spills to stack.
  • Register pressure — identify stack spills ([sp, #...] on AArch64, (%rsp) / (%rbp) on x86) that indicate the compiler ran out of registers.
  • Branch density — count conditional branches in hot loops; fewer branches = better pipelining.
  • SIMD utilization — check for vector instructions (stp q, movi v on AArch64; mnemonics beginning with vmov, vpadd, or vpshuf on x86) vs scalar fallbacks.
  • Call overhead — external calls (bl, call) in hot paths force register saves; prefer inlined operations.
  • Loop structure — identify the back-edge branch and count instructions per iteration.
  • Constant materialization — mov immediates or adrp/ldr from constant pools; repeated materialization of the same constant suggests missed CSE.

Optimization Strategies

Source-level techniques — always verify results in assembly for the architecture of interest or at least x86-64 and AArch64:

  • Prefer branchless computation using bit masking when the zero case is a no-op.
  • Look for ways to optimize using bit tricks.
  • Reduce unnecessary casts by looking at where the data is coming from and how it is being used.
  • Keep hot variables in registers across inline function boundaries using locals and pass-by-pointer.
  • Minimize live variables in hot loops to reduce register pressure and avoid stack spills.
  • Audit multi-way branches for unreachable paths.
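
As a minimal sketch of two of the techniques above (branchless masking, and keeping hot state in locals that are passed by pointer to an inline helper), the C fragment below may help. The acc_state struct and the acc_step and acc_run functions are hypothetical names invented for illustration and are not part of zlib-ng. Whether the compiler really keeps the locals in registers and avoids a conditional branch still has to be confirmed in the disassembly for each target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state struct for illustration; not a zlib-ng type. */
typedef struct {
    uint32_t sum;   /* sum of the bytes whose top bit is set */
    uint32_t count; /* number of such bytes */
} acc_state;

/* Inline helper that works on the caller's locals through pointers.
 * Once inlined, the compiler is free to keep *sum and *count in registers. */
static inline void acc_step(uint32_t *sum, uint32_t *count, uint8_t byte) {
    /* mask is all ones when the top bit of byte is set, all zeros otherwise,
     * so the "not set" case adds zero and needs no branch. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(byte >> 7);
    *sum += byte & mask;
    *count += mask & 1u;
}

static void acc_run(acc_state *s, const uint8_t *buf, size_t len) {
    /* Copy hot fields into locals so the loop does not have to load and
     * store through s on every iteration. */
    uint32_t sum = s->sum;
    uint32_t count = s->count;

    for (size_t i = 0; i < len; i++)
        acc_step(&sum, &count, buf[i]);

    /* Write the results back once, after the loop. */
    s->sum = sum;
    s->count = count;
}
```

Calling acc_run(&s, buf, len) on a zero-initialized acc_state accumulates only the bytes whose top bit is set. Comparing its disassembly against a version that uses an if statement and updates s->sum directly is one way to check whether the rewrite changed the generated code at all.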

Coding Standards

  • Use fixed-width integer types from stdint.h when possible.
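
One reason for this rule is that types such as unsigned long differ in width between platforms (32 bits on 64-bit Windows, 64 bits on 64-bit Linux), while uint32_t and uint64_t do not. The sketch below assumes a hypothetical helper named load_le32, which is not a zlib-ng function, to show the style.

```c
#include <stdint.h>

/* Hypothetical helper for illustration: assemble a 32-bit little-endian
 * value from four bytes. Every intermediate is widened to uint32_t before
 * shifting, so the result is the same on every platform. */
static inline uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The explicit casts also avoid shifting a promoted int into its sign bit, which would happen for p[3] << 24 if the byte were left as an int.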