mattip · April 5, 2026 20:08
 Summarize the differences between PyPy's benchmarks and CPython's:

  PyPy benchmark suite vs pyperformance

  Framework

  All pyperformance benchmarks use pyperf.Runner, which handles warmup automatically (it runs and discards early iterations) and spreads measurements across multiple processes. The PyPy suite uses the old
  unladen_swallow util.run_benchmark, which has no warmup support.
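  The warmup difference can be sketched with a stdlib-only timing loop. This is not pyperf's implementation or API: time_loops and workload are hypothetical names, and the sketch models only warmup (pyperf additionally spreads runs over several worker processes and computes statistics). On a JIT such as PyPy's, the first calls can include tracing and compilation time, which is exactly what a no-warmup loop records.

```python
import time
from typing import Callable, List


def time_loops(func: Callable[[], None], loops: int, warmups: int = 0) -> List[float]:
    """Hypothetical helper: time `loops` calls of `func` after `warmups` untimed calls.

    warmups=0 mimics a runner that records every iteration, including the ones
    where a JIT is still compiling; warmups>0 runs the early iterations but
    discards their timings.
    """
    for _ in range(warmups):
        func()  # executed, not recorded: gives a JIT time to warm up
    timings = []
    for _ in range(loops):
        t0 = time.perf_counter()
        func()
        timings.append(time.perf_counter() - t0)
    return timings


def workload() -> None:
    # Hypothetical stand-in for a benchmark body.
    sum(i * i for i in range(10_000))


cold = time_loops(workload, loops=5)              # old style: every iteration counts
warm = time_loops(workload, loops=5, warmups=3)   # warmup iterations discarded
```

  Both lists hold five timings; only the second excludes the iterations that ran before the code was warm.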

  Methodology fixes in pyperformance

  sqlalchemy_imperative — PyPy version accumulates rows every iteration (n new persons inserted, then 100 selects on a growing table). Pyperformance version deletes all rows before each timed iteration, then
   inserts a fixed number of people (--rows, default 100) and runs npeople selects. Fixed workload per iteration.
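  The accumulation bug can be illustrated with the stdlib sqlite3 module standing in for SQLAlchemy. This is a sketch under that assumption, not either suite's code: make_db, iteration, and the person table are hypothetical, and the returned row count stands in for the size of the table the benchmark's selects would run against.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema standing in for the benchmarks' Person model.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def iteration(conn: sqlite3.Connection, rows: int, reset: bool) -> int:
    """One benchmark iteration; returns the number of rows the selects would see."""
    if reset:
        # pyperformance style: every timed iteration starts from an empty table
        conn.execute("DELETE FROM person")
    conn.executemany(
        "INSERT INTO person (name) VALUES (?)",
        [(f"person {i}",) for i in range(rows)],
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()
    return count


accumulating = make_db()
fixed = make_db()
grow = [iteration(accumulating, rows=100, reset=False) for _ in range(3)]
flat = [iteration(fixed, rows=100, reset=True) for _ in range(3)]
print(grow)  # [100, 200, 300]
print(flat)  # [100, 100, 100]
```

  Without the reset, each iteration measures work on a larger table than the one before, so later timings are not comparable with earlier ones.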

  sqlalchemy_declarative — Same accumulation bug. Pyperformance version deletes all rows before each iteration via session.query(Person).delete(synchronize_session=False). Fixed workload per iteration.

  sqlite_synth — PyPy version also accumulates: it inserts 300,000 rows into a table that persists across iterations (the connection is shared across the timed loop). Pyperformance version creates a fresh
  in-memory connection per iteration, inserts a number of rows equal to its loops argument, does the assertions, deletes all rows, and closes the connection. Fully isolated per iteration.
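  The isolated pattern can be sketched with the stdlib sqlite3 module. This follows the steps described above but is not pyperformance's code: synth_iteration, the table t, and the row-count check are hypothetical stand-ins for the benchmark's real schema and assertions.

```python
import sqlite3


def synth_iteration(loops: int) -> int:
    """Hypothetical isolated iteration: fresh in-memory DB, insert, check, clean up."""
    conn = sqlite3.connect(":memory:")  # new database on every call
    try:
        conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
        conn.executemany(
            "INSERT INTO t VALUES (?, ?)",
            ((i, str(i)) for i in range(loops)),
        )
        (n,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()
        assert n == loops  # no rows can leak in from earlier iterations
        conn.execute("DELETE FROM t")
        return n
    finally:
        conn.close()  # nothing persists into the next iteration


results = [synth_iteration(1_000) for _ in range(3)]
```

  Because each call opens and closes its own in-memory database, every iteration does the same amount of work regardless of how many ran before it.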

  Benchmarks in pyperformance but not in PyPy suite

  - async_tree / async_tree_io / async_tree_cpu_io_mixed / async_tree_memoization — asyncio workloads, entirely absent
  - logging_simple / logging_format / logging_silent — logging overhead
  - xml_etree_parse / xml_etree_iterparse / xml_etree_generate / xml_etree_process — XML
  - 2to3, docutils, tornado_http — real-world tool benchmarks
  - deepcopy, pathlib — stdlib operations
  - python_startup, python_startup_nosite, hg_startup — startup time
  - regex_dna — additional regex variant
  - Granular splits: json_dumps/json_loads (PyPy has combined json_bench), 7 pickle variants (PyPy has 1), 5 scimark sub-benchmarks (PyPy has 1), 4 sympy sub-benchmarks (PyPy has 1)

  Benchmarks in PyPy suite but dropped from pyperformance

  - gcbench, tuple_gc_hell — GC stress tests (still interesting for PyPy)
  - bm_threading — threading performance
  - fib, pystone, bm_call_simple — legacy micro-benchmarks
  - bm_krakatau, schulze, eparse, bm_icbd — niche/PyPy-specific
  - bm_rietveld, bm_spambayes — bitrotted large-app benchmarks
  - pyaes — duplicate of crypto_pyaes
 6:24 AM