This is a small harness used to confirm — and later verify the fix for — a performance problem in Zino 2's legacy API server: existing clients (curitz, anything built on zinolib) were taking many seconds to fetch all open cases from production Zino 2, whereas the same clients are near-instant against the legacy Tcl Zino on the same network.
This is a snapshot of the tools used during one investigation. It is not part of the Zino repo and has no maintainer. If you revive it later, expect it to need touch-ups against the current Zino source.
- Investigation date: May 2026
- Validated against: Zino 2 commit `3d19876` (`master` at the time)
- Fix PR: (link once filed)
The legacy Zino protocol is "chatty": to fetch one case the client sends three
commands sequentially (`GETATTRS`, `GETHIST`, `GETLOG`), each waiting for the
server's reply before sending the next. For 67 open cases that's ~200 round
trips on a single TCP connection.
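The sequential send-then-wait pattern can be sketched as follows. This is a minimal illustration, not the harness's `client.py`: the `"<VERB> <case id>"` command syntax, the `\r\n.\r\n` end-of-response marker, and the names `send_command` and `fetch_case` are assumptions made for the example.

```python
import socket
import time

# Assumed end-of-response marker: a line containing only "." (not confirmed here).
TERMINATOR = b"\r\n.\r\n"


def send_command(sock: socket.socket, command: str) -> tuple[bytes, float]:
    """Send one command, block until the full reply has arrived, and time it."""
    start = time.perf_counter()
    sock.sendall(command.encode() + b"\r\n")
    reply = b""
    while not reply.endswith(TERMINATOR):
        chunk = sock.recv(4096)
        if not chunk:  # server closed the connection
            break
        reply += chunk
    return reply, time.perf_counter() - start


def fetch_case(sock: socket.socket, case_id: int) -> dict[str, float]:
    """Fetch one case: three sequential round trips, one per command."""
    timings = {}
    for verb in ("GETATTRS", "GETHIST", "GETLOG"):
        # The next command is not sent until the previous reply is complete.
        _, timings[verb] = send_command(sock, f"{verb} {case_id}")
    return timings
```

Because nothing is pipelined, any per-response stall on the server side is paid once per command, which is why the total time scales with the number of round trips.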
The Zino 2 server was answering each command by calling `transport.write()`
once per line of output. A typical `GETATTRS` response has 10–15 lines, so
the server was emitting 10–15 tiny TCP segments per command. That's where
two TCP optimizations get in each other's way:
- Nagle's algorithm (on by default) tells the kernel to hold back small outgoing segments if there's already an un-ACK'd small segment in flight, hoping to coalesce them. So after the server sends line 1, the kernel waits for the client's ACK before sending line 2.
- TCP delayed-ACK (on by default on the client side) tells the kernel "don't bother ACK'ing immediately if the application hasn't sent any data back yet — wait up to 40 ms in case there's a piggyback opportunity".
Put those two together on a chatty request/response protocol and every multi-segment server response stalls for ~40 ms waiting for the client's delayed ACK before the next segment can go out. Multiply by 200 round trips and you get the multi-second symptom.
The legacy Tcl server doesn't have this problem. Its accepted channels are
configured with `fconfigure -buffering line -blocking false`, which parks
`puts` data in a userspace buffer; on a non-blocking channel, the actual
`write(2)` only happens when the Tcl event loop sees the channel writable,
which coalesces a whole response's lines into one syscall and therefore one
TCP segment. The client ACKs that single segment immediately (no delayed-ACK
trigger), and Nagle never engages.
The fix to Zino 2 was to replicate that pattern in asyncio. This harness exists to (a) prove the diagnosis and (b) measure the fix.
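The difference between the per-line and coalesced write patterns can be shown with a small sketch. `RecordingTransport`, `respond_per_line`, and `respond_coalesced` are hypothetical names for illustration, not Zino 2 code, and the response lines are made up; the actual fix may differ in detail.

```python
import asyncio


class RecordingTransport(asyncio.Transport):
    """Stand-in transport that records each write() call instead of sending it."""

    def __init__(self):
        super().__init__()
        self.writes = []

    def write(self, data):
        self.writes.append(bytes(data))


def respond_per_line(transport, lines):
    """Pre-fix pattern: one transport.write() per protocol line."""
    for line in lines:
        transport.write(line.encode() + b"\r\n")


def respond_coalesced(transport, lines):
    """Post-fix pattern: join the whole response and hand it over in one write()."""
    transport.write(b"".join(line.encode() + b"\r\n" for line in lines))


# Made-up response lines, for illustration only.
lines = ["303 attrs follow", "id: 1", "state: open", "."]
per_line, coalesced = RecordingTransport(), RecordingTransport()
respond_per_line(per_line, lines)
respond_coalesced(coalesced, lines)
print(len(per_line.writes), len(coalesced.writes))  # prints "4 1"
```

The bytes on the wire are identical in both cases; only the number of `write()` calls, and therefore the number of opportunities for the kernel to emit a separate small segment, changes.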
The natural first instinct is to run the client against zino on 127.0.0.1
and time it. This does not reproduce the problem. On localhost, ACKs come
back in microseconds, so Nagle has nothing to hold back and delayed-ACK is
moot. A direct loopback run finishes in tens of milliseconds even with the
pathological write pattern.
To trigger the bug locally without root (i.e., without `tcpdump`/`tc qdisc`)
we route the client through a small Python TCP forwarder running on the same
host. Just having a userspace hop in the path adds enough scheduling jitter
between recv-on-one-fd and send-on-the-other that the kernel can no longer
ACK back instantly. That tiny delay is enough for Nagle to start gating
subsequent small writes and the delayed-ACK timer to start engaging — the
exact pathology that real WAN clients see.
The proxy can also optionally inject artificial one-way delay, so we can simulate a 30 ms WAN link as well.
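A forwarder of this kind can be very small. The sketch below is not the harness's `latency_proxy.py`; it is a minimal illustration that assumes one thread per direction and a fixed sleep before each forwarded chunk. The names `pump` and `serve` are hypothetical.

```python
import socket
import threading
import time


def pump(src, dst, delay_s=0.0):
    """Forward bytes from src to dst, one recv() chunk at a time, until EOF."""
    try:
        while chunk := src.recv(65536):
            if delay_s:
                time.sleep(delay_s)  # artificial one-way delay per chunk
            dst.sendall(chunk)
    except OSError:
        pass  # peer went away; fall through to the half-close below
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # propagate EOF downstream
        except OSError:
            pass


def serve(listen_addr, target_addr, delay_s=0.0):
    """Accept clients on listen_addr and relay each one to target_addr."""
    with socket.create_server(listen_addr) as listener:
        while True:
            client, _ = listener.accept()
            upstream = socket.create_connection(target_addr)
            for src, dst in ((client, upstream), (upstream, client)):
                threading.Thread(
                    target=pump, args=(src, dst, delay_s), daemon=True
                ).start()
```

Under these assumptions, `serve(("127.0.0.1", 9001), ("127.0.0.1", 8001), delay_s=0.015)` would approximate a 30 ms round trip, since the delay is applied once in each direction.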
| File | Role |
|---|---|
| `make_fixture.py` | Generates an `mkdtemp` zino fixture: `zino.toml`, `polldevs.cf`, `secrets`, and a `zino-state.json` with N open events of mixed types. |
| `client.py` | Blocking Zino-1 protocol client that mirrors zinolib's send-then-recv-to-`.` loop. Times each command. `--via-zinolib` switches to the real zinolib client. |
| `latency_proxy.py` | Local TCP forwarder on `127.0.0.1:9001` → `127.0.0.1:8001`. Optionally `--delay-ms N` per chunk, optionally `--log` to record every recv chunk's size and timestamp. |
| `run_bench.py` | Orchestrator. Builds the fixture, spawns `uv run zino` against it, optionally starts the proxy, runs the client, tears everything down. |
The headline table at the bottom refers to "Scenario B". Here's what all three mean:
Scenario A (direct loopback):

client ──── 127.0.0.1:8001 ──── zino

Run `run_bench.py --num-events 100` with no proxy flags. This is the "looks
fine!" sanity check that misled the initial investigation. Expected to
complete in well under 100 ms regardless of whether the fix is in place.
Confirms the client and fixture work, but tells you nothing about the bug.
Scenario B (local proxy, no added delay):

client ──── 127.0.0.1:9001 ──── proxy ──── 127.0.0.1:8001 ──── zino

Run `run_bench.py --num-events 100 --proxy-delay-ms 0`. Same machine, no
artificial latency. The proxy just forwards bytes. This is where the bug
shows up, because the userspace hop disrupts the timing enough for the
Nagle/delayed-ACK interaction to activate.
This is the core demonstration:
- Without the fix: ~7 seconds total, `GETATTRS` mean ~41 ms (every sample pinned to the Linux delayed-ACK timer)
- With the fix: ~0.1 seconds total, `GETATTRS` mean ~0.3 ms
Scenario C (simulated WAN):

client ──── 127.0.0.1:9001 ──── [+15 ms] ──── proxy ──── [+15 ms] ──── zino

Run `run_bench.py --num-events 100 --proxy-delay-ms 15`. Simulates what a
real WAN client experiences. Useful for understanding what the fix does and
does not address:
- Without the fix: ~17 seconds (~30 ms RTT + ~40 ms delayed-ACK stall per command)
- With the fix: ~9 seconds. The 40 ms stalls are gone, but each round trip still costs one RTT. The remaining time is pure `N round trips × RTT` and can only be reduced by changing the protocol (pipelining, batch commands), which is a separate, larger problem.
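The post-fix figure for this scenario follows directly from the round-trip count. As a worked example (the helper name `round_trip_floor` is hypothetical, and the 30 ms RTT is the nominal sum of the two injected 15 ms delays):

```python
COMMANDS_PER_CASE = 3  # GETATTRS, GETHIST, GETLOG


def round_trip_floor(num_cases, rtt_s):
    """Lower bound on fetch time when every command costs exactly one RTT."""
    return num_cases * COMMANDS_PER_CASE * rtt_s


# Scenario C: 100 cases, 15 ms injected each way, so roughly a 30 ms RTT.
print(f"{round_trip_floor(100, 0.030):.1f} s")  # prints "9.0 s"
```

This is only a floor: it ignores server processing time and any residual stalls, but it accounts for essentially all of the ~9 seconds observed with the fix in place.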
`latency_proxy.py --log <path>` writes one line per `recv()` chunk:
direction, byte count, time since the previous chunk. This shows directly
whether the server is emitting per-line writes or coalesced ones.
```sh
uv run python run_bench.py --num-events 100 --proxy-delay-ms 0 \
    --proxy-log /tmp/zino-proxy.log

# Server-to-client chunk size distribution:
awk -F'\t' '$1=="s2c"{print $2}' /tmp/zino-proxy.log | sort -n | uniq -c
```

Pre-fix you'll see lots of 3-, 4-, 8-byte chunks (one TCP segment per protocol line). Post-fix every chunk is 80+ bytes: one whole response per chunk.
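The same histogram can be computed in Python if `awk` is not at hand. This sketch assumes the tab-separated layout implied by the `awk` command (direction in the first field, byte count in the second); the `c2s` label and the sample lines are made up for illustration, and `s2c_chunk_sizes` is a hypothetical helper name.

```python
from collections import Counter


def s2c_chunk_sizes(log_lines):
    """Count server-to-client chunk sizes from tab-separated proxy log lines."""
    sizes = Counter()
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "s2c":
            sizes[int(fields[1])] += 1
    return sizes


# Made-up sample lines: direction, byte count, seconds since previous chunk.
sample = ["c2s\t12\t0.000", "s2c\t4\t0.041", "s2c\t4\t0.040", "s2c\t83\t0.001"]
for size, count in sorted(s2c_chunk_sizes(sample).items()):
    print(f"{count:7d} {size}")
```

For a real run, pass an open file object instead of the sample list, for example `s2c_chunk_sizes(open("/tmp/zino-proxy.log"))`.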
- A working Zino 2 dev environment in the same checkout (`uv pip install --group dev`), with ports `8001`/`8002` free
- The fixture sets `snmp.backend = "pysnmp"` so the netsnmp MIB load doesn't inflate startup; if you only have netsnmp installed, edit `ZINO_TOML` in `make_fixture.py`
- Optional: `pip install zinolib` if you want the `--via-zinolib` path
Place the scripts in any directory inside the Zino repo (so `uv run`
resolves the `zino` package), then:

```sh
# Scenario A: sanity check.
uv run python run_bench.py --num-events 100

# Scenario B: the core demonstration.
uv run python run_bench.py --num-events 100 --proxy-delay-ms 0

# Scenario C: simulated WAN.
uv run python run_bench.py --num-events 100 --proxy-delay-ms 15
```

To reproduce the pre/post comparison, run scenario B, `git stash` the fix,
re-run, then `git stash pop`.
| Metric | Pre-fix | Post-fix |
|---|---|---|
| Total fetch (100 cases) | 7.07 s | 0.091 s |
| `GETATTRS` mean | 41.5 ms | 0.34 ms |
| `GETATTRS` distribution | every sample on the 40 ms delayed-ACK timer | smooth, sub-ms |
| Server→client TCP chunks | ~1220 (mostly tiny) | 304 (one full response each) |
- The 40 ms peak is `TCP_DELACK_MIN`, a Linux kernel constant. On macOS/BSD the absolute number shifts but the shape of the result (pre-fix per-line writes stall, post-fix coalesced writes don't) reproduces anywhere.
- The proxy is load-bearing, not optional. See "Why we need a TCP proxy" above. If you skip it, the bug doesn't show up and you might wrongly conclude there's nothing wrong.
- It's not a regression test. No fixture pinning, no CI integration. If a future Zino refactor renames `_respond_raw` or changes the auth handshake, the harness will silently break.
- Port conflict. The orchestrator binds `8001`/`8002`. Stop any local zino before running.