@mcai4gl2
Created February 4, 2026 13:52
TCP Send-Q Trace

tcp_sendq_trace.py monitors per-socket TCP send buffer utilization over time using the Linux ss command. It produces a CSV time series that lets you identify which client connections are slow (not reading data), causing the server's kernel send buffer to back up.

How It Works

  1. Runs ss -tinm state established at a configurable interval to capture all established TCP connections.
  2. Parses each connection's Send-Q (bytes the application has written that the remote side has not yet acknowledged) and snd_buf (the tb field of skmem), which is the kernel's current send buffer size in bytes.
  3. Computes utilization as Send-Q / snd_buf (0.0 = idle, 1.0 = buffer full).
  4. Writes timestamped rows to a CSV file for offline analysis or graphing.
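
The steps above can be sketched on a captured snippet of ss output. This is a simplified illustration rather than the script itself: the sample text and the helper name utilization_rows are assumptions made for the example, while the two regular expressions mirror the ones used in tcp_sendq_trace.py.

```python
import re

# Hypothetical capture of `ss -tinm state established` (header plus two sockets).
SAMPLE = """\
Recv-Q Send-Q Local Address:Port Peer Address:Port
0      2621440 10.0.0.1:8080 10.0.0.5:12345
\t skmem:(r0,rb131072,t0,tb2626560,f0,w0,o0,bl0,d0) cubic rto:204
0      0 10.0.0.1:8080 10.0.0.6:22222
\t skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) cubic rto:204
"""

CONN_RE = re.compile(r"^(?:ESTAB\s+)?(\d+)\s+(\d+)\s+(\S+)\s+(\S+)")
TB_RE = re.compile(r"\btb(\d+)\b")


def utilization_rows(ss_text):
    """Pair each connection line with the first tb value that follows it."""
    rows = []
    current = None
    for line in ss_text.splitlines():
        m = CONN_RE.match(line.strip())
        if m:
            # New connection: Send-Q is the second number, peer the fourth field.
            current = {"peer": m.group(4), "send_q": int(m.group(2)), "util": None}
            rows.append(current)
            continue
        tb = TB_RE.search(line)
        if tb and current is not None and current["util"] is None:
            snd_buf = int(tb.group(1))
            if snd_buf > 0:
                current["util"] = current["send_q"] / snd_buf
    return rows


for row in utilization_rows(SAMPLE):
    print(row["peer"], row["send_q"], f"{row['util']:.6f}")
```

The first socket has 2621440 of 2626560 bytes queued, a ratio of 512/513, so it is effectively full; the second is idle.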

Prerequisites

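  • Linux with iproute2 installed (provides the ss command)
  • Python 3.9 or newer (the script uses built-in generic type hints such as set[int])
  • No elevated privileges required (ss normally queries the kernel over the sock_diag netlink interface and falls back to the world-readable /proc/net/tcp)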

Usage

# Default: sample every 100ms for 10 seconds, write to sendq.csv
python scripts/tcp_sendq_trace.py

# Custom interval and duration
python scripts/tcp_sendq_trace.py --interval 0.5 --duration 60 --out /tmp/trace.csv

# Only capture rows where Send-Q > 0 (reduces noise)
python scripts/tcp_sendq_trace.py --only-nonzero --duration 30

# Skip reverse DNS resolution of peer IPs (faster, no DNS dependency)
python scripts/tcp_sendq_trace.py --no-resolve --duration 30

# Only trace specific clients (by IP or hostname, one per line)
python scripts/tcp_sendq_trace.py --filter-file clients.txt --duration 30

# Only trace specific local (server) ports
python scripts/tcp_sendq_trace.py --port-filter-file ports.txt --duration 30

# Combine both filters (AND logic)
python scripts/tcp_sendq_trace.py --filter-file clients.txt --port-filter-file ports.txt --duration 30

CLI Options

| Flag | Default | Description |
| --- | --- | --- |
| --interval | 0.1 | Sampling interval in seconds |
| --duration | 10.0 | Total capture duration in seconds |
| --out | sendq.csv | Output CSV file path |
| --only-nonzero | off | Only record rows where Send-Q > 0 |
| --no-resolve | off | Skip reverse DNS resolution of peer IPs |
| --filter-file | none | File with peer IPs/hostnames to include (one per line) |
| --port-filter-file | none | File with local ports to include (one per line) |

Output Format

The CSV contains the following columns:

| Column | Description |
| --- | --- |
| timestamp | Unix epoch with microsecond precision |
| local | Local address:port |
| peer | Remote address:port |
| peer_name | Reverse DNS hostname of peer IP (empty if unresolvable or --no-resolve) |
| send_q | Bytes queued in kernel send buffer |
| snd_buf | Kernel send buffer size (from skmem tb field) |
| util | Utilization ratio (send_q / snd_buf, 6 decimal places) |

Example output:

timestamp,local,peer,peer_name,send_q,snd_buf,util
1706900000.123456,10.0.0.1:8080,10.0.0.5:12345,slow-client.lan,2621440,2626560,0.998051
1706900000.123456,10.0.0.1:8080,10.0.0.6:22222,fast-client.lan,0,46080,0.000000
1706900000.223456,10.0.0.1:8080,10.0.0.5:12345,slow-client.lan,2621440,2626560,0.998051

Filter File Formats

Both --filter-file and --port-filter-file use the same format: one entry per line, with # comments and blank lines ignored.

Peer filter (--filter-file): each line is an IP address or DNS hostname. DNS names are resolved to IPs at startup.

# clients.txt — watch list
10.0.0.5
192.168.1.100
slow-client.example.com
problematic-server.lan

Port filter (--port-filter-file): each line is an integer port number.

# ports.txt — server ports to monitor
8080
443
5000

When both filters are provided, they compose with AND logic: a connection must match both the port filter and the peer filter to appear in the output.
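
A minimal sketch of how the two filters compose, assuming the filter files have already been loaded into sets. The function name keep_connection is hypothetical, and the address parsing is simplified to IPv4 and bracketed IPv6.

```python
def keep_connection(local, peer, ports=None, peer_ips=None):
    """Return True if a connection passes every filter that was supplied.

    A filter left as None is treated as "not provided" and matches everything.
    """
    local_port = int(local.rsplit(":", 1)[1])
    peer_ip = peer.rsplit(":", 1)[0].strip("[]")
    if ports is not None and local_port not in ports:
        return False
    if peer_ips is not None and peer_ip not in peer_ips:
        return False
    return True


ports = {8080, 443}
peer_ips = {"10.0.0.5"}

# Matches both filters.
print(keep_connection("10.0.0.1:8080", "10.0.0.5:12345", ports, peer_ips))
# Port matches, peer does not.
print(keep_connection("10.0.0.1:8080", "10.0.0.6:22222", ports, peer_ips))
# Peer matches, port does not.
print(keep_connection("10.0.0.1:9000", "10.0.0.5:12345", ports, peer_ips))
# No filters supplied: everything is kept.
print(keep_connection("10.0.0.1:9000", "10.0.0.6:22222"))
```

Only the first and last calls return True: the first satisfies both filters, and the last has no filters to fail.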

Interpreting Results

  • util near 0: The client is reading data promptly. Healthy connection.
  • util near 1.0 or above: The client is not reading (or reading too slowly). The kernel send buffer is full. This is a "bad" or slow client.
  • send_q consistently > 0 across multiple samples: Indicates a sustained backlog, not just a transient burst.
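
One way to separate a sustained backlog from a transient burst is to find, for each peer, the longest run of consecutive samples with send_q > 0. The sketch below assumes a CSV in the format described under Output Format; the function name longest_backlog_runs and the inline sample data are illustrative only.

```python
import csv
import io

# Hypothetical three-sample trace for two peers.
SAMPLE_CSV = """\
timestamp,local,peer,peer_name,send_q,snd_buf,util
1.0,10.0.0.1:8080,10.0.0.5:12345,,4096,87040,0.047059
1.0,10.0.0.1:8080,10.0.0.6:22222,,512,87040,0.005882
1.1,10.0.0.1:8080,10.0.0.5:12345,,8192,87040,0.094118
1.1,10.0.0.1:8080,10.0.0.6:22222,,0,87040,0.000000
1.2,10.0.0.1:8080,10.0.0.5:12345,,8192,87040,0.094118
1.2,10.0.0.1:8080,10.0.0.6:22222,,0,87040,0.000000
"""


def longest_backlog_runs(csv_file):
    """Return {peer: longest run of consecutive samples with send_q > 0}."""
    current = {}
    longest = {}
    for row in csv.DictReader(csv_file):
        peer = row["peer"]
        if int(row["send_q"]) > 0:
            current[peer] = current.get(peer, 0) + 1
        else:
            current[peer] = 0  # a zero sample ends the run
        longest[peer] = max(longest.get(peer, 0), current[peer])
    return longest


for peer, run in longest_backlog_runs(io.StringIO(SAMPLE_CSV)).items():
    print(peer, run)
```

Here 10.0.0.5:12345 is backed up in all three samples while 10.0.0.6:22222 drains after one. This approach needs a trace captured without --only-nonzero, because that flag drops the zero rows that mark the end of a run.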

Example: Finding Slow Clients

# Capture 30 seconds of data, only non-zero queues
python scripts/tcp_sendq_trace.py --duration 30 --only-nonzero --out slow_clients.csv

# Then analyze with standard tools
# Top offenders by average Send-Q:
awk -F, 'NR>1 {sum[$3]+=$5; n[$3]++} END {for(p in sum) printf "%s avg_sendq=%.0f\n", p, sum[p]/n[p]}' slow_clients.csv | sort -t= -k2 -rn
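
For readers who prefer Python to awk, the same ranking can be sketched with the standard csv module. The function name average_send_q and the inline sample rows are hypothetical; in practice the function would be given an open handle to slow_clients.csv.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the CSV format produced by the script.
SAMPLE_CSV = """\
timestamp,local,peer,peer_name,send_q,snd_buf,util
1706900000.123456,10.0.0.1:8080,10.0.0.5:12345,slow-client.lan,2621440,2626560,0.998051
1706900000.123456,10.0.0.1:8080,10.0.0.6:22222,fast-client.lan,0,46080,0.000000
1706900000.223456,10.0.0.1:8080,10.0.0.5:12345,slow-client.lan,2621440,2626560,0.998051
"""


def average_send_q(csv_file):
    """Return (peer, average send_q) pairs, largest average first."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["peer"]] += int(row["send_q"])
        counts[row["peer"]] += 1
    averages = {peer: totals[peer] / counts[peer] for peer in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


for peer, avg in average_send_q(io.StringIO(SAMPLE_CSV)):
    print(f"{peer} avg_sendq={avg:.0f}")
```

Reading columns by name with csv.DictReader avoids depending on column positions, which is the kind of mistake that is easy to make with positional awk fields.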

tcp_sendq_trace.py

#!/usr/bin/env python3
import argparse
import csv
import re
import socket
import subprocess
import time

# ss with:
#   -t  tcp
#   -i  tcp internal info (not required, but fine)
#   -n  numeric
#   -m  socket memory (gives skmem:(...,tb<SNDBUF>,...))
SS_CMD = ["ss", "-tinm", "state", "established"]

# Example main line (two formats depending on ss version / filter):
#   With state column:    ESTAB 0 4096 10.0.0.1:5000 10.0.0.2:53422
#   Without state column: 0 4096 10.0.0.1:5000 10.0.0.2:53422
# The state column is omitted when ss filters by a single state (e.g. "state established").
LINE_RE = re.compile(r"^(?:ESTAB\s+)?(\d+)\s+(\d+)\s+(\S+)\s+(\S+)")

# Example skmem line:
#   skmem:(r0,rb131072,t0,tb87040,f0,w0,o0,bl0,d0)
# We want tb<digits>
TB_RE = re.compile(r"\btb(\d+)\b")

_dns_cache: dict[str, str] = {}


def load_port_filter_file(path: str) -> set[int]:
    """Load a local-port filter file.

    Each non-empty, non-comment line must be an integer port number.
    Returns a set of port ints.
    """
    ports: set[int] = set()
    with open(path) as f:
        for raw in f:
            entry = raw.strip()
            if not entry or entry.startswith("#"):
                continue
            ports.add(int(entry))
    return ports


def load_filter_file(path: str) -> set[str]:
    """Load a peer-IP filter file.

    Each non-empty, non-comment line is either an IP address or a DNS name.
    DNS names are resolved to IPs via ``socket.getaddrinfo`` so that runtime
    matching is always by IP. Returns a set of IP strings.
    """
    ips: set[str] = set()
    with open(path) as f:
        for raw in f:
            entry = raw.strip()
            if not entry or entry.startswith("#"):
                continue
            # Try to treat it as a numeric IP first (fast path).
            try:
                socket.inet_pton(socket.AF_INET, entry)
                ips.add(entry)
                continue
            except OSError:
                pass
            try:
                socket.inet_pton(socket.AF_INET6, entry)
                ips.add(entry)
                continue
            except OSError:
                pass
            # Not a numeric IP: resolve the hostname.
            try:
                infos = socket.getaddrinfo(entry, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
                for _fam, _typ, _proto, _canon, sockaddr in infos:
                    ips.add(sockaddr[0])
            except socket.gaierror:
                pass  # unresolvable name: silently skip
    return ips


def _extract_ip(addr_port: str) -> str:
    """Extract the IP from an ``addr:port`` string.

    Handles IPv4 (``10.0.0.1:5000``), bracket IPv6 (``[::1]:8080``),
    and bare IPv6 (``::1:8080``, where the last colon is the port separator).
    """
    if addr_port.startswith("["):
        # [::1]:8080 -> ::1
        return addr_port[1:addr_port.index("]")]
    # IPv4 or non-bracket: last colon separates port
    return addr_port.rsplit(":", 1)[0]


def _extract_port(addr_port: str) -> int:
    """Extract the port number from an ``addr:port`` string."""
    return int(addr_port.rsplit(":", 1)[1])


def resolve_ip(ip: str) -> str:
    """Reverse-DNS lookup with caching. Returns hostname or ``""``."""
    if ip in _dns_cache:
        return _dns_cache[ip]
    # NOTE: setdefaulttimeout() only applies to socket objects created
    # afterwards; it does not bound the resolver call made by gethostbyaddr(),
    # so a slow DNS server can still stall a sample. Use --no-resolve to avoid it.
    old_timeout = socket.getdefaulttimeout()
    try:
        socket.setdefaulttimeout(2)
        hostname = socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror, OSError):
        hostname = ""
    finally:
        socket.setdefaulttimeout(old_timeout)
    _dns_cache[ip] = hostname
    return hostname


def run_ss() -> str:
    """Run ss once and return its stdout as text."""
    return subprocess.check_output(SS_CMD, text=True, stderr=subprocess.DEVNULL)


def parse_samples(ss_text: str, resolve: bool = True):
    """
    Returns list of dict:
    {local, peer, peer_name, recv_q, send_q, snd_buf, util}
    """
    lines = ss_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        m = LINE_RE.match(line)
        if not m:
            i += 1
            continue
        recv_q = int(m.group(1))
        send_q = int(m.group(2))
        local = m.group(3)
        peer = m.group(4)

        # Next lines may include skmem:(...) (usually indented)
        snd_buf = None
        j = i + 1
        # Scan following lines until the next connection line or end
        while j < len(lines) and not LINE_RE.match(lines[j].strip()):
            t = lines[j]
            mtb = TB_RE.search(t)
            if mtb:
                snd_buf = int(mtb.group(1))
                break
            j += 1

        # Utilization
        util = None
        if snd_buf and snd_buf > 0:
            util = send_q / snd_buf

        peer_name = resolve_ip(_extract_ip(peer)) if resolve else ""
        out.append({
            "local": local,
            "peer": peer,
            "peer_name": peer_name,
            "recv_q": recv_q,
            "send_q": send_q,
            "snd_buf": snd_buf if snd_buf is not None else "",
            "util": util if util is not None else "",
        })
        i = j  # continue after scanned block
    return out


def main():
    ap = argparse.ArgumentParser(description="Sample per-socket TCP Send-Q + snd_buf(tb) over time")
    ap.add_argument("--interval", type=float, default=0.1, help="Sampling interval in seconds")
    ap.add_argument("--duration", type=float, default=10.0, help="Total duration in seconds")
    ap.add_argument("--out", default="sendq.csv", help="Output CSV file")
    ap.add_argument("--only-nonzero", action="store_true",
                    help="Only output rows where Send-Q > 0 (reduces volume)")
    ap.add_argument("--no-resolve", action="store_true",
                    help="Skip reverse DNS resolution of peer IPs")
    ap.add_argument("--filter-file", default=None,
                    help="File with peer IPs/hostnames to include (one per line)")
    ap.add_argument("--port-filter-file", default=None,
                    help="File with local ports to include (one per line)")
    args = ap.parse_args()

    filter_ips: set[str] | None = None
    if args.filter_file:
        filter_ips = load_filter_file(args.filter_file)

    filter_ports: set[int] | None = None
    if args.port_filter_file:
        filter_ports = load_port_filter_file(args.port_filter_file)

    end_time = time.time() + args.duration
    with open(args.out, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["timestamp", "local", "peer", "peer_name", "send_q", "snd_buf", "util"])
        while time.time() < end_time:
            ts = time.time()
            ss_text = run_ss()
            samples = parse_samples(ss_text, resolve=not args.no_resolve)
            for s in samples:
                if filter_ports is not None:
                    local_port = _extract_port(s["local"])
                    if local_port not in filter_ports:
                        continue
                if filter_ips is not None:
                    peer_ip = _extract_ip(s["peer"])
                    if peer_ip not in filter_ips:
                        continue
                if args.only_nonzero and int(s["send_q"]) == 0:
                    continue
                util = s["util"]
                # write util as decimal ratio (0..1+) with 6dp if present
                util_str = f"{util:.6f}" if isinstance(util, float) else ""
                w.writerow([f"{ts:.6f}", s["local"], s["peer"], s["peer_name"],
                            s["send_q"], s["snd_buf"], util_str])
            time.sleep(args.interval)
    print(f"Wrote time series to {args.out}")


if __name__ == "__main__":
    main()