- Who is causing the load (e.g., PID, process name, UID, IP address)?
- Why is the load called (code path, stack trace, flame graph)?
- What is the load (IOPS, throughput, type)?
- How is the load changing over time (per-interval summaries)?
- Start by examining the highest level.
- Examine next-level details.
- Pick the most interesting breakdown or clue.
- If the problem is unsolved, go back to step 2.
For every resource, check:
- Utilization
- Saturation
- Errors
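For the CPU resource, one reasonable mapping (the tool choices here are suggestions, not the only option):
- Utilization: mpstat -P ALL 1 (per-CPU busy time)
- Saturation: vmstat 1 (an "r" value higher than the CPU count), runqlat
- Errors: dmesg | tail (e.g., thermal or machine-check events)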
- uptime
- dmesg | tail
- vmstat 1
r: The number of processes running on CPU or waiting for a turn. This provides a better signal than load averages for determining CPU saturation, as it does not include I/O. To interpret: an "r" value greater than the CPU count indicates saturation.
free: Free memory, in Kbytes. If there are too many digits to count, you probably have enough free memory. The free -m command, included in Section 3.3.7, better explains the state of free memory.
si and so: Swap-ins and swap-outs. If these are non-zero, you’re out of memory. These are only in use if swap devices are configured.
us, sy, id, wa, and st: These are breakdowns of CPU time, on average, across all CPUs. They are user time, system time (kernel), idle, wait I/O, and stolen time (by other guests, or, with Xen, the guest’s own isolated driver domain).
- mpstat -P ALL 1
This command prints per-CPU time broken down into states. It can expose single-threaded bottlenecks: one CPU pinned at 100% user time while the others sit idle is the classic signature.
Also look out for high %iowait time, which can be explored with disk I/O tools, and high %sys time, which can be explored with syscall and kernel tracing, as well as CPU profiling.
- pidstat 1
- iostat -xz 1
- free -m
- sar -n DEV 1
- sar -n TCP,ETCP 1
- top
# Attach an ephemeral debug container that shares the target container's namespaces:
kubectl debug -n ${NAMESPACE} -it ${POD} --image=debian:10.9-slim --target=${CONTAINER}
# sysstat provides mpstat/iostat/pidstat/sar; procps provides vmstat/free/top:
apt update && apt install -y htop sysstat procps
- execsnoop
- opensnoop
- ext4slower (or btrfs*, xfs*, zfs*)
- biolatency
- biosnoop
- cachestat
- tcpconnect
- tcpaccept
- tcpretrans
- runqlat
- profile
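Typical invocations for a few of these (flags as in their standard BCC versions; the thresholds and intervals below are arbitrary examples):
execsnoop -t                 # trace new processes, with timestamps
ext4slower 1                 # trace ext4 operations slower than 1 ms
biolatency -mT 1 5           # block I/O latency histograms in ms, five 1s intervals
runqlat 1 10                 # scheduler latency histograms, ten 1s intervals
profile -F 99 10             # sample stacks at 99 Hz for 10 seconds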
We can use flamegraph directly from flamegraph-rs.
E.g.:
flamegraph -o /tmp/timer_flamegraph.svg -- timer 10s
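Here flamegraph is cargo-flamegraph from the flamegraph-rs project, which wraps perf record on Linux. A typical setup, assuming a Rust toolchain is installed:
cargo install flamegraph
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid   # allow perf for non-root users, or run as root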
execsnoop(8) is a BCC and bpftrace tool that traces new process execution system-wide. It can find issues of short-lived processes that consume CPU resources and can also be used to debug software execution, including application start scripts.
exitsnoop(8) is a BCC tool that traces when processes exit, showing their age and exit reason. The age is the time from process creation to termination, and includes time both on and off CPU. Like execsnoop(8), exitsnoop(8) can help debug issues of short-lived processes, providing different information to help understand this type of workload.
runqlat(8) is a BCC and bpftrace tool for measuring CPU scheduler latency, often called run queue latency (even when no longer implemented using run queues). It is useful for identifying and quantifying issues of CPU saturation, where there is more demand for CPU resources than they can service. The metric measured by runqlat(8) is the time each thread (task) spends waiting for its turn on CPU.
runqlen(8) is a BCC and bpftrace tool for sampling the length of the CPU run queues, counting how many tasks are waiting their turn, and presenting this as a linear histogram. This can be used to further characterize issues of run queue latency or as a cheaper approximation.
runqslower(8) is a BCC tool that lists instances of run queue latency exceeding a configurable threshold and shows the process that suffered the latency and its duration.
cpudist(8) is a BCC tool for showing the distribution of on-CPU time for each thread wakeup. This can be used to help characterize CPU workloads, providing details for later tuning and design decisions.
cpufreq(8) samples the CPU frequency and shows it as a system-wide histogram, with per-process name histograms. This only works for CPU scaling governors that change the frequency, such as powersave, and can be used to determine the clock speed at which your applications are running.
Flame graphs are visualizations of stack traces that can help you quickly understand profile(8) output. They were introduced in Chapter 2.
To support flame graphs, profile(8) can produce output in folded format using -f: Stack traces are printed on one line, with functions separated by semicolons. For example, writing a 30-second profile to an out.stacks01 file and including kernel annotations (-a):
cd /tmp
git clone https://github.com/brendangregg/FlameGraph
cd FlameGraph
profile -af 30 > out.stacks01
./flamegraph.pl --color=java < out.stacks01 > out.svg
offcputime(8) is a BCC and bpftrace tool to summarize time spent by threads blocked and off CPU, showing stack traces to explain why. For CPU analysis, this tool explains why threads are not running on a CPU. It’s a counterpart to profile(8); between them, they show the entire time spent by threads on the system: on-CPU time with profile(8) and off-CPU time with offcputime(8).
cd /tmp
git clone https://github.com/brendangregg/FlameGraph
cd FlameGraph
offcputime -fKu 5 > out.offcputime01.txt
./flamegraph.pl --hash --bgcolors=blue --title="Off-CPU Time Flame Graph" \
< out.offcputime01.txt > out.offcputime01.svg
syscount(8) is a BCC and bpftrace tool for counting system calls system-wide. It is included in this chapter because it can be a starting point for investigating cases of high system CPU time.
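Typical invocations (flags per the standard BCC version of syscount(8); availability may vary by release):
syscount -i 5 -T 10      # every 5 seconds, print the top 10 syscalls
syscount -P              # count by process instead of by syscall
syscount -L              # also total the time spent in each syscall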
argdist(8) and trace(8), introduced in Chapter 4, are BCC tools that can examine events in custom ways. As a follow-on from syscount(8), if a syscall was found to be called frequently, you can use these tools to examine it in more detail.
For example, the read(2) syscall was frequent in the previous syscount(8) output. You can use argdist(8) to summarize its arguments and return value by instrumenting either the syscall tracepoint or its kernel functions. For the tracepoint, you need to find the argument names, which the BCC tool tplist(8) prints out with the -v option:
# tplist -v syscalls:sys_enter_read
syscalls:sys_enter_read
int __syscall_nr;
unsigned int fd;
char * buf;
size_t count;
argdist -H 't:syscalls:sys_enter_read():int:args->count'
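trace(8) can also print per-event details from the same tracepoint; a sketch, where the 1 MB threshold is an arbitrary example:
trace 't:syscalls:sys_enter_read (args->count > 1048576) "fd=%d count=%d", args->fd, args->count'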
funccount(8), introduced in Chapter 4, is a BCC tool that can frequency-count functions and other events. It can be used to provide more context for software CPU usage, showing which functions are called and how frequently. profile(8) may be able to show that a function is hot on CPU, but it can’t explain why: whether the function is slow, or whether it was simply called millions of times per second.
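For example, counting kernel VFS and TCP function calls (the patterns are illustrative; any function glob works):
funccount 'vfs_*'        # count VFS calls until Ctrl-C
funccount -i 1 'tcp_*'   # per-second counts of kernel TCP functions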
softirqs(8) is a BCC tool that shows the time spent servicing soft IRQs (soft interrupts). The system-wide time in soft interrupts is readily available from different tools. For example, mpstat(1) shows it as %soft. There is also /proc/softirqs to show counts of soft IRQ events. The BCC softirqs(8) tool differs in that it can show time per soft IRQ rather than event count.
hardirqs(8) is a BCC tool that shows time spent servicing hard IRQs (hard interrupts). The system-wide time in hard interrupts is readily available from different tools. For example, mpstat(1) shows it as %irq. There is also /proc/interrupts to show counts of hard IRQ events. The BCC hardirqs(8) tool differs in that it can show time per hard IRQ rather than event count.
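Both tools accept optional interval and count arguments, for example:
softirqs 10 1        # one 10-second summary of soft IRQ service time
hardirqs 10 1        # the same for hard IRQs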
smpcalls(8) is a bpftrace tool to trace and summarize time in the SMP call functions (also known as cross calls). These are a way for one CPU to run functions on other CPUs, including all other CPUs, which can become an expensive activity on large multi-processor systems (for example, a 36-CPU server).
llcstat(8) is a BCC tool that uses PMCs to show last-level cache (LLC) miss rates and hit ratios by process. PMCs are introduced in Chapter 2.
Tool | Type | Description
---|---|---
dmesg | Kernel log | OOM killer event details
swapon | Kernel statistics | Swap device usage
free | Kernel statistics | System-wide memory usage
ps | Kernel statistics | Process statistics, including memory usage*
pmap | Kernel statistics | Process memory usage by segment
vmstat | Kernel statistics | Various statistics, including memory
sar | Kernel statistics | Can show page fault and page scanner rates*
perf | Software events, hardware statistics, hardware sampling | Memory-related PMC statistics and event sampling
*ps output:
- %MEM: The percentage of the system’s physical memory in use by this process
- VSZ: Virtual memory size
- RSS: Resident set size: the total physical memory in use by this process
*sar: sar -B 1 reports paging statistics, including page faults (fault/s, majflt/s) and page scanner rates (pgscank/s, pgscand/s)
heaptrack: a memory profiler for Linux
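A typical heaptrack workflow (the output file naming is approximate and varies by version):
heaptrack ./myapp                          # record; writes heaptrack.myapp.<pid>.gz
heaptrack_print heaptrack.myapp.<pid>.gz   # text report; heaptrack_gui for the GUI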
Count major faults by process:
bpftrace -e 'software:major-faults:1 { @[comm, pid] = count(); }'
Print each major fault as it occurs:
bpftrace -e 'software:major-faults:1 { printf("%s %d\n", comm, pid); }'
Tool | Type | Description
---|---|---
df | Kernel statistics | Filesystem usage and capacity
mount | Kernel statistics | Mounted filesystems and mount options
strace | Syscall tracing | Traces filesystem syscalls (high per-event overhead)
perf | Tracing, profiling | Filesystem tracepoints and CPU profiling of I/O paths
fatrace | Tracing | Specialized tracer that uses the Linux fanotify API (file access notify)
opensnoop(8) was shown in Chapters 1 and 4, and is provided by BCC and bpftrace. It traces file opens and is useful for discovering the location of data files, log files, and configuration files. It can also discover performance problems caused by frequent opens, or help troubleshoot issues caused by missing files.
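Typical invocations (flags as in the standard BCC version):
opensnoop -T         # include timestamps
opensnoop -x         # only failed opens: useful for missing-file issues
opensnoop -p 1234    # trace a single PID (1234 is an example)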
biolatency(8) is a BCC and bpftrace tool to show block I/O device latency as a histogram. The term device latency refers to the time from issuing a request to the device, to when it completes, including time spent queued in the operating system.
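Useful options (per the standard BCC version):
biolatency -D 1 5    # separate histograms per disk, five 1-second intervals
biolatency -Q        # include time the I/O spent queued in the OS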
biosnoop(8) is a BCC and bpftrace tool that prints a one-line summary for each disk I/O.
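For example (the -Q flag is per the standard BCC version):
biosnoop             # one line per I/O: time, comm, PID, disk, bytes, latency
biosnoop -Q          # also show OS queued time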