Notes from a real debugging session chasing a register-corruption bug in inline-asm coroutine scheduling on Apple Silicon. The bug itself isn't the point — these are the tool habits and mental models that let me actually find it.
otool -tvV <binary> dumps Mach-O disassembly with symbols and PC
addresses. To extract a single function:
otool -tvV ./mybinary | awk '/^_myfunc:/{p=1} p{print; c++; if(c>200)exit}'
When you read the output, don't try to understand every instruction; watch which base register each load/store uses. On ARM64:
[sp, #N]: sp-relative. Moves if anything touches sp.
[x29, #N]: frame-pointer-relative. Stable as long as x29 isn't clobbered.
[x0, #N] etc.: arbitrary base in a register. Stable only as long as that register holds the same value.
The base register tells you what needs to be stable for this load to be
correct. If you see mov sp, xN followed later by ldr w0, [sp, #0xb0],
you're looking at an sp-relative load that was addressed relative to the
old sp but executes against the new sp. That single pattern is often
the entire bug.
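The scan for that pattern can be roughed out in a few lines of Python. This is a minimal sketch, not a real analysis tool: it assumes otool-style text as input, treats only a literal mov sp, xN as a stack switch, and ignores control flow entirely. The function name and the sample listing (addresses included) are made up for illustration.

```python
import re

# Assumption: only "mov sp, xN" counts as a stack switch. Prologue and
# epilogue adjustments (sub/add sp, sp, #N) are deliberately ignored.
SP_SWITCH = re.compile(r"\bmov\s+sp\s*,\s*(x\d+)\b")
# Any load/store mnemonic whose memory operand uses sp as its base register.
SP_ACCESS = re.compile(r"\b(?:ld|st)\w+\s+[^\[]*\[\s*sp\b")


def sp_accesses_after_switch(disassembly):
    """Return (line_number, text) for sp-relative accesses after a 'mov sp, xN'.

    Purely linear: it does not follow branches, so treat hits as leads
    to inspect by hand, not as proof of a bug.
    """
    switched = False
    hits = []
    for number, line in enumerate(disassembly.splitlines(), start=1):
        text = line.strip()
        if SP_SWITCH.search(text):
            switched = True
        elif switched and SP_ACCESS.search(text):
            hits.append((number, text))
    return hits


# Hypothetical listing in the shape otool -tvV prints (address, tab, mnemonic).
sample = """\
_myfunc:
0000000100003e00\tsub\tsp, sp, #0xc0
0000000100003e04\tstr\tw8, [sp, #0xb0]
0000000100003e08\tldr\tw9, [x29, #-0x8]
0000000100003e0c\tmov\tsp, x10
0000000100003e10\tldr\tw0, [sp, #0xb0]
0000000100003e14\tret
"""

for number, text in sp_accesses_after_switch(sample):
    print(f"line {number}: {text}")
```

The store before the switch and the x29-relative load are left alone; only the sp-relative load that runs after the switch is reported.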
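Setting breakpoints by raw address can fail silently when ASLR slides the image to an unpredictable load address, because the address you copied from the static disassembly no longer matches where the code actually lives. Don't do: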
(lldb) breakpoint set -a 0x100000c20 # will silently never fire
Do use symbolic forms that lldb can relocate:
(lldb) breakpoint set -n some_function # by symbol name
(lldb) breakpoint set -f some_file.c -l 99 # by source line
(lldb) breakpoint set -n some_function -R 0x3c # symbol plus byte offset (--address-slide)
Source-line breakpoints are particularly good because they survive optimization: lldb maps them to the best-fit instruction even when code has been reordered or inlined.
If your breakpoint "isn't firing" and the process reaches the code path anyway (printed output, abort message, etc.), suspect ASLR before anything else.
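The arithmetic behind the mismatch is small enough to write down. The sketch below assumes the usual model where the runtime address is the file address plus a per-run slide; the slide value here is invented, and in practice you would read it from something like lldb's image list -o output for that particular run.

```python
def runtime_address(file_addr, slide):
    """Address an instruction occupies in the running process."""
    return file_addr + slide


def file_address(runtime_addr, slide):
    """Inverse mapping: back to the address shown in the static disassembly."""
    return runtime_addr - slide


# Hypothetical values: the first comes from the static disassembly, the
# second is whatever the loader happened to pick for this run.
static_addr = 0x100000c20
slide = 0x2b4c000

live = runtime_address(static_addr, slide)
print(hex(live))
print(hex(file_address(live, slide)))
```

A breakpoint set at the unslid address lands on whatever happens to sit there in the running process, if anything, which is why the symbolic forms are safer.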
On a -O1 or higher build, parameters and short-lived locals often show
as <variable not available> because they've been promoted to registers
that got reused. But frame variable still prints locals that the
compiler had to spill to the stack — and those print with their
real values, because lldb reads them from memory using the DWARF
location info.
(lldb) frame variable
(int[2]) ch1 = ([0] = 0, [1] = 1)
(int[2]) ch18 = ([0] = 32, [1] = 33)
(int) hndl10 = <variable not available>
When your theory says "this local should hold value X" and you need to
verify it on an optimized build, frame variable is usually more useful
than register read because it tells you what the source-level value
is, not just raw register contents. Combine them: frame variable
tells you what the locals hold, register read tells you what's being
passed to the next call.
On optimized code, --condition "$wN != 0" and --ignore-count N both
fail in surprising ways. They'll silently skip the hit you wanted, or
evaluate against registers that have already been reused for something
else.
Cleaner approach: pick a source line that only executes on the path you care about, and let the breakpoint fire naturally. If you're debugging a "sometimes it fails" case, find the failure handler in the source (e.g., the line right after the failing check, or the assert itself) and break there. That way the breakpoint is self-filtering.
Verifying a theory empirically is the most important habit of all, and the one I violate most often.
When you have a theory about what a register holds or what a memory location contains, do not reason about it from first principles. Put a breakpoint at the exact instruction address your theory concerns, read the register or memory, and compare to what your theory predicts.
A single register read x0 x19 at the right PC is worth more than an
hour of reasoning about clang's cost model or the calling convention.
If your theory is correct, you lose thirty seconds. If it's wrong, you
save yourself from going miles down the wrong path.
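One way to keep myself honest is to write the prediction down before looking, then diff it against what lldb prints. A minimal sketch, assuming register read output pasted as text in the "name = 0xvalue" shape; the helper names and all register values are hypothetical.

```python
import re

# Assumed line shape for pasted `register read` output:
#   "      x0 = 0x0000000000000020"
REGISTER_LINE = re.compile(r"^\s*(\w+)\s*=\s*(0x[0-9a-fA-F]+)")


def parse_registers(output):
    """Map register name to integer value for every line that parses."""
    values = {}
    for line in output.splitlines():
        match = REGISTER_LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2), 16)
    return values


def check_theory(predicted, output):
    """Return {register: (predicted, observed)} for every mismatch.

    A register missing from the output is reported with observed=None.
    """
    observed = parse_registers(output)
    return {
        name: (want, observed.get(name))
        for name, want in predicted.items()
        if observed.get(name) != want
    }


# Hypothetical paste from a breakpoint at the PC the theory is about.
pasted = """\
      x0 = 0x0000000000000020
     x19 = 0x000000016fdfe8b0
"""
theory = {"x0": 0x21, "x19": 0x16fdfe8b0}

for name, (want, got) in check_theory(theory, pasted).items():
    shown = "missing" if got is None else hex(got)
    print(f"{name}: predicted {hex(want)}, observed {shown}")
```

An empty result means the theory survived this one check, nothing more. A non-empty result is the cheap falsification the notes are arguing for.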
The bug in this session was a register corruption caused by the compiler
using sp-relative addressing for a local that needed to survive a stack
switch. I initially diagnosed it as ASan-specific: I had a plausible
theory involving padded frame layouts and sp-alignment masks
(and sp, sp, #...), and I was confident in it.
The person I was pairing with said: "I didn't think we fixed [that other bug]."
That single sentence, a casual expression of doubt and not even a counter-theory, is what made me actually test my hypothesis with an experiment instead of sitting on it. The experiment ruled out my confident-but-wrong theory in about two minutes, and set up the disassembly work that found the real bug.
Lesson: when someone (even yourself, in your own head) expresses doubt about a diagnosis you're confident in, the correct response is almost never to re-explain why you're right. It's to design a cheap experiment that would falsify the diagnosis and run it. If you're right, you've spent two minutes getting stronger evidence. If you're wrong, you've saved yourself from building an elaborate fix on a wrong foundation.
Confidence in debugging is a red flag, not a virtue.
These notes were distilled from a real debugging session on libdill's ARM64 inline-asm coroutine scheduler. Full credit to the collaborator whose pushback shaped the investigation.