This directory contains a small tracing workflow for investigating Ollama memory behavior on macOS, especially on Apple Silicon machines with unified memory.
The goal is to capture enough information to answer:
- Which Ollama subprocesses were alive?
- Which model runner did each subprocess belong to?
- Did multiple model runners overlap?
- What was macOS doing globally with wired memory, compressed memory, swap, and memory pressure?
The workflow produces a combined SVG graph from two trace files:
- A per-process trace from ps.
- A system memory trace from vm_stat, memory_pressure, sysctl vm.swapusage, ollama ps, and /api/ps.
Requirements:
- macOS
- Ollama running locally
- bash
- curl
- jq
- Node.js for graph generation
No npm packages are required.
Recommended files for each measurement run:
- ollama-process-memtrace-YYYYMMDD-HHMMSS.tsv
  - One row per matching process sample.
  - Captures PID, PPID, RSS, VSZ, and command line.
- system-memtrace-YYYYMMDD-HHMMSS.log
  - One block per system sample.
  - Captures macOS memory counters and Ollama's loaded-model view.
- graphs/rss-and-system-memory.svg
  - Combined graph generated from the two trace files.
- generate-memory-graph.js
  - Script that generates the SVG graph.
Run this in the directory where you want to store the trace files:
mkdir -p ollama-memtrace
cd ollama-memtrace
OUT="ollama-process-memtrace-$(date +%Y%m%d-%H%M%S).tsv"
echo -e "ts\tpid\tppid\tcomm\trss_kb\tvsz_kb\tcommand" > "$OUT"
while true; do
  ts=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
  ps -axo pid=,ppid=,comm=,rss=,vsz=,command= \
    | awk -v ts="$ts" '
      /[o]llama|[m]lx|[r]unner/ {
        pid=$1; ppid=$2; comm=$3; rss=$4; vsz=$5;
        $1=$2=$3=$4=$5="";
        sub(/^ +/, "", $0);
        print ts "\t" pid "\t" ppid "\t" comm "\t" rss "\t" vsz "\t" $0
      }
    ' >> "$OUT"
  sleep 1
done

Leave this running while you reproduce the memory behavior.
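The resulting TSV is easy to load for ad hoc analysis. The following is a minimal sketch of a parser, not the code used by generate-memory-graph.js; the function name `parseProcessTrace` and the sample rows are hypothetical, and the column order is taken from the header line written by the loop above.

```javascript
// Sketch: parse the process trace TSV into plain objects.
// Column order follows the header written by the tracing loop:
// ts, pid, ppid, comm, rss_kb, vsz_kb, command.
function parseProcessTrace(text) {
  const rows = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;      // ignore blank lines
    if (line.startsWith("ts\t")) continue; // skip the header row
    const [ts, pid, ppid, comm, rssKb, vszKb, ...rest] = line.split("\t");
    rows.push({
      ts,
      pid: Number(pid),
      ppid: Number(ppid),
      comm,
      rssKb: Number(rssKb),
      vszKb: Number(vszKb),
      command: rest.join("\t"), // keep any stray tabs inside the command line
    });
  }
  return rows;
}

// Illustrative sample only; these are not real measurements.
const sampleTsv = [
  "ts\tpid\tppid\tcomm\trss_kb\tvsz_kb\tcommand",
  "2025-01-01T00:00:00Z\t100\t1\tollama\t51200\t400000000\tollama serve",
  "2025-01-01T00:00:00Z\t200\t100\tollama\t8388608\t500000000\tollama runner --mlx-engine --model example-model",
].join("\n");

const sampleRows = parseProcessTrace(sampleTsv);
console.log(sampleRows.length, "rows; second row PID", sampleRows[1].pid);
```

To use it on a real trace, read the file with `fs.readFileSync(path, "utf8")` and pass the text in.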
Run this in a second terminal:
cd ollama-memtrace
OUT="system-memtrace-$(date +%Y%m%d-%H%M%S).log"
while true; do
  {
    echo "===== $(date -u +"%Y-%m-%dT%H:%M:%SZ") ====="
    vm_stat
    memory_pressure
    sysctl vm.swapusage
    ollama ps
    curl -s http://127.0.0.1:11434/api/ps | jq .
    echo
  } >> "$OUT"
  sleep 5
done

Leave this running during the same reproduction window.
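Each sample in the system log starts with a `===== <timestamp> =====` header, so the log can be split back into blocks. The sketch below does that and pulls out two values per block. It assumes `memory_pressure` prints a line of the form `System-wide memory free percentage: N%` and that `sysctl vm.swapusage` reports sizes with an `M` suffix; `parseSystemTrace` and the sample log are hypothetical, and this is not necessarily how generate-memory-graph.js reads the file.

```javascript
// Sketch: split the system trace into timestamped blocks and extract
// the free-memory percentage and swap usage from each block.
function parseSystemTrace(text) {
  const blocks = [];
  let current = null;
  for (const line of text.split("\n")) {
    const header = line.match(/^===== (\S+) =====$/);
    if (header) {
      current = { ts: header[1], lines: [] };
      blocks.push(current);
    } else if (current) {
      current.lines.push(line);
    }
  }
  return blocks.map(({ ts, lines }) => {
    const body = lines.join("\n");
    // Assumed line format from memory_pressure.
    const free = body.match(/System-wide memory free percentage:\s*(\d+)%/);
    // Assumed line format from sysctl vm.swapusage (sizes suffixed with M).
    const swap = body.match(/vm\.swapusage:.*used = ([\d.]+)M/);
    return {
      ts,
      freePct: free ? Number(free[1]) : null,
      swapUsedMiB: swap ? Number(swap[1]) : null,
    };
  });
}

// Illustrative sample only; these are not real measurements.
const sampleLog = [
  "===== 2025-01-01T00:00:00Z =====",
  "System-wide memory free percentage: 42%",
  "vm.swapusage: total = 2048.00M  used = 512.25M  free = 1535.75M  (encrypted)",
  "",
  "===== 2025-01-01T00:00:05Z =====",
  "System-wide memory free percentage: 18%",
  "vm.swapusage: total = 3072.00M  used = 2100.00M  free = 972.00M  (encrypted)",
].join("\n");

const systemSamples = parseSystemTrace(sampleLog);
console.log(systemSamples);
```

Blocks where a command failed or printed an unexpected format yield `null` for that field rather than stopping the parse.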
In a third terminal, run the model operations you want to investigate.
For example:
ollama run <model> --verbose 'give me a definition for strategy and for tactics. Provide at least 3 examples to differentiate the concepts.'

If you are investigating model switching or multiple loaded models, run several models sequentially and keep the tracing commands active until the models have either unloaded or the slow/critical behavior has occurred.
If you use the Ollama macOS app, you do not need to start ollama serve manually. The app launches and manages the serve process; the traces above still capture the serve process and any model runner subprocesses.
When finished, stop both tracing loops with Ctrl-C.
Copy generate-memory-graph.js into the trace directory if it is not already there, then run:
mkdir -p graphs
node generate-memory-graph.js \
  ollama-process-memtrace-YYYYMMDD-HHMMSS.tsv \
  system-memtrace-YYYYMMDD-HHMMSS.log \
  graphs/rss-and-system-memory.svg

Open graphs/rss-and-system-memory.svg in a browser.
The graph has two panels.
The top panel shows per-process RSS:
- Each line is an Ollama-related process.
- Command lines containing runner --mlx-engine --model ... identify model runners.
- Short-lived placeholder entries shown by ps as (ollama) are omitted because they are usually only a few MiB and make the legend harder to read.
The bottom panel shows macOS system memory state:
- wired: RAM that macOS cannot page out. On Apple Silicon, GPU/driver/kernel allocations may contribute here.
- compressor: physical RAM occupied by macOS's compressed memory store.
- swap used: memory that spilled to disk.
- free: immediately unused physical memory.
- free %: the system-wide free-memory percentage reported by memory_pressure; this uses the right-side percentage scale.
Red shaded bands mark samples where memory_pressure reported free memory below 30%. These are useful as visual markers for critical intervals.
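The same 30% threshold can be applied outside the graph to list the critical intervals as text. This is a small sketch under the assumption that each system sample has already been reduced to a timestamp and a free percentage; `lowFreeIntervals` and the sample values are hypothetical.

```javascript
// Sketch: group consecutive samples whose free % is below a threshold
// into intervals, recording the lowest value seen in each interval.
function lowFreeIntervals(samples, thresholdPct = 30) {
  const intervals = [];
  let open = null;
  for (const { ts, freePct } of samples) {
    const isLow = freePct !== null && freePct < thresholdPct;
    if (!isLow) {
      open = null; // a normal (or missing) sample closes the current interval
      continue;
    }
    if (!open) {
      open = { start: ts, end: ts, minFreePct: freePct };
      intervals.push(open);
    }
    open.end = ts;
    open.minFreePct = Math.min(open.minFreePct, freePct);
  }
  return intervals;
}

// Illustrative sample only; these are not real measurements.
const freeSamples = [
  { ts: "2025-01-01T00:00:00Z", freePct: 45 },
  { ts: "2025-01-01T00:00:05Z", freePct: 28 },
  { ts: "2025-01-01T00:00:10Z", freePct: 12 },
  { ts: "2025-01-01T00:00:15Z", freePct: 35 },
  { ts: "2025-01-01T00:00:20Z", freePct: 29 },
];

const criticalIntervals = lowFreeIntervals(freeSamples);
console.log(criticalIntervals);
```

The threshold is a parameter so that a stricter cut-off can be tried without changing the function.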
Useful questions to ask when reading the graph:
- Did a new model runner start before the previous model runner exited?
- Did ollama ps or /api/ps show multiple models loaded at the same time?
- Did free memory drop close to zero while a large runner was active?
- Did wired memory rise sharply during model load?
- Did compressor memory rise after a low-free-memory interval?
- Did swap usage grow or remain high?
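The first question, whether runners overlapped, can also be answered directly from the process trace. The sketch below records the first and last sample in which each runner PID appears and reports pairs whose lifetimes intersect. It assumes runner command lines contain `runner` and a `--model` argument followed by the model name, and that PIDs are not reused within one trace; the function names and sample rows are hypothetical.

```javascript
// Sketch: derive runner lifetimes from process-trace rows and find overlaps.
function runnerLifetimes(rows) {
  const byPid = new Map();
  for (const { ts, pid, command } of rows) {
    // Assumption: runners are the rows whose command has "runner" and "--model".
    const model = command.match(/\brunner\b.*--model\s+(\S+)/);
    if (!model) continue;
    const t = Date.parse(ts);
    const entry = byPid.get(pid);
    if (!entry) {
      byPid.set(pid, { pid, model: model[1], first: t, last: t });
    } else {
      entry.first = Math.min(entry.first, t);
      entry.last = Math.max(entry.last, t);
    }
  }
  return [...byPid.values()];
}

function findOverlaps(lifetimes) {
  const overlaps = [];
  for (let i = 0; i < lifetimes.length; i++) {
    for (let j = i + 1; j < lifetimes.length; j++) {
      const a = lifetimes[i];
      const b = lifetimes[j];
      const start = Math.max(a.first, b.first);
      const end = Math.min(a.last, b.last);
      if (start <= end) {
        overlaps.push({
          pids: [a.pid, b.pid],
          models: [a.model, b.model],
          start: new Date(start).toISOString(),
          end: new Date(end).toISOString(),
        });
      }
    }
  }
  return overlaps;
}

// Illustrative sample only; these are not real measurements.
const overlapRows = [
  { ts: "2025-01-01T00:00:00Z", pid: 100, command: "ollama serve" },
  { ts: "2025-01-01T00:00:00Z", pid: 200, command: "ollama runner --mlx-engine --model model-a" },
  { ts: "2025-01-01T00:00:05Z", pid: 200, command: "ollama runner --mlx-engine --model model-a" },
  { ts: "2025-01-01T00:00:05Z", pid: 300, command: "ollama runner --mlx-engine --model model-b" },
  { ts: "2025-01-01T00:00:10Z", pid: 200, command: "ollama runner --mlx-engine --model model-a" },
  { ts: "2025-01-01T00:00:10Z", pid: 300, command: "ollama runner --mlx-engine --model model-b" },
  { ts: "2025-01-01T00:00:15Z", pid: 300, command: "ollama runner --mlx-engine --model model-b" },
];

const runnerOverlaps = findOverlaps(runnerLifetimes(overlapRows));
console.log(runnerOverlaps);
```

Because the process trace samples once per second, an overlap shorter than the sampling interval can be missed.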
A common pattern worth reporting is:
At <time>, runner PID <pid> for <model A> was still alive while runner PID <pid> for <model B> started.
During the same interval, memory_pressure reported <N>% free memory, wired memory was <X> GiB, compressor was <Y> GiB, and swap used was <Z> GiB.
ollama ps reported <model list and sizes>.
In the process trace:
- rss_kb: resident set size in KiB. This is the amount of the process's mapped memory currently resident in physical RAM. It is useful for identifying which process is currently large, but it can include shared/mapped pages.
- vsz_kb: virtual size in KiB. This is virtual address space, not physical RAM. For Ollama/MLX this can be very large and should not be read as actual memory pressure.
- command: full command line. For runners, this includes the model name.
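Since rss_kb is in KiB, dividing by 1024 twice gives GiB. As a rough sketch, the peak RSS per process can be found like this; `peakRssByPid` and the sample rows are hypothetical, and the result carries the same shared-page caveat as any RSS figure.

```javascript
// Sketch: find the largest RSS sample for each PID and convert KiB to GiB.
function peakRssByPid(rows) {
  const peaks = new Map();
  for (const row of rows) {
    const best = peaks.get(row.pid);
    if (!best || row.rssKb > best.rssKb) peaks.set(row.pid, row);
  }
  return [...peaks.values()]
    .map((row) => ({
      pid: row.pid,
      ts: row.ts,                     // when the peak was observed
      peakGiB: row.rssKb / 1024 ** 2, // KiB -> GiB
      command: row.command,
    }))
    .sort((a, b) => b.peakGiB - a.peakGiB); // largest process first
}

// Illustrative sample only; these are not real measurements.
const rssRows = [
  { ts: "2025-01-01T00:00:00Z", pid: 100, rssKb: 51200, command: "ollama serve" },
  { ts: "2025-01-01T00:00:00Z", pid: 200, rssKb: 1048576, command: "ollama runner --mlx-engine --model example-model" },
  { ts: "2025-01-01T00:00:05Z", pid: 200, rssKb: 4194304, command: "ollama runner --mlx-engine --model example-model" },
  { ts: "2025-01-01T00:00:10Z", pid: 200, rssKb: 2097152, command: "ollama runner --mlx-engine --model example-model" },
];

const rssPeaks = peakRssByPid(rssRows);
console.log(rssPeaks);
```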
In the system trace:
- vm_stat: raw macOS virtual memory counters.
- memory_pressure: macOS's pressure-oriented memory summary.
- vm.swapusage: total, used, and free swap.
- ollama ps: human-readable list of models Ollama currently considers loaded.
- /api/ps: JSON version of the loaded-model state, including model size and size_vram.
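The vm_stat counters are page counts, so they need to be multiplied by the page size printed in the first line of its output before they can be compared with the other figures. The sketch below does that conversion. Mapping `Pages wired down`, `Pages occupied by compressor`, and `Pages free` to the graph's wired, compressor, and free series is an assumption made for illustration; `parseVmStat` and the sample text are hypothetical.

```javascript
// Sketch: convert selected vm_stat page counters to GiB.
function parseVmStat(text) {
  const pageSizeMatch = text.match(/page size of (\d+) bytes/);
  const pageSize = pageSizeMatch ? Number(pageSizeMatch[1]) : null;

  // Collect every "Name:   12345." line into a name -> page count map.
  const pages = {};
  for (const line of text.split("\n")) {
    const m = line.match(/^"?([^":]+)"?:\s+(\d+)\.?\s*$/);
    if (m) pages[m[1].trim()] = Number(m[2]);
  }

  const toGiB = (count) =>
    count === undefined || pageSize === null ? null : (count * pageSize) / 2 ** 30;

  return {
    pageSize,
    freeGiB: toGiB(pages["Pages free"]),
    wiredGiB: toGiB(pages["Pages wired down"]),
    compressorGiB: toGiB(pages["Pages occupied by compressor"]),
  };
}

// Illustrative sample only; these are not real measurements.
const sampleVmStat = [
  "Mach Virtual Memory Statistics: (page size of 16384 bytes)",
  "Pages free:                               32768.",
  "Pages active:                            100000.",
  "Pages wired down:                        131072.",
  "Pages occupied by compressor:             65536.",
].join("\n");

const vmStatSummary = parseVmStat(sampleVmStat);
console.log(vmStatSummary);
```

Reading the page size from the output avoids hard-coding a value that differs between machines.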
Caveats:
- Do not sum process RSS and treat it as exact physical memory. Some pages can be shared or memory-mapped.
- vsz_kb is virtual address space, not real RAM usage.
- compressor is the size of the compressed memory store in physical RAM, not the original uncompressed logical size.
- ollama ps model size and process RSS are related but not identical views of memory usage.
- Activity Monitor is useful for visual confirmation, but the trace files are better for sharing because they preserve timestamps and process identity.
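To put Ollama's own view next to process RSS, the /api/ps response can be reduced to a few numbers per model. This sketch assumes the response is a JSON object with a `models` array whose entries carry `name`, `size`, and `size_vram` in bytes; `summarizeLoadedModels` and the sample payload are hypothetical, and the result is only another view of memory use, not a replacement for RSS.

```javascript
// Sketch: summarize an /api/ps response as GiB figures per loaded model.
function summarizeLoadedModels(apiPsJson) {
  const models = JSON.parse(apiPsJson).models ?? [];
  return models.map((m) => ({
    name: m.name,
    sizeGiB: m.size / 2 ** 30,
    sizeVramGiB: m.size_vram / 2 ** 30,
    // Fraction of the reported size that is attributed to size_vram.
    vramShare: m.size > 0 ? m.size_vram / m.size : null,
  }));
}

// Illustrative payload only; the model name and sizes are made up.
const sampleApiPs = JSON.stringify({
  models: [
    { name: "example-model:latest", size: 8589934592, size_vram: 6442450944 },
  ],
});

const loadedModels = summarizeLoadedModels(sampleApiPs);
console.log(loadedModels);
```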