@pd95
Created May 14, 2026 09:25
Ollama macOS Memory Trace Guide

This directory contains a small tracing workflow for investigating Ollama memory behavior on macOS, especially on Apple Silicon machines with unified memory.

The goal is to capture enough information to answer:

  • Which Ollama subprocesses were alive?
  • Which model runner did each subprocess belong to?
  • Did multiple model runners overlap?
  • What was macOS doing globally with wired memory, compressed memory, swap, and memory pressure?

The workflow produces a combined SVG graph from two trace files:

  • A per-process trace from ps.
  • A system memory trace from vm_stat, memory_pressure, sysctl vm.swapusage, ollama ps, and /api/ps.

Requirements

  • macOS
  • Ollama running locally
  • bash
  • curl
  • jq
  • Node.js for graph generation

No npm packages are required.

Files

Recommended files for each measurement run:

  • ollama-process-memtrace-YYYYMMDD-HHMMSS.tsv
    • One row per matching process sample.
    • Captures PID, PPID, RSS, VSZ, and command line.
  • system-memtrace-YYYYMMDD-HHMMSS.log
    • One block per system sample.
    • Captures macOS memory counters and Ollama's loaded-model view.
  • graphs/rss-and-system-memory.svg
    • Combined graph generated from the two trace files.
  • generate-memory-graph.js
    • Script that generates the SVG graph.

Step 1: Start Per-Process Tracing

Run this in the directory where you want to store the trace files:

mkdir -p ollama-memtrace
cd ollama-memtrace

OUT="ollama-process-memtrace-$(date +%Y%m%d-%H%M%S).tsv"

printf 'ts\tpid\tppid\tcomm\trss_kb\tvsz_kb\tcommand\n' > "$OUT"

while true; do
  ts=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

  ps -axo pid=,ppid=,comm=,rss=,vsz=,command= \
    | awk -v ts="$ts" '
      /[o]llama|[m]lx|[r]unner/ {
        pid=$1; ppid=$2; comm=$3; rss=$4; vsz=$5;
        $1=$2=$3=$4=$5="";
        sub(/^ +/, "", $0);
        print ts "\t" pid "\t" ppid "\t" comm "\t" rss "\t" vsz "\t" $0
      }
    ' >> "$OUT"

  sleep 1
done

Leave this running while you reproduce the memory behavior.
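
To confirm that the loop is writing well-formed rows, a sketch like the following parses a single line of the TSV. The function name parseProcessRow and the sample row are illustrative assumptions and not part of the original workflow; the column order is taken from the header line written above.

```javascript
// Column order matches the header line written by the tracing loop.
const COLUMNS = ["ts", "pid", "ppid", "comm", "rss_kb", "vsz_kb", "command"];

// Parse one TSV row into an object; returns null for malformed rows.
function parseProcessRow(line) {
  const parts = line.split("\t");
  if (parts.length < COLUMNS.length) return null;
  const row = {
    ts: parts[0],
    pid: Number(parts[1]),
    ppid: Number(parts[2]),
    comm: parts[3],
    rssKB: Number(parts[4]),
    vszKB: Number(parts[5]),
    // The command may itself contain tabs, so rejoin the remainder.
    command: parts.slice(6).join("\t"),
  };
  const numeric = [row.pid, row.ppid, row.rssKB, row.vszKB];
  if (Number.isNaN(Date.parse(row.ts)) || !numeric.every(Number.isFinite)) return null;
  return row;
}

// Hypothetical sample row; the values are invented for illustration.
const sampleRow =
  "2026-05-14T09:00:00Z\t4242\t4200\tollama\t1048576\t4194304\tollama runner --mlx-engine --model example-model";
console.log(parseProcessRow(sampleRow));
// The header row fails the timestamp check and is rejected.
console.log(parseProcessRow(COLUMNS.join("\t")));
```

If many data rows come back as null, one possible cause is an executable path containing spaces, which would shift the whitespace-separated fields that awk reads from ps.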

Step 2: Start System Memory Tracing

Run this in a second terminal:

cd ollama-memtrace

OUT="system-memtrace-$(date +%Y%m%d-%H%M%S).log"

while true; do
  {
    echo "===== $(date -u +"%Y-%m-%dT%H:%M:%SZ") ====="
    vm_stat
    memory_pressure
    sysctl vm.swapusage
    ollama ps
    curl -s http://127.0.0.1:11434/api/ps | jq .
    echo
  } >> "$OUT"

  sleep 5
done

Leave this running during the same reproduction window.
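
Each sample also takes some time to collect, so the spacing between blocks can be longer than the 5 second sleep. The sketch below splits a system trace on its "=====" header lines and lists unusually long gaps. The function names, the 15 second threshold, and the sample log are assumptions made for illustration.

```javascript
// Split a system trace into blocks keyed by the "===== <timestamp> =====" header.
function splitSystemTrace(text) {
  const header = /^===== (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) =====$/gm;
  const marks = [...text.matchAll(header)];
  return marks.map((m, i) => {
    const start = m.index + m[0].length;
    const end = i + 1 < marks.length ? marks[i + 1].index : text.length;
    return { ts: m[1], t: Date.parse(m[1]), body: text.slice(start, end).trim() };
  });
}

// Report consecutive samples that are further apart than maxGapMs.
function findGaps(blocks, maxGapMs) {
  const gaps = [];
  for (let i = 1; i < blocks.length; i++) {
    const delta = blocks[i].t - blocks[i - 1].t;
    if (delta > maxGapMs) {
      gaps.push({ from: blocks[i - 1].ts, to: blocks[i].ts, seconds: delta / 1000 });
    }
  }
  return gaps;
}

// Hypothetical three-sample log; real blocks contain the full command output.
const sampleLog = [
  "===== 2026-05-14T09:00:00Z =====",
  "System-wide memory free percentage: 62%",
  "",
  "===== 2026-05-14T09:00:06Z =====",
  "System-wide memory free percentage: 58%",
  "",
  "===== 2026-05-14T09:00:40Z =====",
  "System-wide memory free percentage: 21%",
].join("\n");

const sampleBlocks = splitSystemTrace(sampleLog);
console.log(sampleBlocks.length, findGaps(sampleBlocks, 15000));
```

A long gap during a reproduction may itself be worth noting, since the sampling commands can stall when the machine is under heavy memory pressure.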

Step 3: Reproduce the Behavior

In a third terminal, run the model operations you want to investigate.

For example:

ollama run <model> --verbose 'give me a definition for strategy and for tactics. Provide at least 3 examples to differentiate the concepts.'

If you are investigating model switching or multiple loaded models, run several models sequentially and keep the tracing commands active until the models have either unloaded or the slow/critical behavior has occurred.

If you use the Ollama macOS app, you do not need to start ollama serve manually. The app launches and manages the serve process; the traces above still capture the serve process and any model runner subprocesses.

When finished, stop both tracing loops with Ctrl-C.

Step 4: Generate the Graph

Copy generate-memory-graph.js into the trace directory if it is not already there, then run:

mkdir -p graphs

node generate-memory-graph.js \
  ollama-process-memtrace-YYYYMMDD-HHMMSS.tsv \
  system-memtrace-YYYYMMDD-HHMMSS.log \
  graphs/rss-and-system-memory.svg

Open graphs/rss-and-system-memory.svg in a browser.
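
Because the YYYYMMDD-HHMMSS stamps in the file names sort lexicographically, the newest trace of each kind can be picked without typing the names by hand. The following is a minimal sketch; latestTracePair and the sample file names are hypothetical, and it simply takes the newest file of each kind rather than checking that the two traces cover the same window.

```javascript
// Pick the newest process trace and the newest system trace from a list of file names.
function latestTracePair(names) {
  // filter() returns a new array, so sorting does not modify the caller's list.
  const newest = (pattern) => names.filter((name) => pattern.test(name)).sort().pop() || null;
  return {
    processTrace: newest(/^ollama-process-memtrace-\d{8}-\d{6}\.tsv$/),
    systemTrace: newest(/^system-memtrace-\d{8}-\d{6}\.log$/),
  };
}

// Hypothetical directory listing for illustration.
const sampleNames = [
  "ollama-process-memtrace-20260514-090000.tsv",
  "ollama-process-memtrace-20260514-101500.tsv",
  "system-memtrace-20260514-090003.log",
  "system-memtrace-20260514-101502.log",
  "generate-memory-graph.js",
];
console.log(latestTracePair(sampleNames));
```

In practice the list could come from require("fs").readdirSync("."), and the two resulting names would be passed as the first two arguments to generate-memory-graph.js.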

Understanding the Graph

The graph has two panels.

The top panel shows per-process RSS:

  • Each line is an Ollama-related process.
  • Model runners are identified by command lines containing runner --mlx-engine --model ....
  • Short-lived placeholder entries shown by ps as (ollama) are omitted because they are usually only a few MiB and make the legend harder to read.

The bottom panel shows macOS system memory state:

  • wired: RAM that macOS cannot page out. On Apple Silicon, GPU/driver/kernel allocations may contribute here.
  • compressor: physical RAM occupied by macOS's compressed memory store.
  • swap used: memory that spilled to disk.
  • free: immediately unused physical memory.
  • free %: the system-wide free-memory percentage reported by memory_pressure; this uses the right-side percentage scale.
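
The GiB values in this panel are derived from vm_stat page counts: pages multiplied by the page size, divided by 2^30. The sketch below shows that arithmetic and reads the page size from the vm_stat header instead of assuming it. The helper names and the sample counters are assumptions for illustration; the 16384 byte fallback mirrors the constant in the graph script.

```javascript
// Read the page size from the vm_stat header, falling back to 16384 bytes
// (the value hard-coded in the graph script) when the header is missing.
function pageSizeOf(vmStatText, fallback = 16384) {
  const m = vmStatText.match(/page size of (\d+) bytes/);
  return m ? Number(m[1]) : fallback;
}

// Convert a named vm_stat page counter to GiB; NaN if the counter is absent.
function counterGiB(vmStatText, label) {
  const escaped = label.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const m = vmStatText.match(new RegExp(`^${escaped}:\\s+(\\d+)\\.`, "m"));
  return m ? (Number(m[1]) * pageSizeOf(vmStatText)) / 2 ** 30 : NaN;
}

// Hypothetical vm_stat excerpt with round numbers so the result is easy to check:
// 65536 pages * 16384 bytes = 2^30 bytes = 1 GiB.
const sampleVmStat = [
  "Mach Virtual Memory Statistics: (page size of 16384 bytes)",
  "Pages free:                               65536.",
  "Pages wired down:                        131072.",
  "Pages occupied by compressor:            196608.",
].join("\n");

console.log(counterGiB(sampleVmStat, "Pages free")); // 1
console.log(counterGiB(sampleVmStat, "Pages wired down")); // 2
console.log(counterGiB(sampleVmStat, "Pages occupied by compressor")); // 3
```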

Red shaded bands mark samples where memory_pressure reported free memory below 30%. These are useful as visual markers for critical intervals.
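
For a written report it can help to have the same intervals as a list rather than as shading. The following sketch groups consecutive low-free samples into intervals; lowFreeIntervals and the sample values are hypothetical, and the 30% default matches the threshold used for the bands.

```javascript
// Group consecutive samples whose free percentage is below the threshold
// into { start, end, min } intervals.
function lowFreeIntervals(samples, threshold = 30) {
  const intervals = [];
  let current = null;
  for (const sample of samples) {
    const low = Number.isFinite(sample.freePct) && sample.freePct < threshold;
    if (!low) {
      current = null; // a healthy or missing sample closes the open interval
      continue;
    }
    if (!current) {
      current = { start: sample.ts, end: sample.ts, min: sample.freePct };
      intervals.push(current);
    } else {
      current.end = sample.ts;
      current.min = Math.min(current.min, sample.freePct);
    }
  }
  return intervals;
}

// Hypothetical samples at the 5 second cadence of the system trace.
const sampleFree = [
  { ts: "2026-05-14T09:00:00Z", freePct: 62 },
  { ts: "2026-05-14T09:00:05Z", freePct: 28 },
  { ts: "2026-05-14T09:00:10Z", freePct: 12 },
  { ts: "2026-05-14T09:00:15Z", freePct: 31 },
  { ts: "2026-05-14T09:00:20Z", freePct: 25 },
];
console.log(lowFreeIntervals(sampleFree));
```

With the sample data above, the function returns two intervals: one spanning 09:00:05 to 09:00:10 with a minimum of 12%, and a single-sample interval at 09:00:20.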

Interpreting Results

Useful questions to ask when reading the graph:

  • Did a new model runner start before the previous model runner exited?
  • Did ollama ps or /api/ps show multiple models loaded at the same time?
  • Did free memory drop close to zero while a large runner was active?
  • Did wired memory rise sharply during model load?
  • Did compressor memory rise after a low-free-memory interval?
  • Did swap usage grow or remain high?
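
The first question can also be answered directly from the process trace. The sketch below collapses rows into one lifetime per runner PID and lists pairs whose lifetimes intersect. The function names and sample rows are assumptions for illustration; runners are recognized by the --model argument, as in the graph script, and the sketch assumes PIDs are not reused within one trace.

```javascript
// Collapse trace rows into one { pid, model, first, last } lifetime per runner PID.
function runnerLifetimes(rows) {
  const byPid = new Map();
  for (const row of rows) {
    const model = row.command.match(/--model\s+(\S+)/);
    if (!model) continue; // not a model runner
    const t = Date.parse(row.ts);
    const life = byPid.get(row.pid) || { pid: row.pid, model: model[1], first: t, last: t };
    life.first = Math.min(life.first, t);
    life.last = Math.max(life.last, t);
    byPid.set(row.pid, life);
  }
  return [...byPid.values()].sort((a, b) => a.first - b.first);
}

// Return every pair of runners that were alive at the same time.
function overlappingRunners(rows) {
  const lives = runnerLifetimes(rows);
  const pairs = [];
  for (let i = 0; i < lives.length; i++) {
    for (let j = i + 1; j < lives.length; j++) {
      // Sorted by start time, so j overlaps i when it starts before i ends.
      if (lives[j].first <= lives[i].last) {
        pairs.push({
          a: lives[i],
          b: lives[j],
          from: new Date(lives[j].first).toISOString(),
          until: new Date(Math.min(lives[i].last, lives[j].last)).toISOString(),
        });
      }
    }
  }
  return pairs;
}

// Hypothetical rows: model-a is still alive in the sample where model-b first appears.
const sampleRunnerRows = [
  { ts: "2026-05-14T09:00:00Z", pid: "100", command: "ollama runner --mlx-engine --model model-a" },
  { ts: "2026-05-14T09:00:02Z", pid: "100", command: "ollama runner --mlx-engine --model model-a" },
  { ts: "2026-05-14T09:00:02Z", pid: "200", command: "ollama runner --mlx-engine --model model-b" },
  { ts: "2026-05-14T09:00:03Z", pid: "200", command: "ollama runner --mlx-engine --model model-b" },
  { ts: "2026-05-14T09:00:05Z", pid: "300", command: "ollama runner --mlx-engine --model model-c" },
];
console.log(overlappingRunners(sampleRunnerRows));
```

Because the process trace samples once per second, an overlap shorter than the sampling interval can be missed.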

A common pattern worth reporting is:

At <time>, runner PID <pid> for <model A> was still alive while runner PID <pid> for <model B> started.
During the same interval, memory_pressure reported <N>% free memory, wired memory was <X> GiB, compressor was <Y> GiB, and swap used was <Z> GiB.
ollama ps reported <model list and sizes>.

Column Meanings

In the process trace:

  • rss_kb: resident set size in KiB. This is the amount of the process's mapped memory currently resident in physical RAM. It is useful for identifying which process is currently large, but it can include shared/mapped pages.
  • vsz_kb: virtual size in KiB. This is virtual address space, not physical RAM. For Ollama/MLX this can be very large and should not be read as actual memory pressure.
  • command: full command line. For runners, this includes the model name.
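
As a worked example of the units, rss_kb divided by 1024 * 1024 gives GiB, so 1048576 KiB is exactly 1 GiB. The sketch below computes the peak RSS per PID in GiB; peakRssGiB and the sample rows are hypothetical names and values chosen for illustration.

```javascript
const KIB_PER_GIB = 1024 * 1024;

// Return a Map from PID to the largest RSS seen for that PID, in GiB.
function peakRssGiB(rows) {
  const peaks = new Map();
  for (const { pid, rssKB } of rows) {
    const gib = rssKB / KIB_PER_GIB;
    if (!peaks.has(pid) || gib > peaks.get(pid)) peaks.set(pid, gib);
  }
  return peaks;
}

// Hypothetical rows with round values: 3145728 KiB = 3 GiB.
const sampleRssRows = [
  { pid: "100", rssKB: 524288 },
  { pid: "100", rssKB: 3145728 },
  { pid: "200", rssKB: 1048576 },
];
console.log(peakRssGiB(sampleRssRows));
```

These peaks are per-process views. They should not be added together to estimate total physical memory use.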

In the system trace:

  • vm_stat: raw macOS virtual memory counters.
  • memory_pressure: macOS's pressure-oriented memory summary.
  • vm.swapusage: total, used, and free swap.
  • ollama ps: human-readable list of models Ollama currently considers loaded.
  • /api/ps: JSON version of the loaded-model state, including model size and size_vram.
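
Two of these entries are easy to reduce to numbers. The sketch below parses the vm.swapusage line and summarizes a parsed /api/ps response. The function names and sample values are illustrative. The sketch assumes the response carries a models array whose entries have name, size, and size_vram in bytes, and it treats the "M" suffix as MiB, as the graph script does when it divides by 1024.

```javascript
// Parse a line such as
// "vm.swapusage: total = 2048.00M  used = 512.00M  free = 1536.00M  (encrypted)".
function parseSwapUsage(line) {
  const m = line.match(/total = ([\d.]+)M\s+used = ([\d.]+)M\s+free = ([\d.]+)M/);
  if (!m) return null;
  const [totalMiB, usedMiB, freeMiB] = m.slice(1).map(Number);
  return { totalMiB, usedMiB, freeMiB };
}

// Summarize a parsed /api/ps response (assumed shape: { models: [{ name, size, size_vram }] }).
function summarizeLoadedModels(apiPs) {
  return (apiPs.models || []).map((model) => ({
    name: model.name,
    sizeGiB: model.size / 2 ** 30,
    vramGiB: model.size_vram / 2 ** 30,
  }));
}

// Hypothetical inputs with round values.
const sampleSwap = "vm.swapusage: total = 2048.00M  used = 512.00M  free = 1536.00M  (encrypted)";
const sampleApiPs = { models: [{ name: "example-model:latest", size: 4294967296, size_vram: 4294967296 }] };

console.log(parseSwapUsage(sampleSwap));
console.log(summarizeLoadedModels(sampleApiPs));
```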

Caveats

  • Do not sum process RSS and treat it as exact physical memory. Some pages can be shared or memory-mapped.
  • vsz_kb is virtual address space, not real RAM usage.
  • compressor is the size of the compressed memory store in physical RAM, not the original uncompressed logical size.
  • ollama ps model size and process RSS are related but not identical views of memory usage.
  • Activity Monitor is useful for visual confirmation, but the trace files are better for sharing because they preserve timestamps and process identity.

generate-memory-graph.js

#!/usr/bin/env node
const fs = require("fs");
const path = require("path");
const procPath = process.argv[2] || "ollama-process-memtrace.tsv";
const sysPath = process.argv[3] || "system-memtrace.log";
const outPath = process.argv[4] || "rss-and-system-memory.svg";
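// vm_stat page size in bytes. 16384 is the Apple Silicon value; Intel Macs use 4096,
// so adjust this to match the "page size of N bytes" header in your vm_stat output.
const pageSize = 16384;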
// Parse the per-process TSV trace into rows, converting RSS from KiB to GiB.
function readProcessTrace(filename) {
  return fs
    .readFileSync(filename, "utf8")
    .trim()
    .split(/\n/)
    .slice(1) // skip the header row
    .map((line) => {
      const parts = line.split("\t");
      const [ts, pid, , , rssKB] = parts;
      return {
        ts,
        t: Date.parse(ts),
        pid,
        rss: Number(rssKB) / 1024 / 1024,
        command: parts.slice(6).join("\t"),
      };
    })
    .filter((row) => Number.isFinite(row.t) && Number.isFinite(row.rss));
}
// Parse the system trace into one row per "=====" block, with page counters converted to GiB.
function readSystemTrace(filename) {
  const text = fs.readFileSync(filename, "utf8");
  return text
    .split(/^===== /m)
    .filter(Boolean)
    .map((block) => {
      const ts = (block.match(/(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)/) || [])[1];
      if (!ts) return null;
      // Extract a vm_stat page count; NaN when the counter is missing from the block.
      const pages = (regex) => {
        const match = block.match(regex);
        return match ? Number(match[1].replace(/\./g, "")) : NaN;
      };
      const pct = block.match(/System-wide memory free percentage:\s+(\d+)%/);
      const swap = block.match(/vm\.swapusage: total = ([\d.]+)M\s+used = ([\d.]+)M\s+free = ([\d.]+)M/);
      return {
        ts,
        t: Date.parse(ts),
        free: (pages(/^Pages free:\s+(\d+)\./m) * pageSize) / 2 ** 30,
        wired: (pages(/^Pages wired down:\s+(\d+)\./m) * pageSize) / 2 ** 30,
        compressor: (pages(/^Pages occupied by compressor:\s+(\d+)\./m) * pageSize) / 2 ** 30,
        active: (pages(/^Pages active:\s+(\d+)\./m) * pageSize) / 2 ** 30,
        inactive: (pages(/^Pages inactive:\s+(\d+)\./m) * pageSize) / 2 ** 30,
        freePct: pct ? Number(pct[1]) : NaN,
        swapUsed: swap ? Number(swap[2]) / 1024 : NaN,
      };
    })
    .filter((row) => row && Number.isFinite(row.t));
}
// Build a short legend label from a process command line.
function processLabel(command, pid) {
  const model = command.match(/--model\s+([^\s]+)/);
  if (model) return `${model[1]} pid ${pid}`;
  if (command.includes(" ollama serve") || command.endsWith("ollama serve")) {
    return `ollama serve pid ${pid}`;
  }
  const run = command.match(/ollama run\s+([^\s]+)/);
  if (run) return `ollama run ${run[1]} pid ${pid}`;
  if (command.startsWith("(ollama)")) return `(ollama) pid ${pid}`;
  // Strip the app bundle prefix for any Ollama app name or version, not only Ollama-0.24.0.app.
  const shortened = command.replace(/^\/Applications\/Ollama[^/]*\.app\/Contents\/Resources\//, "");
  return `${shortened.slice(0, 70)} pid ${pid}`;
}
function escapeXML(value) {
  return String(value).replace(/[&<>"]/g, (char) => ({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\"": "&quot;",
  })[char]);
}
function fmtTime(t) {
  return new Date(t).toISOString().slice(11, 19);
}
const procRows = readProcessTrace(procPath);
const sysRows = readSystemTrace(sysPath);
const byPID = new Map();
for (const row of procRows) {
  if (!byPID.has(row.pid)) {
    byPID.set(row.pid, {
      pid: row.pid,
      command: row.command,
      label: processLabel(row.command, row.pid),
      points: [],
      max: 0,
    });
  }
  const series = byPID.get(row.pid);
  series.points.push(row);
  series.max = Math.max(series.max, row.rss);
}
const procSeries = [...byPID.values()]
  .filter((series) => !series.command.startsWith("(ollama)"))
  .filter((series) => series.max >= 0.02 || /runner|serve|ollama run/.test(series.command))
  .sort((a, b) => b.max - a.max);
const allTimes = [...procRows.map((row) => row.t), ...sysRows.map((row) => row.t)];
if (allTimes.length === 0) {
  console.error("No samples found in the trace files.");
  process.exit(1);
}
const minT = Math.min(...allTimes);
// Keep the time range non-zero so x() never divides by zero on a single sample.
const maxT = Math.max(minT + 1, ...allTimes);
// The leading 0 keeps each axis maximum finite when a trace has no usable series.
const maxRSS = Math.ceil(Math.max(0, ...procSeries.map((series) => series.max)) + 0.5);
const maxSystem = Math.ceil(
  Math.max(
    0,
    ...sysRows
      .flatMap((row) => [row.wired, row.compressor, row.free, row.swapUsed])
      .filter(Number.isFinite),
  ) + 1,
);
const width = 1280;
const height = 860;
const marginLeft = 82;
const marginRight = 300;
const marginTop = 48;
const gap = 60;
const topHeight = 360;
const bottomHeight = 300;
const innerWidth = width - marginLeft - marginRight;
const topY = marginTop;
const bottomY = marginTop + topHeight + gap;
const x = (t) => marginLeft + ((t - minT) / (maxT - minT)) * innerWidth;
const yTop = (value) => topY + topHeight - (value / maxRSS) * topHeight;
const yBottom = (value) => bottomY + bottomHeight - (value / maxSystem) * bottomHeight;
const yPct = (value) => bottomY + bottomHeight - (value / 100) * bottomHeight;
const processColors = [
  "#d1495b",
  "#00798c",
  "#edae49",
  "#30638e",
  "#7a5195",
  "#588157",
  "#bc6c25",
  "#6d597a",
  "#2a9d8f",
  "#8d99ae",
];
const systemColors = {
  wired: "#b23a48",
  compressor: "#7a5195",
  free: "#2a9d8f",
  swapUsed: "#bc6c25",
  freePct: "#2f3e46",
};
function polyline(points, field, y) {
  return points
    .filter((point) => Number.isFinite(point[field]))
    .sort((a, b) => a.t - b.t)
    .map((point) => `${x(point.t).toFixed(1)},${y(point[field]).toFixed(1)}`)
    .join(" ");
}
const svg = [];
svg.push(`<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" viewBox="0 0 ${width} ${height}">`);
svg.push(`<rect width="100%" height="100%" fill="#fbfbf8"/>`);
svg.push(`<text x="${marginLeft}" y="26" font-family="Arial, sans-serif" font-size="18" font-weight="700" fill="#1f2933">Ollama RSS and macOS memory state</text>`);
svg.push(`<text x="${marginLeft}" y="43" font-family="Arial, sans-serif" font-size="12" fill="#5b6472">${fmtTime(minT)} to ${fmtTime(maxT)} UTC. Shaded bands mark memory_pressure free percentage below 30%.</text>`);
// Shade each low-free sample from the midpoint to its previous sample to the midpoint to its next.
for (let i = 0; i < sysRows.length; i++) {
  const point = sysRows[i];
  if (!Number.isFinite(point.freePct) || point.freePct >= 30) continue;
  // At either end of the trace, assume half of the 5 second sampling interval.
  const previous = sysRows[i - 1]?.t ?? point.t - 2500;
  const next = sysRows[i + 1]?.t ?? point.t + 2500;
  const start = Math.max(minT, (previous + point.t) / 2);
  const end = Math.min(maxT, (point.t + next) / 2);
  svg.push(`<rect x="${x(start).toFixed(1)}" y="${topY}" width="${Math.max(1, x(end) - x(start)).toFixed(1)}" height="${topHeight + gap + bottomHeight}" fill="#f5c2c7" opacity="0.22"/>`);
}
function drawAxes(y0, h, max, title, showPercentAxis = false) {
  for (let value = 0; value <= max; value += max > 20 ? 5 : 2) {
    const yy = y0 + h - (value / max) * h;
    svg.push(`<line x1="${marginLeft}" y1="${yy.toFixed(1)}" x2="${marginLeft + innerWidth}" y2="${yy.toFixed(1)}" stroke="#e1e4e8" stroke-width="1"/>`);
    svg.push(`<text x="${marginLeft - 10}" y="${(yy + 4).toFixed(1)}" text-anchor="end" font-family="Arial, sans-serif" font-size="11" fill="#5b6472">${value}</text>`);
  }
  const startTick = Math.ceil(minT / 60000) * 60000;
  for (let t = startTick; t <= maxT; t += 60000) {
    svg.push(`<line x1="${x(t).toFixed(1)}" y1="${y0}" x2="${x(t).toFixed(1)}" y2="${y0 + h}" stroke="#edf0f2" stroke-width="1"/>`);
  }
  svg.push(`<line x1="${marginLeft}" y1="${y0}" x2="${marginLeft}" y2="${y0 + h}" stroke="#2f3742"/>`);
  svg.push(`<line x1="${marginLeft}" y1="${y0 + h}" x2="${marginLeft + innerWidth}" y2="${y0 + h}" stroke="#2f3742"/>`);
  svg.push(`<text x="20" y="${y0 + h / 2}" transform="rotate(-90 20 ${y0 + h / 2})" text-anchor="middle" font-family="Arial, sans-serif" font-size="12" fill="#2f3742">${title}</text>`);
  if (showPercentAxis) {
    svg.push(`<text x="${marginLeft + innerWidth + 8}" y="${y0 + 4}" font-family="Arial, sans-serif" font-size="10" fill="#697386">100%</text>`);
    svg.push(`<text x="${marginLeft + innerWidth + 8}" y="${y0 + h + 4}" font-family="Arial, sans-serif" font-size="10" fill="#697386">0%</text>`);
  }
}
drawAxes(topY, topHeight, maxRSS, "Process RSS GiB");
drawAxes(bottomY, bottomHeight, maxSystem, "System GiB", true);
const startTick = Math.ceil(minT / 60000) * 60000;
for (let t = startTick; t <= maxT; t += 60000) {
  svg.push(`<text x="${x(t).toFixed(1)}" y="${bottomY + bottomHeight + 22}" text-anchor="middle" font-family="Arial, sans-serif" font-size="11" fill="#5b6472">${fmtTime(t).slice(3)}</text>`);
}
procSeries.forEach((series, index) => {
  const color = processColors[index % processColors.length];
  const points = series.points
    .sort((a, b) => a.t - b.t)
    .map((point) => `${x(point.t).toFixed(1)},${yTop(point.rss).toFixed(1)}`)
    .join(" ");
  const isRunner = series.command.includes("runner --mlx-engine");
  svg.push(`<polyline points="${points}" fill="none" stroke="${color}" stroke-width="${isRunner ? 2.6 : 1.4}" stroke-linejoin="round" stroke-linecap="round" opacity="${isRunner ? 0.95 : 0.65}"/>`);
});
for (const [field] of [
  ["wired"],
  ["compressor"],
  ["swapUsed"],
  ["free"],
]) {
  svg.push(`<polyline points="${polyline(sysRows, field, yBottom)}" fill="none" stroke="${systemColors[field]}" stroke-width="2.4" stroke-linejoin="round" stroke-linecap="round"/>`);
}
svg.push(`<polyline points="${polyline(sysRows, "freePct", yPct)}" fill="none" stroke="${systemColors.freePct}" stroke-width="1.8" stroke-dasharray="5 4" stroke-linejoin="round" stroke-linecap="round"/>`);
svg.push(`<line x1="${marginLeft}" y1="${yPct(30).toFixed(1)}" x2="${marginLeft + innerWidth}" y2="${yPct(30).toFixed(1)}" stroke="#9b2226" stroke-width="1" stroke-dasharray="3 4" opacity="0.7"/>`);
svg.push(`<text x="${marginLeft + innerWidth + 8}" y="${yPct(30) + 4}" font-family="Arial, sans-serif" font-size="10" fill="#9b2226">30%</text>`);
const legendX = marginLeft + innerWidth + 24;
svg.push(`<text x="${legendX}" y="${topY}" font-family="Arial, sans-serif" font-size="13" font-weight="700" fill="#1f2933">Processes</text>`);
procSeries.forEach((series, index) => {
  const y = topY + 22 + index * 26;
  if (y > topY + topHeight - 6) return;
  const color = processColors[index % processColors.length];
  svg.push(`<line x1="${legendX}" y1="${y}" x2="${legendX + 24}" y2="${y}" stroke="${color}" stroke-width="3"/>`);
  svg.push(`<text x="${legendX + 32}" y="${y - 4}" font-family="Arial, sans-serif" font-size="10.5" fill="#1f2933">${escapeXML(series.label)}</text>`);
  svg.push(`<text x="${legendX + 32}" y="${y + 9}" font-family="Arial, sans-serif" font-size="9.5" fill="#697386">max ${series.max.toFixed(2)} GiB</text>`);
});
svg.push(`<text x="${legendX}" y="${bottomY}" font-family="Arial, sans-serif" font-size="13" font-weight="700" fill="#1f2933">System memory</text>`);
[
  ["wired", "wired"],
  ["compressor", "compressor"],
  ["swapUsed", "swap used"],
  ["free", "free"],
  ["freePct", "free % (right scale)"],
].forEach(([field, name], index) => {
  const y = bottomY + 24 + index * 24;
  svg.push(`<line x1="${legendX}" y1="${y}" x2="${legendX + 24}" y2="${y}" stroke="${systemColors[field]}" stroke-width="${field === "freePct" ? 1.8 : 2.8}" ${field === "freePct" ? 'stroke-dasharray="5 4"' : ""}/>`);
  svg.push(`<text x="${legendX + 32}" y="${y + 4}" font-family="Arial, sans-serif" font-size="11" fill="#1f2933">${name}</text>`);
});
svg.push(`<rect x="${legendX}" y="${bottomY + 154}" width="24" height="12" fill="#f5c2c7" opacity="0.35"/>`);
svg.push(`<text x="${legendX + 32}" y="${bottomY + 164}" font-family="Arial, sans-serif" font-size="11" fill="#1f2933">free % below 30</text>`);
svg.push("</svg>");
fs.mkdirSync(path.dirname(outPath), { recursive: true });
fs.writeFileSync(outPath, svg.join("\n"));
console.log(outPath);