Skip to content

Instantly share code, notes, and snippets.

@tobert
Last active June 14, 2026 13:31
Show Gist options
  • Select an option

  • Save tobert/da68b602b9d14904a310e99f34d4e018 to your computer and use it in GitHub Desktop.

Select an option

Save tobert/da68b602b9d14904a310e99f34d4e018 to your computer and use it in GitHub Desktop.
What does an LLM reach for when you say `date`? — a small cross-model muscle-memory survey

What does an LLM reach for when you say date?

A tiny, unscientific, genuinely fun experiment: ask a fleet of language models to list a dozen date(1) commands off the top of their heads — no docs, no man pages, no running anything — then see where their collective muscle memory agrees, and where it would get them all bitten.

Note from Amy

Claude Opus 4.8 wrote all of this doc except this block you're reading now. I trimmed some content about some bugs it found in kaish. The prompt it used is almost verbatim what I asked Opus first. I started the conversation from a different repo so it wouldn't jump to tasking on kaish, then asked it to run a quick survey.


The setup

One prompt, sent cold to five models:

Off the top of your head — no docs, no grounding — list a dozen date shell commands that come to mind right away. Your go-to invocations. End with one sentence on what your list reveals about your own habits.

The panel: Opus 4.8 (also orchestrating), Sonnet 4.6, Haiku 4.5, Gemini 3.1 Pro, and DeepSeek V4-Pro. Five independent samples of "what date is" in a model's weights. (We go down to the lite tier later — four more models.)

The universal core — every single model

date            # what time is it
date -u         # UTC reflex
date +%s        # epoch seconds

These three are invariant across every model from a flagship down to Haiku. ISO output (-I / +%Y-%m-%d / %F) is a near-universal fourth. This is the lizard-brain of date.

Where they diverge

Example command what it does Opus Haiku Sonnet Gemini DeepSeek
date -d @1700000000 epoch → human-readable
date +%s%N epoch with nanoseconds
date -r file.log a file's last-modified time
date +%Y%m%d_%H%M%S sortable stamp for filenames
date -d "next friday" natural-language date math
date +%A / date +%B weekday / month name
date -Iseconds ISO 8601 down to the second
date -R RFC 2822 (email/HTTP)

A few personalities fall out:

  • Haiku is the most cautious and least worldly — no epoch decoding, no nanos, no mtime. Its one tell: it listed TZ=UTC date, functionally redundant with the date -u it had already listed. Smaller model, narrower repertoire.
  • Sonnet is the log-and-backup persona — filename stamps and date→epoch round-trips dominate. It also gave the sharpest self-diagnosis of the GNU/BSD breakage it was about to walk into.
  • Opus had the widest spread — the only one to surface BSD date -r file and nanoseconds and next friday, and the only one whose self-note named the BSD command (-v) it failed to list.
  • Gemini reads like an ops runbook: -Iseconds, -r filename, next Friday 09:00. Most "API timestamp" framed.
  • DeepSeek is the only one leaning human/calendar — %A, %B for names — alongside the machine stuff. Its visible reasoning trace even shows it deciding to stick to GNU "because that's typical" and to confess locale-avoidance bias.

The shared blind spot

Here's the kicker. Every model — unprompted — confessed a GNU/Linux bias. And not one of them corrected for it. Not a single list led with BSD/macOS. The most-loved trick across the entire fleet is GNU -d relative date math (date -d "2 weeks ago") — which faceplants on a fresh macOS box, where -d means something else entirely and the date math is -v.

Five independently-sampled models, one shared muscle memory, one shared place they'd all get bitten. It's a clean little demonstration of monoculture in the weights: confident, convergent, and quietly platform-wrong in exactly the same spot. (Spoiler: the four lite-tier models below don't break the pattern — they tighten it.)

We went down-market

The first panel was all flagships. So we re-ran the exact prompt on the lite tier: Gemini 3.1 flash-lite, Gemma 4 31B, Gemma 4 26B-A4B (the genuinely small one — 26B MoE, ~4B active), and DeepSeek V4-flash.

Example command what it does flash-lite Gemma 4 26B-A4B Gemma 4 31B V4-flash
date · date -u · date +%s the universal core
date -d @1700000000 epoch → human-readable
date -d "2 weeks ago" natural-language date math
date +%Y%m%d_%H%M%S sortable stamp for filenames
date -I / date --iso-8601=seconds ISO 8601
date +%A / date +%Z weekday / timezone name
date -r file.log a file's last-modified time

Two things fell out, and both cut against the intuition that smaller = worse:

There's no quality cliff, and the monoculture gets tighter going down-market. Filename-stamping was 2/6 among the flagships; it's 4/4 here. -d @N epoch decode is 4/4. A 4-billion-active-parameter Gemma produced a list indistinguishable in quality from Gemini Pro — and reached for %Z / %A %B human-name formatting the big models mostly skipped. It's the well-trodden-task floor: capability tier barely predicts the answer when the task is this common.

The blind spot survives all the way down — and one model got caught red-handed. DeepSeek V4-flash's visible reasoning trace literally reads "I'll focus on GNU date since that's most common, but note BSD/macOS differences where they diverge" — and then, like every model before it, it didn't. It even nested date -d "$(date +%Y-%m-01) -1 day" (last-day-of-previous-month, a real sysadmin idiom): confident, fluent, and GNU-only. The monoculture isn't a flagship artifact or a small-model artifact. It's just the weights, top to bottom.


Methodology footnote: "without grounding" is doing real work here — each of the nine models answered from weights alone, no tool calls, no retrieval. The whole run took a single session. Reproducible with any panel of models you can prompt cold.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment