What does an LLM reach for when you say `date`?

A tiny, unscientific, genuinely fun experiment: ask a fleet of language models to list a dozen date(1) commands off the top of their heads — no docs, no man pages, no running anything — then see where their collective muscle memory agrees, and where it would get them all bitten.

Note from Amy

Claude Opus 4.8 wrote all of this doc except this block you're reading now. I trimmed some content about some bugs it found in kaish. The prompt it used is almost verbatim what I asked Opus first. I started the conversation from a different repo so it wouldn't jump to tasking on kaish, then asked it to run a quick survey.

The setup

One prompt, sent cold to five models:

Off the top of your head — no docs, no grounding — list a dozen date shell commands that come to mind right away. Your go-to invocations. End with one sentence on what your list reveals about your own habits.

The panel: Opus 4.8 (also orchestrating), Sonnet 4.6, Haiku 4.5, Gemini 3.1 Pro, and DeepSeek V4-Pro. Five independent samples of "what date is" in a model's weights. (We go down to the lite tier later — four more models.)

The universal core — every single model

date            # what time is it
date -u         # UTC reflex
date +%s        # epoch seconds

These three are invariant across every model from a flagship down to Haiku. ISO output (-I / +%Y-%m-%d / %F) is a near-universal fourth. This is the lizard-brain of date.

Where they diverge

Example command	what it does	Opus	Haiku	Sonnet	Gemini	DeepSeek
`date -d @1700000000`	epoch → human-readable	✓	—	✓	✓	✓
`date +%s%N`	epoch with nanoseconds	✓	—	—	—	—
`date -r file.log`	a file's last-modified time	✓	—	—	✓	—
`date +%Y%m%d_%H%M%S`	sortable stamp for filenames	✓	—	✓	—	—
`date -d "next friday"`	natural-language date math	✓	—	—	✓	✓
`date +%A` / `date +%B`	weekday / month name	—	✓	—	—	✓
`date -Iseconds`	ISO 8601 down to the second	—	—	—	✓	✓
`date -R`	RFC 2822 (email/HTTP)	—	✓	—	—	✓

A few personalities fall out:

Haiku is the most cautious and least worldly — no epoch decoding, no nanos, no mtime. Its one tell: it listed TZ=UTC date, functionally redundant with the date -u it had already listed. Smaller model, narrower repertoire.
Sonnet is the log-and-backup persona — filename stamps and date→epoch round-trips dominate. It also gave the sharpest self-diagnosis of the GNU/BSD breakage it was about to walk into.
Opus had the widest spread — the only one to surface BSD date -r file and nanoseconds and next friday, and the only one whose self-note named the BSD command (-v) it failed to list.
Gemini reads like an ops runbook: -Iseconds, -r filename, next Friday 09:00. Most "API timestamp" framed.
DeepSeek is the only one leaning human/calendar — %A, %B for names — alongside the machine stuff. Its visible reasoning trace even shows it deciding to stick to GNU "because that's typical" and to confess locale-avoidance bias.

The shared blind spot

Here's the kicker. Every model — unprompted — confessed a GNU/Linux bias. And not one of them corrected for it. Not a single list led with BSD/macOS. The most-loved trick across the entire fleet is GNU -d relative date math (date -d "2 weeks ago") — which faceplants on a fresh macOS box, where -d means something else entirely and the date math is -v.

Five independently-sampled models, one shared muscle memory, one shared place they'd all get bitten. It's a clean little demonstration of monoculture in the weights: confident, convergent, and quietly platform-wrong in exactly the same spot. (Spoiler: the four lite-tier models below don't break the pattern — they tighten it.)

We went down-market

The first panel was all flagships. So we re-ran the exact prompt on the lite tier: Gemini 3.1 flash-lite, Gemma 4 31B, Gemma 4 26B-A4B (the genuinely small one — 26B MoE, ~4B active), and DeepSeek V4-flash.

Example command	what it does	flash-lite	Gemma 4 26B-A4B	Gemma 4 31B	V4-flash
`date` · `date -u` · `date +%s`	the universal core	✓	✓	✓	✓
`date -d @1700000000`	epoch → human-readable	✓	✓	✓	✓
`date -d "2 weeks ago"`	natural-language date math	✓	✓	✓	✓
`date +%Y%m%d_%H%M%S`	sortable stamp for filenames	✓	✓	✓	✓
`date -I` / `date --iso-8601=seconds`	ISO 8601	✓	—	—	✓
`date +%A` / `date +%Z`	weekday / timezone name	✓	✓	—	—
`date -r file.log`	a file's last-modified time	—	—	✓	—

Two things fell out, and both cut against the intuition that smaller = worse:

There's no quality cliff, and the monoculture gets tighter going down-market. Filename-stamping was 2/6 among the flagships; it's 4/4 here. -d @N epoch decode is 4/4. A 4-billion-active-parameter Gemma produced a list indistinguishable in quality from Gemini Pro — and reached for %Z / %A %B human-name formatting the big models mostly skipped. It's the well-trodden-task floor: capability tier barely predicts the answer when the task is this common.

The blind spot survives all the way down — and one model got caught red-handed. DeepSeek V4-flash's visible reasoning trace literally reads "I'll focus on GNU date since that's most common, but note BSD/macOS differences where they diverge" — and then, like every model before it, it didn't. It even nested date -d "$(date +%Y-%m-01) -1 day" (last-day-of-previous-month, a real sysadmin idiom): confident, fluent, and GNU-only. The monoculture isn't a flagship artifact or a small-model artifact. It's just the weights, top to bottom.

Methodology footnote: "without grounding" is doing real work here — each of the nine models answered from weights alone, no tool calls, no retrieval. The whole run took a single session. Reproducible with any panel of models you can prompt cold.

tobert/date-fleet-research.md

Select an option

No results found

Select an option

No results found

What does an LLM reach for when you say `date`?

Note from Amy

The setup

The universal core — every single model

Where they diverge

The shared blind spot

We went down-market

tobert/date-fleet-research.md

What does an LLM reach for when you say date?

Note from Amy

The setup

The universal core — every single model

Where they diverge

The shared blind spot

We went down-market

What does an LLM reach for when you say `date`?