Reverse-engineered specification of the recipe database format used by Micro Cookbook (Pinpoint Publishing, c. 1988–1994) on MS-DOS. This is sufficient to (a) reimplement the recipe extraction in any language, and (b) understand the on-disk structures well enough to recreate the original program's behaviour.
Status of knowledge: the record/field encoding is fully decoded and
validated against the program's own print-out. The 32 KB header index
region, SCREEN.TXT, and CONTROL.TXT are not decoded (not required for
extraction); their roles are described where known. Anything marked (inferred)
was not exhaustively verified.
A Micro Cookbook installation directory (and each recipe sub-database) contains:
| File | Type | Role |
|---|---|---|
| `RECIPE.TXT` | binary | The recipe database. One per database directory. |
| `SCREEN.TXT` | binary | Per-database UI screens + data-entry form layout. Field tags in `RECIPE.TXT` correspond to form-field positions defined here. (not decoded) |
| `CONTROL.TXT` | binary | Engine control/state. (not decoded) |
| `PROFILE.TXT` | text | Device/path configuration (decoded; see §13). |
| `PRINTREC.TXT` | text | A human-readable recipe print-out the program emitted. Not part of the DB; invaluable as ground truth. |
| `COOKBOOK.EXE`, `MAINT.EXE`, `CREATE.EXE`, `CONFIG.EXE` | DOS MZ executables | The program and its utilities. |
The `.TXT` extension on `RECIPE.TXT`/`SCREEN.TXT`/`CONTROL.TXT` is misleading: they are binary. Only `PROFILE.TXT` and `PRINTREC.TXT` are real text (ASCII, CRLF line endings, sometimes terminated by `0x1A`/^Z).
All recipe values are 7-bit ASCII. No code-page-dependent or high-bit bytes
were observed in recipe text. Bytes 0x00 and 0xFF are reserved structural
markers (§4) and never appear inside a value. Fractions and quantities are
stored as literal text ("1/2", "3 1/2", "1 Lb.", "8 oz.").

    +--------------------------------------------------+  offset 0
    |  HEADER / INDEX REGION (~32 KB, binary)          |  sort orders, free-list, and the
    |  (internal structure NOT decoded)                |  program's RECIPE / INGREDIENT /
    |                                                  |  CLASSIFICATION indexes (inferred)
    +--------------------------------------------------+
    |  RECORD AREA                                     |
    |    record, record, record, ...                   |
    |    interspersed with deleted-record "slack"      |
    +--------------------------------------------------+  EOF

- The record area begins after the index region. In the five sample databases the first record started at `0x8000` (32 KB) in four of them and at `0x14C00` in the largest (`RECIPE3`). Do not hard-code this; locate records by the title signature (§9) instead.
- The index region is partly or entirely zero-filled in some databases and full of binary structures in others. It is not needed to read the recipes.
The record area is a stream of tag–length–value (TLV) fields:
field := tag(1 byte) length(1 byte) value(`length` bytes, ASCII)
Two byte values are structural and replace a field where they occur:
| Byte | Meaning |
|---|---|
| `0xFF` | Screen-page break. Field tags RESET to low values on the page that follows (see §6, §7). |
| `0x00` | Padding / end of meaningful data within a slot (also appears in slack). |
There is no record-terminator byte and no record-length prefix. Record boundaries must be inferred (§8).
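
A minimal Python sketch of a reader for this field stream, assuming the whole file is held in memory as `bytes`. The name `iter_fields` and the convention of yielding `(0xFF, None)` for a page break are illustrative choices, not something taken from the original program.

```python
from typing import Iterator, Optional, Tuple

PAGE_BREAK = 0xFF  # screen-page break
PADDING = 0x00     # padding / slack


def iter_fields(data: bytes, start: int, end: int) -> Iterator[Tuple[int, Optional[bytes]]]:
    """Yield (tag, value) pairs from data[start:end].

    A page break is yielded as (0xFF, None). Iteration stops at the first
    0x00 byte or at a field whose length would run past `end`.
    """
    pos = start
    while pos < end:
        tag = data[pos]
        if tag == PAGE_BREAK:
            yield PAGE_BREAK, None
            pos += 1
            continue
        if tag == PADDING or pos + 1 >= end:
            return
        length = data[pos + 1]
        if pos + 2 + length > end:
            return  # truncated field: treat as slack
        yield tag, data[pos + 2 : pos + 2 + length]
        pos += 2 + length


# Usage: two header fields, a page break, one more field, then padding.
sample = b"\x01\x0bavocado dip" + b"\x02\x0204" + b"\xff" + b"\x03\x03abc" + b"\x00\x00"
for tag, value in iter_fields(sample, 0, len(sample)):
    print(hex(tag), value)
```

Because the reader only yields raw fields, deciding where one record ends and the next begins is left to the caller.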
A record begins with tag 0x01 (the title) and uses ascending tags. On the
first page of a record the tags carry these meanings:
| Tag | Meaning | Notes |
|---|---|---|
| `0x01` | Title | Required; first field of a record. |
| `0x02` | Servings / yield | Short; often numeric ("4", "06", "2dz", "112"). May be absent. |
| `0x03`–`0x06` | Classifications | 0–4 free-text categories/keywords ("appetizer", "Soup", "American"). A value beginning `From:` is an attribution/source, not a category. |
| `0x07` and up | Ingredient grid | See §6. The grid always begins at `0x07` on the first page. |
| (after the grid) | Date | A value matching `dd-dd-dddd` (e.g. `09-03-1988`) is the entry date. |
Tags 0x02–0x06 may be sparse (e.g. a title with one classification jumps
0x01 → 0x03 → 0x07). Treat any missing tag as an empty field.
Ingredients are a four-column table: quantity, measure, preparation, ingredient-name. Each cell is one TLV field; the cell's position is derived from its tag:
column = (tag - base) mod 4 # 0=quantity 1=measure 2=preparation 3=name
row = (tag - base) div 4
- `base = 0x07` on the first page (because tags `0x01`–`0x06` are the header).
- `base = 0x01` on every continuation page (after a `0xFF` page break).
Empty cells simply skip a tag. Example header-page row mapping:
0x07 -> r0c0 (qty) 0x08 -> r0c1 (measure) 0x09 -> r0c2 (prep) 0x0A -> r0c3 (name)
0x0B -> r1c0 0x0C -> r1c1 (0x0D absent) 0x0E -> r1c3
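
The tag-to-cell rule is a single `divmod`. A small Python sketch, where `grid_cell` and `COLUMNS` are names chosen here for illustration:

```python
from typing import Tuple

COLUMNS = ("quantity", "measure", "preparation", "name")


def grid_cell(tag: int, base: int) -> Tuple[int, str]:
    """Map a field tag to (row, column name) for the given page base."""
    if tag < base:
        raise ValueError("tag is below the grid base for this page")
    row, col = divmod(tag - base, 4)
    return row, COLUMNS[col]


# First page (base 0x07): the tags from the row mapping above.
for tag in (0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0E):
    print(hex(tag), grid_cell(tag, 0x07))
```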
When a recipe has more ingredients than fit on one screen, it continues after a
0xFF page break with tags restarting at 0x01 (base = 0x01). The row index
must keep counting from where the previous page left off — it must not
restart at 0. Maintain a running row_offset:
absolute_row = row_offset + ((tag - base) div 4)
If you reset rows per page, continuation-page ingredients overwrite the first ingredient of the record (silent data loss + reordering). This was the single most damaging bug during development.
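
The running offset can be sketched in Python as follows. This assumes the caller passes only the header page and the ingredient continuation pages (direction pages and date fields already removed); `place_ingredients` and the sample tags are hypothetical, made up for illustration rather than taken from a real record.

```python
from typing import Dict, List, Tuple

Field = Tuple[int, str]  # (tag, text)


def place_ingredients(pages: List[List[Field]]) -> List[List[str]]:
    """Assemble [qty, measure, prep, name] rows across pages."""
    rows: Dict[int, List[str]] = {}
    row_offset = 0
    for page_no, fields in enumerate(pages):
        base = 0x07 if page_no == 0 else 0x01
        last_local_row = -1
        for tag, text in fields:
            if tag < base:
                continue  # header field (title, servings, ...) on page 0
            local_row, col = divmod(tag - base, 4)
            cells = rows.setdefault(row_offset + local_row, ["", "", "", ""])
            cells[col] = text
            last_local_row = max(last_local_row, local_row)
        row_offset += last_local_row + 1  # keep counting on the next page
    return [rows[r] for r in sorted(rows)]


# Hypothetical two-page record; the tags are made up for illustration.
page0 = [(0x01, "Example"), (0x07, "1/2"), (0x08, "Cup"), (0x0A, "Butter"),
         (0x0B, "1"), (0x0C, "Cup"), (0x0E, "Sugar")]
page1 = [(0x01, "2"), (0x04, "Eggs")]
for row in place_ingredients([page0, page1]):
    print(row)
```

Without the `row_offset` update, the page-1 cells would land in row 0 and overwrite the first ingredient.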
After the ingredient grid (typically following one or two 0xFF breaks) come
the directions: a sequence of TLV text fields, each holding one screen line
(typically 30–75 chars). Reconstruct the body by concatenating the values in
order (then re-wrap to taste).
- Direction-field tags are not semantically meaningful: they are line/screen positions and may either reset to low values after a `0xFF` or continue upward across a break. Do not interpret them; just keep textual order.
- A record may have multiple direction pages; a single `0xFF` between them is normal. Two consecutive break bytes, `0xFF 0xFF` (an empty page), were also observed.
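
Reassembling the body might look like the Python sketch below. Collapsing runs of whitespace before re-wrapping is an assumption made here for tidy output (it discards any original indentation), and `join_directions` is an illustrative name.

```python
import textwrap
from typing import Iterable


def join_directions(lines: Iterable[str], width: int = 72) -> str:
    """Concatenate screen lines in stored order, then re-wrap to `width`."""
    text = " ".join(" ".join(lines).split())  # collapse runs of whitespace
    return textwrap.fill(text, width=width)


lines = [
    "Mash peeled avocados well or use",
    "blender. Add remaining ingredients",
    "and blend well.",
    "Refrigerate 1 hour before serving.",
]
print(join_directions(lines, width=60))
```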
Because 0xFF 0x01 marks both a page break (next page, tag 0x01) and
the start of the next recipe (title, tag 0x01), it cannot delimit records.
Furthermore, Micro Cookbook did not compact its files: editing/deleting a
recipe left the old bytes in place, so stale fragments ("slack") sit between
live records, and the same recipe can appear multiple times (truncated copies).
The reference extractor therefore locates records by a title signature and discards slack by stopping on the first invalid byte:
A position is treated as a recipe start when all of the following hold:

1. byte = `0x01`; next byte `L` with 2 ≤ `L` ≤ 40;
2. the `L`-byte value is printable ASCII, contains a letter, and begins with a letter or one of `" ' ( #` (i.e. looks like a name, not a quantity);
3. the field immediately after the title has tag `0x02`–`0x07`;
4. scanning forward (skipping `0xFF`) you reach a tag-`0x07` field within ~14 fields whose value is short (≤ 20 chars), i.e. the first ingredient cell;
5. no field longer than ~24 chars appears before that `0x07`.

Condition 5 is what rejects a directions line that merely happens to start at tag `0x01` and reach tag `0x07` (its 7th line): its early fields are long prose.
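
The five conditions can be sketched in Python as below. The thresholds are the approximate values quoted above; the printable-ASCII check on the intermediate fields is an extra guard added here, and the `field` helper exists only to build test input.

```python
MAX_FIELDS_TO_GRID = 14   # "within ~14 fields"
MAX_PRE_GRID_LEN = 24     # "no field longer than ~24 chars"
MAX_FIRST_CELL_LEN = 20   # the first ingredient cell is short


def is_printable(value: bytes) -> bool:
    return all(0x20 <= b <= 0x7E for b in value)


def is_title_signature(data: bytes, i: int) -> bool:
    # 1. tag 0x01 followed by a plausible title length
    if i + 2 > len(data) or data[i] != 0x01:
        return False
    length = data[i + 1]
    if not 2 <= length <= 40:
        return False
    # 2. the value looks like a name, not a quantity
    title = data[i + 2 : i + 2 + length]
    if len(title) != length or not is_printable(title):
        return False
    text = title.decode("ascii")
    if not any(c.isalpha() for c in text):
        return False
    if not (text[0].isalpha() or text[0] in "\"'(#"):
        return False
    # 3. the next field's tag is 0x02..0x07
    pos = i + 2 + length
    if pos >= len(data) or not 0x02 <= data[pos] <= 0x07:
        return False
    # 4 and 5. walk forward to the first tag-0x07 field
    for _ in range(MAX_FIELDS_TO_GRID):
        while pos < len(data) and data[pos] == 0xFF:
            pos += 1  # skip page breaks
        if pos + 2 > len(data) or data[pos] == 0x00:
            return False
        tag, flen = data[pos], data[pos + 1]
        value = data[pos + 2 : pos + 2 + flen]
        if len(value) != flen or not is_printable(value):
            return False  # extra guard (assumption): reject binary junk
        if tag == 0x07:
            return flen <= MAX_FIRST_CELL_LEN
        if flen > MAX_PRE_GRID_LEN:
            return False  # long prose before 0x07: a directions page
        pos += 2 + flen
    return False


def field(tag: int, text: str) -> bytes:
    """Build one TLV field (helper for the example only)."""
    return bytes([tag, len(text)]) + text.encode("ascii")


header = field(1, "avocado dip") + field(2, "04") + field(3, "appetizer") + field(7, "2")
print(is_title_signature(header, 0))
```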
When parsing a record's fields, stop the record (and skip to the next detected
title) on the first sign of slack: a 0x00 byte, a length that would overrun
the next title, or a value containing any non-printable byte.
On continuation pages you must decide whether the (short-tagged) fields are more ingredients or are directions. Heuristic used by the reference implementation:
A page is directions if any field is longer than 30 chars, or any field contains an internal sentence boundary matching `[a-z]{2}[.;]\s` (two lowercase letters, then `.` or `;`, then whitespace). Otherwise it is ingredients.
The sentence-boundary test catches terse one-line directions such as
"Combine the ingredients. Serve" (30 chars) that a length-only rule misses,
while never matching short Title-Case ingredient cells. The classification is
sticky: once directions begin, the rest of the record is directions.
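
A short Python sketch of this heuristic and its sticky behaviour. Pages are represented here as lists of already-decoded strings, and the function names are illustrative.

```python
import re
from typing import Iterable, List

SENTENCE_BOUNDARY = re.compile(r"[a-z]{2}[.;]\s")


def page_is_directions(values: Iterable[str]) -> bool:
    """True if any field is long or contains an internal sentence boundary."""
    return any(len(v) > 30 or SENTENCE_BOUNDARY.search(v) is not None for v in values)


def classify_pages(pages: List[List[str]]) -> List[str]:
    """Label continuation pages; once directions begin they never end."""
    labels = []
    in_directions = False
    for values in pages:
        in_directions = in_directions or page_is_directions(values)
        labels.append("directions" if in_directions else "ingredients")
    return labels


# A 30-char line slips past the length rule but is caught by the regex.
print(page_is_directions(["Combine the ingredients. Serve"]))
print(page_is_directions(["1/2", "Cup", "Butter"]))
```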
Raw bytes at offset 0x8000 of the root RECIPE.TXT (. = control byte):
0x8000: 01 0B "avocado dip"
02 02 "04"
03 09 "appetizer"
07 01 "2" 08 05 "large" 09 04 "ripe" 0A 07 "avocado"
0B 01 "3" 0C 03 "tbl" 0E 0B "lemon juice"
0F 01 "1" 10 05 "small" 11 03 "can" 12 0C "chili pepper"
13 03 "1/2" 14 03 "tsp" 16 04 "salt"
17 01 "1" 18 03 "tbl" 19 06 "minced" 1A 05 "onion"
1C 04 "dash" 1E 0D "tabasco sauce"
FF FF <- end ingredients, begin directions
03 20 " Mash peeled avocados well or use"
04 23 "blender. Add remaining ingredients"
05 0F "and blend well."
07 22 "Refrigerate 1 hour before serving."
FF <- record ends; next byte 01 = next title
Decoded:

    Title:          avocado dip
    Servings:       04
    Classification: appetizer
    Ingredients (qty / measure / prep / name):
      2    large  ripe    avocado
      3    tbl            lemon juice
      1    small  can     chili pepper
      1/2  tsp            salt
      1    tbl    minced  onion
           dash           tabasco sauce
    Directions:
      Mash peeled avocados well or use blender. Add remaining ingredients
      and blend well. Refrigerate 1 hour before serving.

(Here the ingredient grid columns map via base=0x07; e.g. 0x11="can" →
(0x11-7) mod 4 = 2 = preparation, (0x11-7) div 4 = 2 = row 2, the third ingredient.)
PAGE 0 (base 0x07): title, classification "Candy", source "From: Carol A.
Williams", ingredient rows 0–5 (Butter … Chocolate Chips).
0xFF
PAGE 1 (base 0x01): one more ingredient row -> absolute row = row_offset (6) + 0 = 6,
i.e. the 7th ingredient (1/4 Cake Parafin Wax).
0xFF
PAGE 2 (directions): "Blend butter, peanut butter & powdered sugar together.
Mix in rice Krispy's …" (verbatim across several line-fields).
This record is the validation oracle: the program's own PRINTREC.TXT shows
exactly these 7 ingredients, in this order, with these columns, and this
directions text.

    function extract(bytes):
        starts = []
        i = 0
        while i <= len(bytes) - 4:
            if is_title_signature(bytes, i):                     # §8.1
                starts.append(i)
                i += 2 + bytes[i+1]                              # skip past the title value
            else:
                i += 1
        records = []
        for k in 0 .. len(starts)-1:
            end = starts[k+1] if k+1 < len(starts) else len(bytes)
            rec = parse_record(bytes, starts[k], end)
            if rec: records.append(rec)
        return records

    function parse_record(bytes, start, end):
        pages = [[]]; pos = start
        while pos < end:
            b = bytes[pos]
            if b == 0xFF: pages.append([]); pos += 1; continue   # page break
            if b == 0x00: break                                  # slack
            if pos+1 >= end: break                               # no length byte -> slack
            L = bytes[pos+1]
            if pos+2+L > end: break                              # overrun -> slack
            v = bytes[pos+2 .. pos+2+L]
            if v has any non-printable byte: break               # slack
            pages[-1].append({tag: b, val: v}); pos += 2+L
        if pages[0] empty or pages[0][0].tag != 0x01: return null
        title = pages[0][0].val
        servings = ""; classes = []; source = ""; date = ""; directions = []
        grid = []; row_offset = 0; in_dir = false

        # page 0: header metadata + ingredients (base 0x07)
        pmax = -1
        for f in pages[0][1:]:
            if 0x02 <= f.tag <= 0x06:
                if f.tag == 0x02: servings = f.val
                else if f.val starts with "From:": source = f.val
                else: classes.append(f.val)
            else if f.tag >= 0x07:
                if is_date(f.val): date = f.val; continue
                col = (f.tag-7) mod 4; lr = (f.tag-7) div 4
                grid.append((lr, col, f.val)); pmax = max(pmax, lr)
        if pmax >= 0: row_offset = pmax + 1

        # later pages: ingredients (base 0x01) or directions
        for p in pages[1:]:
            if page_is_dir(p): in_dir = true                     # §9; sticky
            if in_dir:
                append every f.val in p to directions
            else:
                lmax = -1
                for f in p:
                    if is_date(f.val): date = f.val; continue    # date may follow the grid here too
                    col = (f.tag-1) mod 4; lr = (f.tag-1) div 4
                    grid.append((row_offset+lr, col, f.val)); lmax = max(lmax, lr)
                if lmax >= 0: row_offset += lmax + 1
        rows = assemble grid cells into [qty, measure, prep, name] by (row, col)
        drop empty rows
        return {title, servings, classes, source, date, ingredients: rows, directions}

    is_date(v)     = v matches ^\d{1,2}-\d{1,2}-\d{2,4}$
    page_is_dir(p) = any f in p with len(f.val) > 30 OR f.val matches [a-z]{2}[.;]\s

Device/path configuration in a simple keyword grammar:
define_device,disk
path,C:\COOKBOOK
path,C:\COOKBOOK\RECIPE1
...
path,A:\
end_define
Lists the database directories the program searches. Useful for understanding
how the five RECIPE*/ databases were registered.
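
A minimal Python sketch of a reader for this grammar. Only the three keywords shown above are handled; anything else is ignored, since no other keywords are documented here. `parse_profile` and the returned dict shape are illustrative choices.

```python
from typing import Dict, List


def parse_profile(text: str) -> List[Dict[str, object]]:
    """Return one {"device": ..., "paths": [...]} dict per define block."""
    devices = []
    current = None
    # Drop a trailing DOS end-of-file marker (0x1A) before splitting lines.
    for raw in text.replace("\x1a", "").splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, arg = line.partition(",")
        if keyword == "define_device":
            current = {"device": arg, "paths": []}
        elif keyword == "path" and current is not None:
            current["paths"].append(arg)
        elif keyword == "end_define" and current is not None:
            devices.append(current)
            current = None
        # any other keyword is ignored: only these three are documented
    return devices


sample = "define_device,disk\r\npath,C:\\COOKBOOK\r\npath,A:\\\r\nend_define\r\n\x1a"
print(parse_profile(sample))
```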
A formatted recipe print-out. Its column layout reveals the program's canonical presentation and confirms field semantics:
PEANUT BUTTER BALLS
Serving Size :
Keywords : Candy
From: Carol A. Williams
Qty Measurement Preparation Ingredient
--- ----------- ----------- ----------
1/2 Cup Butter
...
→ Columns are Qty / Measurement / Preparation / Ingredient; header carries
Serving Size, Keywords, and From: — exactly the tag 0x01–0x06
fields. This is the layout to mirror when recreating the program's output.
SCREEN.TXT (24 KB per DB) contains the program's UI text (menus such as
"MICRO COOKBOOK", "RECIPE INDEX", "INGREDIENT INDEX", "CLASSIFICATION INDEX",
function-key help) and (inferred) the data-entry form definitions that map
field tags to on-screen positions. CONTROL.TXT (24 KB) is engine state. Neither
is required to read recipes.
- The 32 KB header index region. Internal structure undecoded. To recreate the program's browse by recipe / ingredient / classification features you must decode (or simply rebuild) these indexes. Rebuilding from the parsed records is the pragmatic route; matching the original byte layout is the open reverse-engineering task. The region opens with what looks like an 8-byte signature/hash followed by packed structures.
- `SCREEN.TXT` form layout. Decoding it would let you map field tags to labelled screen positions authoritatively (rather than via the §5/§6 rules) and reproduce the data-entry UI.
- Per-database variation. Field-tag meanings are stable, but exact screen layouts differ slightly between databases (each `RECIPE*/` has its own `SCREEN.TXT`). The §5/§6 rules held across all five sampled databases.
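
Rebuilding the three browse indexes from parsed records could look like the Python sketch below. It assumes records are dicts with the `title`, `classes` and `ingredients` keys used by the extractor pseudocode. Case-folding the keys is an assumption made here, not known behaviour of the original program, and the second sample record is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Index = Dict[str, List[str]]


def build_indexes(records: List[dict]) -> Tuple[List[str], Index, Index]:
    """Return (sorted titles, ingredient -> titles, classification -> titles)."""
    titles = sorted((r["title"] for r in records), key=str.lower)
    by_ingredient = defaultdict(list)
    by_class = defaultdict(list)
    for rec in records:
        for row in rec["ingredients"]:          # row = [qty, measure, prep, name]
            name = row[3].strip().lower()
            if name and rec["title"] not in by_ingredient[name]:
                by_ingredient[name].append(rec["title"])
        for label in rec["classes"]:
            key = label.strip().lower()
            if key and rec["title"] not in by_class[key]:
                by_class[key].append(rec["title"])
    return titles, dict(by_ingredient), dict(by_class)


records = [
    {"title": "avocado dip", "classes": ["appetizer"],
     "ingredients": [["2", "large", "ripe", "avocado"], ["1/2", "tsp", "", "salt"]]},
    # Hypothetical second record, for illustration only.
    {"title": "Salsa", "classes": ["Appetizer"],
     "ingredients": [["1", "tsp", "", "salt"]]},
]
titles, by_ingredient, by_class = build_indexes(records)
print(titles)
print(by_ingredient["salt"])
```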
RECORD AREA = stream of TLV fields after a ~32 KB index region (offset varies).
FIELD = tag(1) len(1) value(len ASCII).
0xFF = page break; tags reset (ingredient base 0x07 -> 0x01).
0x00 = padding/slack.
TAGS p0 : 0x01 title | 0x02 servings | 0x03-06 classes/From: | 0x07+ grid | date
GRID : col=(tag-base)%4 [qty,measure,prep,name]; row=(tag-base)/4; rows
accumulate globally across pages (row_offset).
DIRECTIONS : prose TLV fields after the grid; tags not meaningful; concat in order.
RECORDS : not delimited -> find by title signature; stop on first slack byte.
DEDUP/SLACK : files are never compacted -> expect duplicates & truncated remnants.