@roycewilliams
Created June 2, 2026 00:28

Micro Cookbook RECIPE.TXT — Binary Format Reference

Reverse-engineered specification of the recipe database format used by Micro Cookbook (Pinpoint Publishing, c. 1988–1994) on MS-DOS. This is sufficient to (a) reimplement the recipe extraction in any language, and (b) understand the on-disk structures well enough to recreate the original program's behaviour.

Status of knowledge: the record/field encoding is fully decoded and validated against the program's own print-out. The 32 KB header index region, SCREEN.TXT, and CONTROL.TXT are not decoded (not required for extraction); their roles are described where known. Anything marked (inferred) was not exhaustively verified.


1. File inventory

A Micro Cookbook installation directory (and each recipe sub-database) contains:

File | Type | Role
RECIPE.TXT | binary | The recipe database. One per database directory.
SCREEN.TXT | binary | Per-database UI screens + data-entry form layout. Field tags in RECIPE.TXT correspond to form-field positions defined here. (not decoded)
CONTROL.TXT | binary | Engine control/state. (not decoded)
PROFILE.TXT | text | Device/path configuration (decoded; see §13).
PRINTREC.TXT | text | A human-readable recipe print-out the program emitted. Not part of the DB; invaluable as ground truth.
COOKBOOK.EXE, MAINT.EXE, CREATE.EXE, CONFIG.EXE | DOS MZ executables | The program and its utilities.

The .TXT extension on RECIPE.TXT/SCREEN.TXT/CONTROL.TXT is misleading — they are binary. Only PROFILE.TXT and PRINTREC.TXT are real text (ASCII, CRLF line endings, sometimes terminated by 0x1A/^Z).

2. Character encoding

All recipe values are 7-bit ASCII. No code-page-dependent or high-bit bytes were observed in recipe text. Bytes 0x00 and 0xFF are reserved structural markers (§4) and never appear inside a value. Fractions and quantities are stored as literal text ("1/2", "3 1/2", "1 Lb.", "8 oz.").

3. High-level layout of RECIPE.TXT

+--------------------------------------------------+ offset 0
|  HEADER / INDEX REGION  (~32 KB, binary)         |   sort orders, free-list,
|  *(internal structure NOT decoded)*              |   and the program's
|                                                  |   RECIPE / INGREDIENT /
+--------------------------------------------------+   CLASSIFICATION indexes
|  RECORD AREA                                     |   (inferred)
|    record, record, record, ...                  |
|    interspersed with deleted-record "slack"      |
+--------------------------------------------------+ EOF
  • The record area begins after the index region. In the five sample databases the first record started at 0x8000 (32 KB) in four of them and at 0x14C00 in the largest (RECIPE3). Do not hard-code this offset; locate records by the title signature (§8.1) instead.
  • The index region is partly or entirely zero-filled in some databases and full of binary structures in others. It is not needed to read the recipes.

4. Field encoding (the core grammar)

The record area is a stream of tag–length–value (TLV) fields:

field := tag(1 byte) length(1 byte) value(`length` bytes, ASCII)

Two byte values are structural and replace a field where they occur:

Byte | Meaning
0xFF | Screen-page break. Field tags RESET to low values on the page that follows (see §6, §7).
0x00 | Padding / end of meaningful data within a slot (also appears in slack).

There is no record-terminator byte and no record-length prefix. Record boundaries must be inferred (§8).
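
As a minimal sketch of the grammar above, the following Python reads a single field from a byte buffer. The names `Field` and `read_field` are illustrative only and are not taken from the original program or the reference extractor.

```python
from typing import NamedTuple, Optional, Tuple


class Field(NamedTuple):
    tag: int
    value: str


def read_field(data: bytes, pos: int) -> Optional[Tuple[Field, int]]:
    """Read one tag-length-value field at pos; return (field, next_pos) or None."""
    if pos + 2 > len(data):
        return None                      # not enough bytes for tag + length
    tag, length = data[pos], data[pos + 1]
    if tag in (0x00, 0xFF):
        return None                      # structural byte, not a field
    end = pos + 2 + length
    if end > len(data):
        return None                      # value would run past the buffer
    return Field(tag, data[pos + 2:end].decode("ascii", errors="replace")), end


# First two fields of the "avocado dip" record from §10.
sample = bytes([0x01, 0x0B]) + b"avocado dip" + bytes([0x02, 0x02]) + b"04"
field, nxt = read_field(sample, 0)
print(field, nxt)
```

Returning `None` for 0x00 and 0xFF leaves it to the caller to decide whether the byte is a page break or slack.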

5. Field tags within a record

A record begins with tag 0x01 (the title) and uses ascending tags. On the first page of a record the tags carry these meanings:

Tag | Meaning | Notes
0x01 | Title | Required; first field of a record.
0x02 | Servings / yield | Short; often numeric ("4", "06", "2dz", "112"). May be absent.
0x03–0x06 | Classifications | 0–4 free-text categories/keywords ("appetizer", "Soup", "American"). A value beginning From: is an attribution/source, not a category.
0x07 and up | Ingredient grid | See §6. The grid always begins at 0x07 on the first page.
(after the grid) | Date | A value matching dd-dd-dddd (e.g. 09-03-1988) is the entry date.

Tags 0x02–0x06 may be sparse (e.g. a title with one classification jumps 0x01 → 0x03 → 0x07). Treat any missing tag as an empty field.
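
A hedged sketch of how the first-page header fields might be sorted into servings, classifications, and source, following the tag meanings in this section. `split_header` and its dictionary keys are hypothetical names, and the tag numbers used for the Peanut Butter Balls fields in the second call are assumed for illustration (the source text gives the values but not their exact tags).

```python
def split_header(fields):
    """fields: iterable of (tag, value) pairs from the first page of a record."""
    header = {"title": None, "servings": None, "classes": [], "source": None}
    for tag, value in fields:
        if tag == 0x01:
            header["title"] = value
        elif tag == 0x02:
            header["servings"] = value
        elif 0x03 <= tag <= 0x06:
            if value.startswith("From:"):
                # Attribution, not a category.
                header["source"] = value[len("From:"):].strip()
            else:
                header["classes"].append(value)
        # Tags 0x07 and up belong to the ingredient grid and are ignored here.
    return header


dip = split_header([(0x01, "avocado dip"), (0x02, "04"), (0x03, "appetizer")])
balls = split_header([
    (0x01, "Peanut Butter Balls"),
    (0x03, "Candy"),
    (0x04, "From: Carol A. Williams"),   # tag 0x04 is an assumption
])
print(dip)
print(balls)
```

Missing tags simply leave the corresponding key at its empty default, which matches the "treat any missing tag as an empty field" rule.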

6. The ingredient grid

Ingredients are a four-column table: quantity, measure, preparation, ingredient-name. Each cell is one TLV field; the cell's position is derived from its tag:

column = (tag - base) mod 4        # 0=quantity 1=measure 2=preparation 3=name
row    = (tag - base) div 4
  • base = 0x07 on the first page (because tags 0x01–0x06 are the header).
  • base = 0x01 on every continuation page (after a 0xFF page break).

Empty cells simply skip a tag. Example header-page row mapping:

0x07 -> r0c0 (qty)   0x08 -> r0c1 (measure)  0x09 -> r0c2 (prep)  0x0A -> r0c3 (name)
0x0B -> r1c0         0x0C -> r1c1            (0x0D absent)         0x0E -> r1c3
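
The mapping can be checked with a few lines of Python. This is a sketch only; `grid_cell` and `COLUMNS` are illustrative names, not identifiers from the original program.

```python
COLUMNS = ("quantity", "measure", "preparation", "name")


def grid_cell(tag: int, base: int = 0x07):
    """Return (row, column_name) for an ingredient-grid tag on one page."""
    if tag < base:
        raise ValueError("tag is below the grid base for this page")
    row, col = divmod(tag - base, 4)
    return row, COLUMNS[col]


for tag in (0x07, 0x0A, 0x0C, 0x0E, 0x11):
    print(hex(tag), grid_cell(tag))
```

Passing `base=0x01` gives the mapping for a continuation page; the row returned is local to that page.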

6.1 Rows accumulate GLOBALLY across pages (critical)

When a recipe has more ingredients than fit on one screen, it continues after a 0xFF page break with tags restarting at 0x01 (base = 0x01). The row index must keep counting from where the previous page left off — it must not restart at 0. Maintain a running row_offset:

absolute_row = row_offset + ((tag - base) div 4)

If you reset rows per page, continuation-page ingredients overwrite the first ingredient of the record (silent data loss + reordering). This was the single most damaging bug during development.
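
A small sketch of the running `row_offset`, assuming each page is already a list of (tag, value) pairs and that every page passed in is an ingredient page. `place_cells` is a hypothetical helper, the cell values below are invented for illustration, and the date field that can follow the grid is not handled here.

```python
def place_cells(pages):
    """Map (absolute_row, col) -> value across a first page and its continuations."""
    grid = {}
    row_offset = 0
    for page_no, page in enumerate(pages):
        base = 0x07 if page_no == 0 else 0x01
        last_row = -1
        for tag, value in page:
            if tag < base:
                continue                 # header fields on the first page
            local_row, col = divmod(tag - base, 4)
            grid[(row_offset + local_row, col)] = value
            last_row = max(last_row, local_row)
        row_offset += last_row + 1       # keep counting on the next page
    return grid


pages = [
    [(0x01, "Example Title"), (0x03, "Candy"),
     (0x07, "1/2"), (0x08, "Cup"), (0x0A, "Butter"),
     (0x0B, "2"), (0x0C, "Cup"), (0x0E, "Sugar")],
    [(0x01, "1/4"), (0x02, "Cake"), (0x04, "Wax")],   # continuation page
]
grid = place_cells(pages)
print(sorted(grid.items()))
```

Here the continuation-page cells land in row 2 rather than overwriting row 0, which is the behaviour the warning above is about.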

7. Directions

After the ingredient grid (typically following one or two 0xFF breaks) come the directions: a sequence of TLV text fields, each holding one screen line (typically 30–75 chars). Reconstruct the body by concatenating the values in order (then re-wrap to taste).

  • Direction-field tags are not semantically meaningful — they are line/screen positions and may either reset to low values after a 0xFF or continue upward across a break. Do not interpret them; just keep textual order.
  • A record may have multiple direction pages; a single 0xFF between them is normal. Two consecutive 0xFF bytes (an empty page) were also observed.
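
One possible way to rebuild the body from the line fields, sketched in Python. `join_directions` is an illustrative name; collapsing runs of spaces and the default width of 72 are choices made for this example, not behaviour documented for the original program.

```python
import textwrap


def join_directions(lines, width=72):
    """Concatenate direction line-fields in order, then re-wrap."""
    text = " ".join(line.strip() for line in lines if line.strip())
    text = " ".join(text.split())        # collapse double spaces after periods
    return textwrap.fill(text, width=width)


lines = [
    " Mash peeled avocados well or use",
    "blender.  Add remaining ingredients",
    "and blend well.",
    "Refrigerate 1 hour before serving.",
]
print(join_directions(lines))
```

The tags are deliberately not passed in: only the textual order of the fields matters.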

8. Record boundaries and "slack" (no delimiters)

Because 0xFF 0x01 marks both a page break (next page, tag 0x01) and the start of the next recipe (title, tag 0x01), it cannot delimit records. Furthermore, Micro Cookbook did not compact its files: editing/deleting a recipe left the old bytes in place, so stale fragments ("slack") sit between live records, and the same recipe can appear multiple times (truncated copies).

The reference extractor therefore locates records by a title signature and discards slack by stopping on the first invalid byte:

8.1 Title signature (start of a record)

A position is treated as a recipe start when all hold:

  1. byte = 0x01; next byte L with 2 ≤ L ≤ 40;
  2. the L-byte value is printable ASCII, contains a letter, and begins with a letter or one of " ' ( # (i.e. looks like a name, not a quantity);
  3. the field immediately after the title has tag 0x02–0x07;
  4. scanning forward (skipping 0xFF) you reach a tag-0x07 field within ~14 fields whose value is short (≤ 20 chars) — the first ingredient cell;
  5. no field longer than ~24 chars appears before that 0x07. Condition 5 is what rejects a directions line that merely happens to start at tag 0x01 and reach tag 0x07 (its 7th line) — its early fields are long prose.
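
The five conditions could be coded roughly as follows. This is a sketch under assumptions, not the reference extractor itself: the limits 14, 20, and 24 are the approximate figures quoted above, "printable" is taken to mean bytes 0x20–0x7E, and rejecting a candidate on a 0x00 byte is an added guard.

```python
def is_printable(raw: bytes) -> bool:
    return all(0x20 <= b <= 0x7E for b in raw)


def is_title_signature(data: bytes, i: int, max_fields: int = 14) -> bool:
    # Condition 1: tag 0x01 followed by a plausible title length.
    if i + 2 > len(data) or data[i] != 0x01:
        return False
    length = data[i + 1]
    title = data[i + 2:i + 2 + length]
    if not 2 <= length <= 40 or len(title) != length:
        return False
    # Condition 2: looks like a name, not a quantity.
    if not is_printable(title) or not any(chr(b).isalpha() for b in title):
        return False
    first = chr(title[0])
    if not (first.isalpha() or first in "\"'(#"):
        return False
    # Condition 3: the next field's tag is 0x02..0x07.
    pos = i + 2 + length
    if pos >= len(data) or not 0x02 <= data[pos] <= 0x07:
        return False
    # Conditions 4 and 5: reach a short tag-0x07 field with no long field before it.
    for _ in range(max_fields):
        while pos < len(data) and data[pos] == 0xFF:
            pos += 1                     # skip page breaks
        if pos + 2 > len(data) or data[pos] == 0x00:
            return False                 # assumption: 0x00 means slack
        tag, flen = data[pos], data[pos + 1]
        value = data[pos + 2:pos + 2 + flen]
        if len(value) != flen or not is_printable(value):
            return False
        if tag == 0x07:
            return flen <= 20
        if flen > 24:
            return False
        pos += 2 + flen
    return False


good = (bytes([0x01, 0x0B]) + b"avocado dip" + bytes([0x02, 0x02]) + b"04"
        + bytes([0x03, 0x09]) + b"appetizer" + bytes([0x07, 0x01]) + b"2")
print(is_title_signature(good, 0))
```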

8.2 Stopping on slack

When parsing a record's fields, stop the record (and skip to the next detected title) on the first sign of slack: a 0x00 byte, a length that would overrun the next title, or a value containing any non-printable byte.
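
A sketch of that stopping rule in Python, assuming `end` is the offset of the next detected title (or the end of the file). `read_pages` is an illustrative name and the sample bytes are invented.

```python
def read_pages(data: bytes, start: int, end: int):
    """Split one record into pages of (tag, value), stopping at the first slack."""
    pages = [[]]
    pos = start
    while pos < end:
        tag = data[pos]
        if tag == 0xFF:                  # page break
            pages.append([])
            pos += 1
            continue
        if tag == 0x00 or pos + 1 >= end:
            break                        # padding, or no room for a length byte
        length = data[pos + 1]
        value = data[pos + 2:pos + 2 + length]
        if pos + 2 + length > end:
            break                        # would overrun the next title
        if any(b < 0x20 or b > 0x7E for b in value):
            break                        # non-printable byte in a value
        pages[-1].append((tag, value.decode("ascii")))
        pos += 2 + length
    return pages


# A live record followed by zero padding and a stale fragment.
data = (bytes([0x01, 0x03]) + b"Dip" + bytes([0x07, 0x01]) + b"2" + b"\xff"
        + bytes([0x03, 0x04]) + b"Mix." + b"\x00\x00" + bytes([0x07, 0x03]) + b"old")
print(read_pages(data, 0, len(data)))
```

Everything after the first 0x00 is ignored, so the stale "old" fragment never reaches the output.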

9. Distinguishing ingredient pages from direction pages

On continuation pages you must decide whether the (short-tagged) fields are more ingredients or are directions. Heuristic used by the reference implementation:

A page is directions if any field is longer than 30 chars, or any field contains an internal sentence boundary [a-z]{2}[.;]\s (lowercase, then ./;, then whitespace). Otherwise it is ingredients.

The sentence-boundary test catches terse one-line directions such as "Combine the ingredients. Serve" (30 chars) that a length-only rule misses, while never matching short Title-Case ingredient cells. The classification is sticky: once directions begin, the rest of the record is directions.
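
The heuristic is short enough to show directly. In this sketch `page_is_directions` takes the values of one page as strings; the name and the `max_cell_len` parameter are illustrative.

```python
import re

# Lowercase pair, then "." or ";", then whitespace: an internal sentence boundary.
SENTENCE_BREAK = re.compile(r"[a-z]{2}[.;]\s")


def page_is_directions(values, max_cell_len=30):
    """True if any value is too long for a grid cell or contains a sentence break."""
    return any(
        len(v) > max_cell_len or SENTENCE_BREAK.search(v) is not None
        for v in values
    )


print(page_is_directions(["Combine the ingredients. Serve"]))
print(page_is_directions(["1/4", "Cake", "Parafin Wax"]))
```

The sticky behaviour is not part of this function; the caller is expected to remember the first page that tests as directions and treat every later page the same way.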

10. Worked example — "avocado dip" (single page)

Raw bytes at offset 0x8000 of the root RECIPE.TXT (tag and length bytes in hex, values shown as quoted ASCII):

0x8000: 01 0B "avocado dip"
        02 02 "04"
        03 09 "appetizer"
        07 01 "2"   08 05 "large"  09 04 "ripe"   0A 07 "avocado"
        0B 01 "3"   0C 03 "tbl"                    0E 0B "lemon juice"
        0F 01 "1"   10 05 "small"  11 03 "can"     12 0C "chili pepper"
        13 03 "1/2" 14 03 "tsp"                    16 04 "salt"
        17 01 "1"   18 03 "tbl"    19 06 "minced"  1A 05 "onion"
                    1C 04 "dash"                   1E 0D "tabasco sauce"
        FF FF                         <- end ingredients, begin directions
        03 20 " Mash peeled avocados well or use"
        04 23 "blender.  Add remaining ingredients"
        05 0F "and blend well."
        07 22 "Refrigerate 1 hour before serving."
        FF                            <- record ends; next byte 01 = next title

Decoded:

Title:          avocado dip
Servings:       04
Classification: appetizer
Ingredients (qty / measure / prep / name):
  2    large        ripe    avocado
  3    tbl                  lemon juice
  1    small        can     chili pepper
  1/2  tsp                  salt
  1    tbl          minced  onion
       dash                 tabasco sauce
Directions:
  Mash peeled avocados well or use blender. Add remaining ingredients
  and blend well. Refrigerate 1 hour before serving.

(Here the ingredient grid columns map via base=0x07; e.g. 0x11="can" → (0x11-7) mod 4 = 2 = preparation, (0x11-7) div 4 = 2 = row 2, the third row counting from zero.)

11. Multi-page example — "Peanut Butter Balls" (3 pages)

PAGE 0 (base 0x07): title, classification "Candy", source "From: Carol A.
        Williams", ingredient rows 0–5 (Butter … Chocolate Chips).
0xFF
PAGE 1 (base 0x01): one more ingredient row -> row_offset(6) + 0 = row 6, the 7th ingredient
        (1/4 Cake Parafin Wax).
0xFF
PAGE 2 (directions): "Blend butter, peanut butter & powdered sugar together.
        Mix in rice Krispy's …" (verbatim across several line-fields).

This record is the validation oracle: the program's own PRINTREC.TXT shows exactly these 7 ingredients, in this order, with these columns, and this directions text.

12. Reference reconstruction algorithm (language-agnostic)

function extract(bytes):
    starts = []
    for i in 0 .. len(bytes)-4:
        if is_title_signature(bytes, i):      # §8.1
            starts.append(i); skip i past the title value
    records = []
    for k in 0 .. len(starts)-1:
        end = starts[k+1] if k+1 < len(starts) else len(bytes)
        rec = parse_record(bytes, starts[k], end)
        if rec: records.append(rec)
    return records

function parse_record(bytes, start, end):
    pages = [[]]; pos = start
    while pos < end:
        b = bytes[pos]
        if b == 0xFF: pages.append([]); pos += 1; continue   # page break
        if b == 0x00: break                                  # slack
        if pos+1 >= end: break                               # no room for a length byte
        L = bytes[pos+1]
        if pos+2+L > end: break                              # overrun -> slack
        v = bytes[pos+2 .. pos+2+L]
        if v has any non-printable byte: break               # slack
        pages[-1].append({tag:b, val:v}); pos += 2+L
    if pages[0] empty or pages[0][0].tag != 0x01: return null

    title = pages[0][0].val
    grid = []; row_offset = 0; in_dir = false

    # page 0: header metadata + ingredients (base 0x07)
    pmax = -1
    for f in pages[0][1:]:
        if 0x02 <= f.tag <= 0x06:
            classify as servings / source(From:) / classification
        else if f.tag >= 0x07:
            if is_date(f.val): date = f.val; continue
            col = (f.tag-7) mod 4; lr = (f.tag-7) div 4
            grid.append((lr, col, f.val)); pmax = max(pmax, lr)
    if pmax >= 0: row_offset = pmax + 1

    # later pages: ingredients (base 0x01) or directions
    for p in pages[1:]:
        if page_is_dir(p): in_dir = true                     # §9
        if in_dir:
            append every f.val to directions
        else:
            lmax = -1
            for f in p:
                if is_date(f.val): continue
                col = (f.tag-1) mod 4; lr = (f.tag-1) div 4
                grid.append((row_offset+lr, col, f.val)); lmax = max(lmax, lr)
            if lmax >= 0: row_offset += lmax + 1

    rows = assemble grid cells into [qty, measure, prep, name] by (row, col)
    drop empty rows
    return {title, servings, classes, source, date, ingredients: rows, directions}

is_date(v)     = v matches ^\d{1,2}-\d{1,2}-\d{2,4}$
page_is_dir(p) = any f in p with len(f.val) > 30 OR f.val matches [a-z]{2}[.;]\s
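
The "assemble grid cells" and "drop empty rows" steps are left abstract in the pseudocode. One possible reading, sketched in Python with the hypothetical name `assemble_rows`:

```python
def assemble_rows(cells):
    """cells: iterable of (row, col, value) -> list of [qty, measure, prep, name]."""
    rows = {}
    for row, col, value in cells:
        rows.setdefault(row, ["", "", "", ""])[col] = value
    # Emit rows in ascending order and drop any row whose four cells are all blank.
    return [rows[r] for r in sorted(rows) if any(c.strip() for c in rows[r])]


cells = [
    (0, 0, "2"), (0, 1, "large"), (0, 2, "ripe"), (0, 3, "avocado"),
    (1, 0, "3"), (1, 1, "tbl"), (1, 3, "lemon juice"),
    (5, 1, "dash"), (5, 3, "tabasco sauce"),
]
for row in assemble_rows(cells):
    print(row)
```

Gaps in the row numbering (rows 2 to 4 are omitted in this sample) do not matter, since rows are emitted in sorted order rather than by index.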

13. Companion files

PROFILE.TXT (decoded — plain text, CRLF)

Device/path configuration in a simple keyword grammar:

define_device,disk
 path,C:\COOKBOOK
 path,C:\COOKBOOK\RECIPE1
 ...
 path,A:\
end_define

Lists the database directories the program searches. Useful for understanding how the five RECIPE*/ databases were registered.
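
A minimal parser for the excerpt above, written as a sketch. It assumes only the three keywords shown (`define_device`, `path`, `end_define`) and ignores anything else; `parse_profile` and the output shape are illustrative, and the sample text contains just the paths that are visible in the excerpt.

```python
def parse_profile(text: str):
    """Return a list of {'device': ..., 'paths': [...]} blocks from PROFILE.TXT text."""
    devices = []
    current = None
    for raw in text.replace("\x1a", "").splitlines():   # drop a trailing ^Z
        line = raw.strip()
        if not line:
            continue
        key, _, arg = line.partition(",")
        if key == "define_device":
            current = {"device": arg, "paths": []}
        elif key == "path" and current is not None:
            current["paths"].append(arg)
        elif key == "end_define" and current is not None:
            devices.append(current)
            current = None
    return devices


sample = (
    "define_device,disk\r\n"
    " path,C:\\COOKBOOK\r\n"
    " path,C:\\COOKBOOK\\RECIPE1\r\n"
    " path,A:\\\r\n"
    "end_define\r\n\x1a"
)
print(parse_profile(sample))
```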

PRINTREC.TXT (decoded — plain text)

A formatted recipe print-out. Its column layout reveals the program's canonical presentation and confirms field semantics:

                               PEANUT BUTTER BALLS
  Serving Size : 
  Keywords     :                             Candy
                 From: Carol A. Williams

  Qty       Measurement      Preparation           Ingredient
  ---       -----------      -----------           ----------
  1/2       Cup                                    Butter
  ...

→ Columns are Qty / Measurement / Preparation / Ingredient; header carries Serving Size, Keywords, and From: — exactly the tag 0x01–0x06 fields. This is the layout to mirror when recreating the program's output.
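
To mirror the ingredient table, fixed-width columns are probably enough. In this sketch the widths 10, 17, and 22 and the two-space indent are read off the header line of the excerpt above and should be treated as assumptions, since whitespace may not have survived transcription exactly; `format_row` is a hypothetical helper.

```python
WIDTHS = (10, 17, 22)   # Qty, Measurement, Preparation column widths (assumed)


def format_row(row, indent=2):
    """Format [qty, measure, prep, name] as one fixed-width print-out line."""
    qty, measure, prep, name = row
    cells = [cell.ljust(width) for cell, width in zip((qty, measure, prep), WIDTHS)]
    return " " * indent + "".join(cells) + name


print(format_row(["Qty", "Measurement", "Preparation", "Ingredient"]))
print(format_row(["---", "-" * 11, "-" * 11, "-" * 10]))
print(format_row(["1/2", "Cup", "", "Butter"]))
```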

SCREEN.TXT / CONTROL.TXT (not decoded)

SCREEN.TXT (24 KB per DB) contains the program's UI text (menus such as "MICRO COOKBOOK", "RECIPE INDEX", "INGREDIENT INDEX", "CLASSIFICATION INDEX", function-key help) and (inferred) the data-entry form definitions that map field tags to on-screen positions. CONTROL.TXT (24 KB) is engine state. Neither is required to read recipes.

14. Open problems (for full DOS-functionality recreation)

  1. The 32 KB header index region. Internal structure undecoded. To recreate the program's browse by recipe / ingredient / classification features you must decode (or simply rebuild) these indexes. Rebuilding from the parsed records is the pragmatic route; matching the original byte layout is the open reverse-engineering task. The region opens with what looks like an 8-byte signature/hash followed by packed structures.
  2. SCREEN.TXT form layout. Decoding it would let you map field tags to labelled screen positions authoritatively (rather than via the §5/§6 rules) and reproduce the data-entry UI.
  3. Per-database variation. Field-tag meanings are stable, but exact screen layouts differ slightly between databases (each RECIPE*/ has its own SCREEN.TXT). The §5/§6 rules held across all five sampled databases.
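
For the pragmatic route in item 1, the three browse indexes can be rebuilt in memory from parsed records. The sketch below assumes records shaped like the return value of `parse_record` in §12 (a dict with `title`, `classes`, and `ingredients` rows). It makes no attempt to match the original on-disk index layout, and the case-insensitive keys are a choice made for this example.

```python
from collections import defaultdict


def build_indexes(records):
    """Return (title_order, ingredient_index, class_index) over record positions."""
    by_title = sorted(range(len(records)), key=lambda i: records[i]["title"].lower())
    by_ingredient = defaultdict(set)
    by_class = defaultdict(set)
    for i, rec in enumerate(records):
        for _qty, _measure, _prep, name in rec["ingredients"]:
            if name.strip():
                by_ingredient[name.strip().lower()].add(i)
        for cls in rec["classes"]:
            by_class[cls.strip().lower()].add(i)
    return by_title, dict(by_ingredient), dict(by_class)


records = [
    {"title": "Peanut Butter Balls", "classes": ["Candy"],
     "ingredients": [["1/2", "Cup", "", "Butter"]]},
    {"title": "avocado dip", "classes": ["appetizer"],
     "ingredients": [["2", "large", "ripe", "avocado"], ["1/2", "tsp", "", "salt"]]},
]
by_title, by_ingredient, by_class = build_indexes(records)
print(by_title, by_ingredient, by_class)
```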

15. Quick reference card

RECORD AREA = stream of TLV fields after a ~32 KB index region (offset varies).
FIELD       = tag(1) len(1) value(len ASCII).
0xFF        = page break; tags reset (ingredient base 0x07 -> 0x01).
0x00        = padding/slack.
TAGS p0     : 0x01 title | 0x02 servings | 0x03-06 classes/From: | 0x07+ grid | date
GRID        : col=(tag-base)%4 [qty,measure,prep,name]; row=(tag-base)/4; rows
              accumulate globally across pages (row_offset).
DIRECTIONS  : prose TLV fields after the grid; tags not meaningful; concat in order.
RECORDS     : not delimited -> find by title signature; stop on first slack byte.
DEDUP/SLACK : files are never compacted -> expect duplicates & truncated remnants.