@coreyward
Created March 28, 2026 22:39

bz_done file format reference

Backup Action Logs (bz_done_YYYYMMDD_0.dat)

Location: bzdata/bzbackup/bzdatacenter/

Purpose: Complete history of every backup action — file uploads, deletions, expirations, and large-file reassembly. This is the primary source for reconstructing churn and backup activity over time.

Scale: ~570 files spanning March 2020 to March 2026, totaling 15 GB and 45.8 million lines.

Encoding: UTF-8 text, tab-delimited, one record per line. No header row.

File naming: bz_done_YYYYMMDD_0.dat where the date indicates when the batch was created. Files are created every 2-4 days during active backup periods.
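
As a minimal sketch of the naming rule above, the batch date can be pulled out of a filename with a regular expression. The helper name `batch_date` is hypothetical, and the pattern assumes the trailing `_0` suffix is constant, as it is in every filename described here.

```python
import re
from datetime import date, datetime
from typing import Optional

# Matches names such as bz_done_20260101_0.dat (assumes the suffix is always _0).
BZ_DONE_NAME = re.compile(r"^bz_done_(\d{8})_0\.dat$")


def batch_date(filename: str) -> Optional[date]:
    """Return the batch date encoded in a bz_done filename, or None if it does not match."""
    match = BZ_DONE_NAME.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a valid calendar date.
        return None


def sort_by_batch_date(filenames):
    """Keep only recognizable bz_done files and order them oldest first."""
    dated = [(batch_date(name), name) for name in filenames]
    return [name for day, name in sorted(d for d in dated if d[0] is not None)]
```

Sorting by the parsed date rather than by raw filename makes the ordering explicit and drops unrelated files that may sit in the same directory.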

Record Format

Each line has 14 tab-separated fields:

5	+	m--	20260101003840	4_h{HOST_GUID}_f{FILE_ID}_d20260101_m003840_c000_v0001408_t0038	u06	{FILE_ID}	k0_n00000	{SHA1_HASH}	{TIMESTAMP_A}	{TIMESTAMP_B}	-	1677	/Users/{username}/src/.../output-lib

| # | Field | Description |
| --- | --- | --- |
| 1 | Version | Format version, always 5 in observed data |
| 2 | Action | The backup action performed (see Action Types below) |
| 3 | Flags | Three-character flag field (see Flags below) |
| 4 | Timestamp | When this action occurred: YYYYMMDDHHMMSS (GMT) |
| 5 | File descriptor | Structured ID (see File Descriptor below) |
| 6 | Upload thread | uNN where NN is the upload thread number (hex), or u-- for non-upload actions |
| 7 | File ID | Hex file ID, matches the _f component of field 5. Zero-padded to 16 hex chars |
| 8 | Chunk info | k{type}_n{chunk_number} (see Chunking below) |
| 9 | SHA-1 hash | SHA-1 of the file content. 40-char hex, or all dashes ---...--- for deletions/expirations |
| 10 | Timestamp A | Hex milliseconds since epoch; appears to be the file's creation time (or first-seen time) |
| 11 | Timestamp B | Hex milliseconds since epoch; appears to be the file's last-modified time |
| 12 | Chunk file ref | - for single-part files, or cfXXXXXXXXXXXXXXXXX referencing a chunk's file ID |
| 13 | Size | File size in bytes (decimal). For chunked uploads, this is the chunk size (e.g., 10485760 = 10 MiB) |
| 14 | Path | Absolute file path |
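
The field layout above can be sketched as a small parser. This is an illustration under stated assumptions, not a reference implementation: the names `DoneRecord`, `parse_record`, and `hex_ms_to_datetime` are hypothetical, field 4 is treated as UTC, and any dash-only value is treated as a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

FIELD_COUNT = 14
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class DoneRecord:
    version: str
    action: str
    flags: str
    timestamp: datetime                # field 4, treated as UTC
    descriptor: str
    upload_thread: str
    file_id: str
    chunk_info: str
    sha1: Optional[str]                # None when the field is all dashes
    timestamp_a: Optional[datetime]    # field 10
    timestamp_b: Optional[datetime]    # field 11
    chunk_file_ref: Optional[str]      # None when the field is "-"
    size: int
    path: str


def is_placeholder(value: str) -> bool:
    """True for dash-only placeholder values such as "-" or "----"."""
    return value != "" and set(value) == {"-"}


def hex_ms_to_datetime(value: str) -> Optional[datetime]:
    """Decode hex milliseconds since the epoch; placeholders give None."""
    if is_placeholder(value):
        return None
    return EPOCH + timedelta(milliseconds=int(value, 16))


def parse_record(line: str) -> DoneRecord:
    # Split on the first 13 tabs only, so a tab inside the path stays in field 14.
    fields = line.rstrip("\r\n").split("\t", FIELD_COUNT - 1)
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    when = datetime.strptime(fields[3], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return DoneRecord(
        version=fields[0],
        action=fields[1],
        flags=fields[2],
        timestamp=when,
        descriptor=fields[4],
        upload_thread=fields[5],
        file_id=fields[6],
        chunk_info=fields[7],
        sha1=None if is_placeholder(fields[8]) else fields[8],
        timestamp_a=hex_ms_to_datetime(fields[9]),
        timestamp_b=hex_ms_to_datetime(fields[10]),
        chunk_file_ref=None if is_placeholder(fields[11]) else fields[11],
        size=int(fields[12]),
        path=fields[13],
    )
```

Adding a `timedelta` to a fixed epoch keeps the millisecond decoding exact, which a float division by 1000 would not guarantee for large values.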

Action Types (Field 2)

| Action | Meaning | Description |
| --- | --- | --- |
| + | Backed up | File was uploaded (new file or updated content). Most common action. |
| = | Re-verified | File content confirmed unchanged, backup record refreshed. Appears during periodic re-verification sweeps. |
| - | Deleted | File no longer exists on disk. Backblaze records the deletion. File descriptor uses placeholder pattern r_h..._f----------------_d--------_m------_c000_v-------_t----. Size is 0. |
| x | Expired | File removed from backup (may have been deleted earlier, now purged from retention). Similar to a - record, but retains the original file descriptor instead of the placeholder. The SHA-1 and Timestamp A fields use dashes. |
| ! | Reassembly | Large file chunk reassembly record. Appears after all chunks of a large file have been uploaded. Uses placeholder descriptor. Size reflects the total reassembled file size. |
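
To illustrate how the action codes might be used, the sketch below tallies actions across a stream of raw lines. The label strings in `ACTION_NAMES` and the function name `count_actions` are hypothetical; only the five single-character codes come from the table above.

```python
from collections import Counter
from typing import Iterable

# Labels are illustrative; the single-character codes are from the action table.
ACTION_NAMES = {
    "+": "backed_up",
    "=": "re_verified",
    "-": "deleted",
    "x": "expired",
    "!": "reassembly",
}


def count_actions(lines: Iterable[str]) -> Counter:
    """Tally field 2 (the action code) without parsing the whole record."""
    counts: Counter = Counter()
    for line in lines:
        # Only the first two fields are needed, so stop splitting early.
        fields = line.split("\t", 2)
        if len(fields) < 3:
            continue  # not a tab-delimited record; skip it
        counts[ACTION_NAMES.get(fields[1], "unknown")] += 1
    return counts
```

Because the function accepts any iterable of lines, it can be fed an open file object directly, which avoids loading a multi-gigabyte log into memory.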

Flags (Field 3)

Three-character field observed with these values:

| Flags | Frequency | Likely meaning |
| --- | --- | --- |
| m-- | Most common | Modified file (content changed) |
| d-- | Common | De-duplicated or data-only (content matches existing backup) |
| --- | Common | No special flags (used for deletions, expirations, chunks) |
| dp- | Rare | De-duplicated with some property flag |
| b-- | Rare | Possibly a "big file" flag |

File Descriptor (Field 5)

Structured string encoding the backup operation context:

4_h{HOST_GUID}_f{FILE_ID}_d20260101_m003840_c000_v0001408_t0038

| Component | Meaning |
| --- | --- |
| 4 | Descriptor format version |
| h{HOST_GUID} | Host GUID (identifies this computer) |
| f{FILE_ID} | File ID (hex, unique per file path) |
| d20260101 | Date of action (YYYYMMDD) |
| m003840 | Time of action (HHMMSS) |
| c000 | Counter/sequence (usually 000) |
| v0001408 | Volume/version identifier |
| t0038 | Thread or transaction ID |

For deletions (- action), the descriptor uses a placeholder pattern:

r_h{HOST_GUID}_f----------------_d--------_m------_c000_v-------_t----
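
Both descriptor forms can be split with one regular expression, sketched below. The group names and the `parse_descriptor` helper are hypothetical, and the pattern assumes the host GUID contains no underscores, which the source does not state.

```python
import re
from typing import Dict, Optional

# Assumption: no component, including the host GUID, contains an underscore.
DESCRIPTOR = re.compile(
    r"^(?P<version>[^_]+)"
    r"_h(?P<host>[^_]+)"
    r"_f(?P<file_id>[0-9a-fA-F-]+)"
    r"_d(?P<date>[0-9-]{8})"
    r"_m(?P<time>[0-9-]{6})"
    r"_c(?P<counter>[^_]+)"
    r"_v(?P<volume>[^_]+)"
    r"_t(?P<thread>[^_]+)$"
)


def parse_descriptor(descriptor: str) -> Optional[Dict[str, object]]:
    """Split a field-5 descriptor into named parts, or return None if it does not match."""
    match = DESCRIPTOR.match(descriptor)
    if match is None:
        return None
    parts: Dict[str, object] = dict(match.groupdict())
    # The placeholder form replaces the file ID with dashes.
    parts["is_placeholder"] = set(match.group("file_id")) == {"-"}
    return parts
```

Returning `None` for unrecognized input, rather than raising, lets a caller count and inspect descriptor shapes that this reference has not observed.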

Chunking (Field 8)

Files are either single-part or chunked:

| Pattern | Meaning |
| --- | --- |
| k0_n00000 | Single-part file (not chunked). The overwhelming majority of entries. |
| k5_nXXXXX | Chunk number XXXXX (hex) of a large file split into 10 MiB parts |
| k1_nXXXXX | Appears in ! (reassembly) records. n is the total chunk count. |
| k-_n----- | Placeholder for deletions/expirations |

When a large file is chunked (k5):

  • Field 12 contains cfXXXXXXXXXXXXXXXXX, a chunk-specific file ID
  • Field 13 is the chunk size (typically 10485760 = 10 MiB, except the final chunk)
  • Multiple consecutive lines share the same path but have different chunk numbers
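
The chunk patterns above can be decoded as sketched below. The helper names are hypothetical. `expected_chunk_count` additionally assumes that every chunk except the last is exactly 10485760 bytes, which the source describes only as typical, so a mismatch against a reassembly record is a prompt to look closer, not proof of corruption.

```python
import re
from typing import Optional, Tuple

CHUNK_SIZE = 10485760  # 10 * 1024 * 1024 bytes, the typical chunk size noted above

CHUNK_INFO = re.compile(r"^k(?P<kind>[0-9a-fA-F-])_n(?P<number>[0-9a-fA-F]{5}|-{5})$")


def parse_chunk_info(value: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (kind, number) for a field-8 value, or None if it does not match.

    The number is decoded from hex; the dash placeholder yields None.
    """
    match = CHUNK_INFO.match(value)
    if match is None:
        return None
    number = match.group("number")
    decoded = None if number == "-----" else int(number, 16)
    return match.group("kind"), decoded


def expected_chunk_count(total_size: int) -> int:
    """Ceiling division: how many chunks a file of total_size bytes would need."""
    return -(-total_size // CHUNK_SIZE)
```

For a ! record, comparing `expected_chunk_count(size)` with the decoded n value gives a quick consistency check between the reassembled size and the reported chunk count.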

Upload Thread (Field 6)

| Pattern | Meaning |
| --- | --- |
| uNN | Upload thread number (two hex digits, e.g., u06, u10, u16) |
| u-- | No upload thread (used for deletions, expirations, reassembly records) |

The backup client runs multiple upload threads in parallel (8 or more concurrent threads observed, depending on the num_backup_threads config setting).
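
A small sketch of decoding field 6 follows. The function name `upload_thread_number` is hypothetical, and the two digits are read as hexadecimal, as this reference describes them.

```python
import string
from typing import Optional


def upload_thread_number(field: str) -> Optional[int]:
    """Decode a field-6 value: "u--" gives None, "uNN" gives NN read as hex."""
    if field == "u--":
        return None
    digits = field[1:]
    well_formed = (
        len(field) == 3
        and field.startswith("u")
        and all(ch in string.hexdigits for ch in digits)
    )
    if not well_formed:
        raise ValueError(f"unexpected upload thread field: {field!r}")
    return int(digits, 16)
```

Under the hex reading, u10 decodes to thread 16 rather than thread 10, which matters when comparing thread numbers against a configured thread count.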
