@gcr
Last active September 16, 2023 17:20

Recovering corrupt files from a failing Raspberry Pi SD card

Last night, my RasPi wouldn’t boot (black screen). The culprit is the SD card: macOS reports I/O timeouts when reading it, so something’s clearly going wrong.

Rescue what parts of the filesystem we can

First, we can use ddrescue to gather all of the blocks:

sudo ddrescue -vv -n /dev/rdisk4 sd-card.img sd-card.img.ddrescue-map-2

The default configuration is to read 128 sectors of 512 bytes each per read() call and, if any fail, to skip 128 sectors (64 KiB) ahead. This takes all night, because each time ddrescue reads from a failing part of the card, macOS takes about 30 seconds to report an I/O timeout, and I don’t think this can be configured.

What blocks are bad?

ddrescue writes to a map file (the third argument above), which lets us see which parts of the drive are marked as “bad” and which parts ddrescue was able to recover. You can just read this file, but ddrescuelog also has an annotate flag (-A) to turn the numbers into something human-readable:

ddrescuelog -A sd-card.img.ddrescue-map-2
...

0x53F38000  0x00000200  -	#     1408 MB       512
0x53F38200  0x00003C00  /	#     1408 MB     15360
0x53F3BE00  0x00000200  -	#     1408 MB       512
0x53F3C000  0x03C3D000  +	#     1408 MB    63164 kB
...

In this example, 512 bytes at around the 1408 MB mark are bad (-), the next 15 kB failed but have not been scraped yet (/), the next 512 bytes are bad, and the following 63 MB were rescued (+).
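To see at a glance how much falls into each category, the map lines can be tallied by status. This is a minimal sketch, assuming the three-column layout shown above (position, size, status) with hex values; `tally_map_lines` is a hypothetical helper name, and the sample lines are copied from the annotated output.

```python
from collections import Counter

# Status characters used in the data lines of a ddrescue map file.
STATUSES = set("?*/-+")

def tally_map_lines(lines):
    """Sum the size column of map data lines, grouped by status character."""
    totals = Counter()
    for line in lines:
        # Drop the trailing "# ..." annotation, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        # Keep only lines of the form: position, size, status.
        if len(fields) != 3 or fields[2] not in STATUSES:
            continue
        _pos, size, status = fields
        totals[status] += int(size, 0)  # base 0 accepts the 0x prefix
    return totals

sample = [
    "0x53F38000  0x00000200  -\t#     1408 MB       512",
    "0x53F38200  0x00003C00  /\t#     1408 MB     15360",
    "0x53F3BE00  0x00000200  -\t#     1408 MB       512",
    "0x53F3C000  0x03C3D000  +\t#     1408 MB    63164 kB",
]
totals = tally_map_lines(sample)
print(totals["-"], totals["/"], totals["+"])  # 1024 15360 63164416
```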

Which blocks do the bad bytes correspond to?

First, calculate the start of the partition:

$ fdisk sd-card.img
Disk: sd-card.img	geometry: 966/255/63 [15523840 sectors]
Signature: 0xAA55
         Starting       Ending
 #: id  cyl  hd sec -  cyl  hd sec [     start -       size]
------------------------------------------------------------------------
 1: 0C   64   0   1 - 1023   3  32 [      8192 -     524288] Win95 FAT32L
 2: 83 1023   3  32 - 1023  63  32 [    532480 -   14991360] Linux files*
 3: 00    0   0   0 -    0   0   0 [         0 -          0] unused
 4: 00    0   0   0 -    0   0   0 [         0 -          0] unused

In this example, the second partition starts at sector 532480. Sectors are 512 bytes each, so that’s byte 532480 × 512 = 272629760.

Copy just that partition to another file:

$ dd if=sd-card.img bs=512 skip=532480 > sd-card.img.part2
14991360+0 records in
14991360+0 records out
7675576320 bytes transferred in 41.005132 secs (187185748 bytes/sec)

From here on out, we need to work with filesystem blocks, which on ext4 default to 4096 bytes. The formula is:

filesystem_block = (starting_byte - (filesystem_starting_sector * 512)) / 4096

Example:

starting_byte = 0x53F38000
filesystem_starting_sector = 532480
-> filesystem_block = 277304.0

Note that since sectors are smaller than blocks, this value may be fractional. Round down to get the block that contains the bad sector.
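The same arithmetic as a small Python sketch, using integer division so the result is always a whole block plus an offset inside it. `byte_to_block` is a hypothetical helper name, and the 4096-byte block size is the assumed ext4 default from above.

```python
SECTOR_SIZE = 512        # bytes per sector
EXT4_BLOCK_SIZE = 4096   # assumed ext4 default block size

def byte_to_block(disk_byte, partition_start_sector):
    """Return (block, offset_in_block) for a byte offset on the whole disk."""
    partition_byte = disk_byte - partition_start_sector * SECTOR_SIZE
    if partition_byte < 0:
        raise ValueError("byte lies before the start of the partition")
    return divmod(partition_byte, EXT4_BLOCK_SIZE)

print(byte_to_block(0x53F38000, 532480))  # (277304, 0)
print(byte_to_block(0x53F38200, 532480))  # (277304, 512)
```

The second call is the fractional case: the sector at 0x53F38200 sits 512 bytes into block 277304.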

Which files contain these blocks?

On ext4, each block is either unused, part of the metadata (directory structure or bookkeeping), or part of a file’s content. We can use debugfs to inspect these:

# debugfs sd-card.img.part2
debugfs 1.46.5 (30-Dec-2021)
debugfs:  testb 277304
Block 277304 marked in use

Uh oh, we’re unlucky, and this block is important. In many cases, we can at least map this block to some inode number or path to see which file is affected:

debugfs:  icheck 277304
Block	Inode number
277304	88263

debugfs:  ncheck 88263
Inode	Pathname
88263	/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Metadata/TV Shows/...

Looks like this is just some metadata file!
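With more than a handful of blocks, reading icheck output by eye gets tedious. Here is a rough sketch of a parser for it, assuming the two-column layout shown above; `parse_icheck` is a hypothetical helper name, and any row whose second column is not a number is treated as a block with no owning inode.

```python
def parse_icheck(output):
    """Map block number -> inode number (or None) from debugfs icheck output."""
    owners = {}
    for line in output.splitlines():
        fields = line.split(None, 1)
        # Skip the "Block  Inode number" header, prompts, and blank lines.
        if len(fields) != 2 or not fields[0].isdigit():
            continue
        block, inode = fields
        inode = inode.strip()
        owners[int(block)] = int(inode) if inode.isdigit() else None
    return owners

sample = "Block\tInode number\n277304\t88263\n"
print(parse_icheck(sample))  # {277304: 88263}
```

The inode numbers collected this way can then be passed to ncheck to get pathnames.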

Automating the process

Here’s a Python script that parses a ddrescue log and outputs a list of bad blocks that you can use in your debugfs adventures:

#!/usr/bin/env python3

"""
Given a ddrescue map, calculate which
ext4 blocks may be affected by
bad sectors on the drive.

Results are output to `debugfs-script.txt`

Usage:
    $ python this_script ddrescue_map partition_start_sector

Example:
    $ python ddrescue-to-debugfs.py sd-card.img.ddrescue-map-2 532480
"""

import sys
import re

sector_size = 512

filename = sys.argv[1]
partition_start_sectors = int(sys.argv[2])
partition_start_bytes = partition_start_sectors*sector_size
ext4_block_size = 4096

def parse(st):
    # ddrescue writes positions and sizes in hex (0x...); accept decimal too
    if st.lower().startswith('0x'):
        return int(st[2:], base=16)
    else:
        return int(st)

badblocks = []
with open(filename) as mapfile:
    for line in mapfile:
        # strip comments
        line = re.sub(r"#.*", '', line).strip()
        # split on successive whitespace
        fields = re.split(r"\s+", line)
        # structural pattern matching requires Python 3.10+
        match fields:
            # statuses: '-' bad sector, '/' non-scraped, '*' non-trimmed
            case (start, sz, '/' | '*' | '-'):
                start, sz = parse(start), parse(sz)

                # offsets relative to the start of the partition;
                # last_bad_byte is inclusive
                first_bad_byte = start - partition_start_bytes
                last_bad_byte = first_bad_byte + sz - 1

                if first_bad_byte < 0:
                    print(f"Warning: byte {start} is before this partition. Skipping.")
                    continue

                first_block = first_bad_byte // ext4_block_size
                last_block = last_bad_byte // ext4_block_size

                for block_idx in range(first_block, last_block + 1):
                    badblocks.append(block_idx)

# adjacent bad regions can share a block, so drop duplicates
badblocks = sorted(set(badblocks))

print(f"{len(badblocks)} bad blocks = {len(badblocks)*ext4_block_size/1024} kB")

with open("debugfs-script.txt", "w") as f:
    for block in badblocks:
        f.write(f"testb {block}\n")
        f.write(f"icheck {block}\n")
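The generated file can be replayed with debugfs’s -f option, which reads commands from a file instead of the prompt. A minimal sketch of doing that from Python, assuming debugfs is on the PATH; `debugfs_argv` and `run_debugfs_script` are hypothetical helper names.

```python
import subprocess

def debugfs_argv(image, script="debugfs-script.txt"):
    """Build the command line that replays a debugfs command file against an image."""
    # -f reads commands from a file; without -w the image is opened read-only.
    return ["debugfs", "-f", script, image]

def run_debugfs_script(image, script="debugfs-script.txt"):
    """Run the command file and return debugfs's standard output as text."""
    result = subprocess.run(
        debugfs_argv(image, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if debugfs exits non-zero
    )
    return result.stdout

print(debugfs_argv("sd-card.img.part2"))
```

Calling run_debugfs_script("sd-card.img.part2") would return the combined testb and icheck output for every bad block as one string.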

Conclusion

Ultimately, even though I recovered everything except 600 kB of my 8 GB SD card, several key files and directories were corrupted, so I didn’t feel comfortable just copying the system over.

Oh well, who needs /usr/sbin/flashrom anyway?
