Last night, my RasPi wouldn’t boot (black screen). It’s an SD card issue: macOS reports I/O timeouts, so something’s clearly going wrong.
First, we can use ddrescue to gather all of the blocks:
sudo ddrescue -vv -n /dev/rdisk4 sd-card.img sd-card.img.ddrescue-map-2
The default configuration is to read 128 sectors of 512 bytes each per read() call and, if any fail, to skip 128 sectors (64 KiB) ahead. This takes all night because each time ddrescue reads from a failing part of the card, macOS takes 30 seconds to report an I/O timeout, and I don’t think this can be configured.
ddrescue writes to a log file, which lets us see which parts of the drive are marked as “bad” and which ddrescue was able to recover. You can just read this file, but ddrescuelog also has an annotate flag to turn the numbers into something human-readable:
ddrescuelog -A sd-card.img.ddrescue-map-2
...
0x53F38000  0x00000200  -  #  1408 MB  512
0x53F38200  0x00003C00  /  #  1408 MB  15360
0x53F3BE00  0x00000200  -  #  1408 MB  512
0x53F3C000  0x03C3D000  +  #  1408 MB  63164 kB
...
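In this example, 512 bytes at around the 1408 MB mark are bad (-), the next 15 kB are “non-scraped” (/), meaning a read failed somewhere in that range but ddrescue hasn’t yet retried it sector by sector, the next 512 bytes are bad, and the following 63 MB were rescued (+).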
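To get a quick feel for how much data falls into each category, the map can be totalled per status character. This is a minimal sketch rather than anything ddrescue ships: `summarize_map` and `STATUS_NAMES` are names I made up, and the sample lines are the four entries from the excerpt above with the annotations stripped.

```python
from collections import defaultdict

# Status characters used in ddrescue map files.
STATUS_NAMES = {
    "?": "non-tried",
    "*": "non-trimmed",
    "/": "non-scraped",
    "-": "bad sector",
    "+": "finished",
}

def summarize_map(lines):
    """Return {status_char: total_bytes} for the data lines of a ddrescue map."""
    totals = defaultdict(int)
    for line in lines:
        # Drop comments/annotations, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        # Data lines are "pos size status"; anything else (for example the
        # current-position line) fails this check and is skipped.
        if len(fields) != 3 or fields[2] not in STATUS_NAMES:
            continue
        totals[fields[2]] += int(fields[1], 0)  # base 0 accepts "0x..." hex
    return dict(totals)

sample = """\
0x53F38000  0x00000200  -
0x53F38200  0x00003C00  /
0x53F3BE00  0x00000200  -
0x53F3C000  0x03C3D000  +
"""
for status, size in summarize_map(sample.splitlines()).items():
    print(f"{status} {STATUS_NAMES[status]:<12} {size} bytes")
```

For a real map, pass `open("sd-card.img.ddrescue-map-2")` instead of the sample lines.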
Next, to find out which files those bad regions belong to, calculate the start of the partition:
$ fdisk sd-card.img
Disk: sd-card.img       geometry: 966/255/63 [15523840 sectors]
Signature: 0xAA55
         Starting       Ending
 #: id  cyl  hd sec -  cyl  hd sec [     start -       size]
------------------------------------------------------------------------
 1: 0C   64   0   1 - 1023   3  32 [      8192 -     524288] Win95 FAT32L
 2: 83 1023   3  32 - 1023  63  32 [    532480 -   14991360] Linux files*
 3: 00    0   0   0 -    0   0   0 [         0 -          0] unused
 4: 00    0   0   0 -    0   0   0 [         0 -          0] unused
In this example, the second partition starts at sector 532480. Sectors are 512 bytes each, so that’s byte 272629760.
Copy just that partition to another file:
$ dd if=sd-card.img bs=512 skip=532480 > sd-card.img.part2
14991360+0 records in
14991360+0 records out
7675576320 bytes transferred in 41.005132 secs (187185748 bytes/sec)
From here on out, we need to work with filesystem blocks, which on ext4 default to 4096 bytes. The formula is:
filesystem_block = (starting_byte - (filesystem_starting_sector * 512)) / 4096
Example:
starting_byte = 0x53F38000
filesystem_starting_sector = 532480
-> filesystem_block = 277304.0
Note that since sectors are smaller than blocks, this value may be fractional; round down to get the block that contains the byte.
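The same conversion as a small Python helper, using integer division so nothing goes fractional. It’s only a sketch: `byte_to_fs_block` is a name I picked, and the 4096-byte block size is the assumed ext4 default, so check it against your own filesystem.

```python
SECTOR_SIZE = 512       # bytes per sector, as in the fdisk output
EXT4_BLOCK_SIZE = 4096  # assumed ext4 default; verify for your filesystem

def byte_to_fs_block(starting_byte, partition_start_sector):
    """Map an absolute byte offset in the image to (block, offset_in_block)."""
    relative = starting_byte - partition_start_sector * SECTOR_SIZE
    if relative < 0:
        raise ValueError("offset lies before the start of the partition")
    # divmod gives the containing block and the byte offset inside it
    return divmod(relative, EXT4_BLOCK_SIZE)

print(byte_to_fs_block(0x53F38000, 532480))  # (277304, 0)
print(byte_to_fs_block(0x53F38200, 532480))  # (277304, 512)
```

The second call shows the fractional case: the bad range starting at 0x53F38200 begins 512 bytes into the same block, 277304.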
On ext4, each block is either unused, part of the metadata (directory structure or bookkeeping), or part of a file’s content. We can use debugfs to inspect these:
# debugfs sd-card.img.part2
debugfs 1.46.5 (30-Dec-2021)
debugfs:  testb 277304
Block 277304 marked in use
Uh oh, we’re unlucky, and this block is important. In many cases, we can at least map this block to some inode number or path to see which file is affected:
debugfs:  icheck 277304
Block   Inode number
277304  88263
debugfs:  ncheck 88263
Inode   Pathname
88263   /var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Metadata/TV Shows/...
Looks like this is just some metadata file!
Here’s a Python script that parses a ddrescue log and outputs a list of bad blocks that you can use in your debugfs adventures:
#!/usr/bin/env python3
"""
Given a ddrescue map, calculate which
ext4 blocks may be affected by
bad sectors on the drive.
Results are output to `debugfs-script.txt`
Requires Python 3.10+ (uses `match`).
Usage:
$ python this_script ddrescue_map partition_start_sector
Example:
$ python ddrescue-to-debugfs.py sd-card.img.ddrescue-map-2 532480
"""
import sys

sector_size = 512
ext4_block_size = 4096

filename = sys.argv[1]
partition_start_sectors = int(sys.argv[2])
partition_start_bytes = partition_start_sectors * sector_size


def parse(st):
    # ddrescue writes hex ("0x..."), but accept plain decimal too
    if st.lower().startswith('0x'):
        return int(st[2:], base=16)
    else:
        return int(st)


# a set, so a block touched by several bad ranges is only listed once
badblocks = set()
with open(filename) as f:
    for line in f:
        # strip comments, then split on successive whitespace
        fields = line.split('#', 1)[0].split()
        match fields:
            # '/' non-scraped, '*' non-trimmed, '-' bad sector
            case [start, sz, '/' | '*' | '-']:
                start, sz = parse(start), parse(sz)
                # byte offsets relative to the start of the partition
                first_bad_byte = start - partition_start_bytes
                last_bad_byte = first_bad_byte + sz  # exclusive end
                if first_bad_byte < 0:
                    print(f"Warning: {start} is before this partition. Skipping.")
                    continue
                first_block = first_bad_byte // ext4_block_size
                # ceiling division: one past the last affected block
                end_block = -(-last_bad_byte // ext4_block_size)
                badblocks.update(range(first_block, end_block))

badblocks = sorted(badblocks)
print(f"{len(badblocks)} bad blocks = {len(badblocks) * ext4_block_size / 1024} kB")
with open("debugfs-script.txt", "w") as f:
    for block in badblocks:
        f.write(f"testb {block}\n")
        f.write(f"icheck {block}\n")
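Once debugfs has run those commands (for example with `debugfs -f debugfs-script.txt sd-card.img.part2`, redirecting the output to a file), the icheck results still need to be turned into inode numbers for ncheck. Here’s a rough sketch of that step. `inodes_from_icheck` is a made-up helper, it assumes the two-column icheck layout shown earlier, and the `<block not found>` row in the sample is a hypothetical example of an unowned block, not output from my card.

```python
def inodes_from_icheck(output):
    """Collect the distinct inode numbers reported by debugfs `icheck`."""
    inodes = set()
    for line in output.splitlines():
        fields = line.split()
        # A hit looks like "<block> <inode>"; headers, echoed commands,
        # testb lines and "<block not found>" rows don't have exactly
        # two numeric fields, so they are ignored.
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            inodes.add(int(fields[1]))
    return sorted(inodes)

sample = """\
debugfs: testb 277304
Block 277304 marked in use
debugfs: icheck 277304
Block\tInode number
277304\t88263
debugfs: icheck 277305
Block\tInode number
277305\t<block not found>
"""
inodes = inodes_from_icheck(sample)
# ncheck accepts several inode numbers in one command
print("ncheck " + " ".join(map(str, inodes)))
```

Pasting the printed line into debugfs then lists the affected paths in one go.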
Ultimately, even though I recovered everything except 600 kB of my 8 GB SD card, several key files and directories were corrupted, so I didn’t feel comfortable just copying the system over.
Oh well, who needs /usr/sbin/flashrom anyway?