BTRFS single mode evaluation

The experiment

Preparation

truncate -s20G d1.img
truncate -s20G d2.img
truncate -s20G d3.img
truncate -s20G d4.img
set ld1 (sudo losetup --show --find d1.img)
set ld2 (sudo losetup --show --find d2.img)
set ld3 (sudo losetup --show --find d3.img)
set ld4 (sudo losetup --show --find d4.img)

sudo mkfs.btrfs -d single -m raid1c3 "$ld1" "$ld2" "$ld3" "$ld4" 

sudo mkdir -p /mnt/loop
sudo mount "$ld1" /mnt/loop

sudo dd if=/dev/zero of=/mnt/loop/file bs=1M count=500

I also copied a bunch of videos:

sudo cp -r ~/d/70_Now_Watching/ /mnt/loop/

First, I check the distribution of data:

sudo btrfs device usage /mnt/loop
/dev/loop0, ID: 1
   Device size:            20.00GiB
   Device slack:              0.00B
   Data,single:            19.00GiB
   Unallocated:             1.00GiB

/dev/loop1, ID: 2
   Device size:            20.00GiB
   Device slack:              0.00B
   Data,single:            18.00GiB
   Metadata,RAID1C3:        1.00GiB
   System,RAID1C3:          8.00MiB
   Unallocated:          1016.00MiB

/dev/loop2, ID: 3
   Device size:            20.00GiB
   Device slack:              0.00B
   Data,single:            18.00GiB
   Metadata,RAID1C3:        1.00GiB
   System,RAID1C3:          8.00MiB
   Unallocated:          1016.00MiB

/dev/loop3, ID: 4
   Device size:            20.00GiB
   Device slack:              0.00B
   Data,single:            18.00GiB
   Metadata,RAID1C3:        1.00GiB
   System,RAID1C3:          8.00MiB
   Unallocated:          1016.00MiB

The data seems to be distributed pretty evenly. Now let's fuck shit up!!

sudo dd if=/dev/random of="$ld3"
dd: writing to '/dev/loop2': No space left on device
41943041+0 records in
41943040+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 101.421 s, 212 MB/s
sudo btrfs scrub start /mnt/loop/

ERROR: there are uncorrectable errors

UUID:             b4ade67a-8c7b-45c3-b747-8280d9504714
Scrub started:    Sat Jan 21 23:16:09 2023
Status:           finished
Duration:         0:00:25
Total to scrub:   72.80GiB
Rate:             2.91GiB/s
Error summary:    super=2 csum=4698471
Corrected:      5662
Uncorrectable:  4692809
Unverified:     0
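
The scrub summary doesn't say which device the errors landed on; the per-device error counters (corruption_errs and friends) should point straight at the overwritten loop device:

sudo btrfs device stats /mnt/loop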

Now we have a little script to check our files:

from pathlib import Path

error_count = 0
success_count = 0

for file_path in Path('/mnt/loop/').rglob('*'):
    try:
        if file_path.is_file():
            with open(file_path, 'rb') as f:
                # read the entire contents of the file
                file_contents = f.read()
                success_count += 1
    except IOError:
        error_count += 1

print(f'Number of successful reads: {success_count}')
print(f'Number of IO errors: {error_count}')
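
One caveat: run the script as root, since the files were copied with sudo; PermissionError is a subclass of the IOError being caught, so a permission problem would be counted as an IO error rather than a checksum failure.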

Results

And the results are... drumroll please......... no? ok fine.

Number of successful reads: 119
Number of IO errors: 66

Interesting, but I wonder if there is any variation in the size of those files, or if we could simulate heavy file extent fragmentation.
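
A minimal sketch for tallying those sizes (assuming the same /mnt/loop mount; not necessarily the exact snippet that was run):

from pathlib import Path

ok_sizes = []
err_sizes = []

for file_path in Path('/mnt/loop/').rglob('*'):
    if not file_path.is_file():
        continue
    size = file_path.stat().st_size
    try:
        with open(file_path, 'rb') as f:
            f.read()  # force every extent of the file to be read
        ok_sizes.append(size)
    except OSError:
        err_sizes.append(size)

def summarize(label, sizes):
    if sizes:
        print(f'{label}: min {min(sizes)} average {sum(sizes) // len(sizes)} max {max(sizes)} sum {sum(sizes)}')

summarize('Successful read files size', ok_sizes)
summarize('IO error files size', err_sizes)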

Successful read files size: min 0       average 241136404 max 2397645276 sum 28695232117
IO error files size:        min 1888612 average 745390515 max 4884066696 sum 49195774012

Interesting... maybe I need to run a bigger test, but being able to read a 2.4 GB file is probably somewhat surprising to the person who said only KBs would be accessible. It is true, though, that only 37% of the data was still accessible, and this is a very small, contrived simulation with only one process writing data.

It does seem like smaller files are more likely to survive, but that is to be expected. The gods of bits only make guillotines so big.

This result does make me feel a little better, though I will still investigate MergerFS a bit more. I really like btrfs, and switching to MergerFS seems like a lot of work...

A script to simulate this test using the output of btrfs inspect-internal dump-tree --extents might be interesting. I wonder how much file extent fragmentation my drives actually have.
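
In the meantime, a rough way to gauge per-file extent counts without parsing dump-tree output is filefrag (a sketch; it uses FIEMAP, so the counts are logical extents as the kernel reports them, not chunk placement across devices):

import subprocess
from pathlib import Path

counts = []
for file_path in Path('/mnt/loop/').rglob('*'):
    if not file_path.is_file():
        continue
    # filefrag prints e.g. "/mnt/loop/file: 4 extents found"
    out = subprocess.run(['filefrag', str(file_path)], capture_output=True, text=True).stdout
    try:
        counts.append(int(out.rsplit(': ', 1)[1].split()[0]))
    except (IndexError, ValueError):
        pass

if counts:
    print(f'files: {len(counts)}  extents per file: min {min(counts)} max {max(counts)} average {sum(counts) / len(counts):.1f}')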

Cleanup

sudo umount /mnt/loop
sudo losetup -d "$ld1" "$ld2" "$ld3" "$ld4"
rm d1.img d2.img d3.img d4.img

uname -a
Linux pakon 6.1.6-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 14 16:55:06 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
chapmanjacobd commented Jan 22, 2023

With raid0 it is a lot less... only 1MB is readable. Those files are probably all inlined extents.

sudo mkfs.btrfs -d raid0 -m raid1 "$ld1" "$ld2" "$ld3" "$ld4" 

Before removing device 2

Number of successful reads: 280
Number of IO errors: 0
Successful read files size: sum 83730370013 max 4884066696 average 299037035

After removing device 2

Number of successful reads: 109
Number of IO errors: 171
Successful read files size: sum 694079      max 57344      average 6367
IO error files size:        sum 83729675934 max 4884066696 average 489647227
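
One way to sanity-check the inlined-extent theory (a hedged suggestion; the path below is just a placeholder for one of the surviving small files): filefrag -v should report a single extent flagged inline for a file whose data lives in the metadata tree.

sudo filefrag -v /mnt/loop/some_surviving_small_file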
