After reading this Ars Technica article, I decided that I wanted to start using btrfs on my home fileserver. It had been running for a few years with an mdadm raid-10 array, formatted with ext4, holding about 3.4 TB of data. I figured I would take advantage of some of the special capabilities of btrfs to perform the conversion in place. After some research, I formed my basic plan.
- backup data to external drives
- remove two of the drives from the mdadm raid-10
- configure those two drives with a btrfs raid-0 filesystem
- copy the data from the degraded mdadm raid-10 to the new btrfs raid-0
- completely deactivate the mdadm raid-10
- add the last two drives to the btrfs raid-0 filesystem
- convert the btrfs filesystem to a raid-10
I knew that if I removed the wrong two drives from the raid-10, I would lose everything, so I took the time to perform a full backup. I plugged some old 2TB hard drives into an external dock and backed up all of my data.
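In rough terms, that step is just mounting each external drive in turn and rsyncing a slice of the data over; a minimal sketch, with hypothetical device and directory names:
[root@z ~]# mkdir -p /mnt/backup
[root@z ~]# mount /dev/sde1 /mnt/backup                        # /dev/sde1 stands in for whichever external drive is docked
[root@z ~]# rsync -a --stats /srv/photos/ /mnt/backup/photos/  # repeat with different directories for each 2TB drive
[root@z ~]# umount /mnt/backup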
Just to be clear, when I say remove from an array, I'm talking about using the --fail / --remove flags of the mdadm command, not physically removing the drives from the server. After some research, I found this section of the mdadm man page:
For RAID10 arrays where the number of copies evenly divides the number of devices, the
devices can be conceptually divided into sets where each set contains a single complete
copy of the data on the array. Sometimes a RAID10 array will be configured so that
these sets are on separate controllers. In this case all the devices in one set can be
failed by giving a name like set-A or set-B to --fail. The appropriate set names are
reported by --detail.
OK, that seems simple enough.
[root@z ~]# mdadm --detail /dev/md0 | egrep 'Level|State|Devices|active|removed'
Raid Level : raid10
Raid Devices : 4
Total Devices : 4
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 8 1 0 active sync set-A /dev/sda1
1 8 17 1 active sync set-B /dev/sdb1
2 8 33 2 active sync set-A /dev/sdc1
3 8 49 3 active sync set-B /dev/sdd1
I saw that set-B consisted of sdb1 and sdd1. Target acquired.
[root@z ~]# mdadm /dev/md0 --fail set-B
mdadm: set 8:17 faulty in /dev/md0
mdadm: set 8:49 faulty in /dev/md0
[root@z ~]# mdadm /dev/md0 --remove failed
mdadm: hot removed 8:17 from /dev/md0
mdadm: hot removed 8:49 from /dev/md0
At this point, the state was clean (and degraded), but not failed.
[root@z ~]# mdadm --detail /dev/md0 | egrep 'Level|State|Devices|active|removed'
Raid Level : raid10
Raid Devices : 4
Total Devices : 2
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 8 1 0 active sync set-A /dev/sda1
2 0 0 2 removed
2 8 33 2 active sync set-A /dev/sdc1
6 0 0 6 removed
I ensured that I could still read from and write to the array. So far, so good.
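A quick sanity check is enough for that; something along these lines, using a throwaway file name:
[root@z ~]# echo "degraded but alive" > /srv/write-test.txt   # write to the degraded array
[root@z ~]# cat /srv/write-test.txt                           # read it back
[root@z ~]# rm /srv/write-test.txt                            # clean up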
For the new layout, I decided to get rid of partitions altogether and just access the raw devices. I wiped out the existing partition tables using dd.
[root@z ~]# dd if=/dev/zero of=/dev/sdb bs=1M count=50
[root@z ~]# dd if=/dev/zero of=/dev/sdd bs=1M count=50
Then I just pointed mkfs.btrfs to the drives with the necessary flags.
[root@z ~]# mkfs.btrfs --data raid0 --metadata raid0 --force --label DATA /dev/sd[bd]
WARNING! - Btrfs v3.12 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
adding device /dev/sdd id 2
fs created label DATA on /dev/sdb
nodesize 16384 leafsize 16384 sectorsize 4096 size 5.46TiB
Btrfs v3.12
A multi-device btrfs filesystem doesn't create a single block device like mdadm does. Instead, you can mount any of the member devices, and the filesystem is smart enough to access all of the devices properly. I mounted it and started copying over my data.
[root@z ~]# mount /dev/sdb /mnt
[root@z ~]# rsync -a --stats --info=progress2 /srv/ /mnt/
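Any of the member devices works as the mount source, and so does the filesystem label; a couple of equivalent invocations (on some systems, a device scan is needed first so the kernel knows about all of the members):
[root@z ~]# btrfs device scan          # register all btrfs member devices with the kernel
[root@z ~]# mount /dev/sdd /mnt        # equivalent: mount via the other member device
[root@z ~]# mount LABEL=DATA /mnt      # equivalent: mount by the label set at mkfs time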
After the data transfer completed, I unmounted the old array, and remounted the btrfs filesystem at the original mount point.
[root@z ~]# umount /srv
[root@z ~]# umount /mnt
[root@z ~]# mount /dev/sdb /srv
At this point I updated /etc/fstab to remove the old array and add the new filesystem.
[root@z ~]# grep srv /etc/fstab
#UUID=3720ec23-3309-446e-a80b-c2914c96993d /srv ext4 defaults,noatime 0 2
LABEL=DATA /srv btrfs defaults,noatime 0 0
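To confirm that the new entry resolves correctly before the next reboot, a quick round trip through fstab is a reasonable check:
[root@z ~]# umount /srv && mount /srv   # re-mount using the fstab entry rather than an explicit device
[root@z ~]# findmnt /srv                # verify it came back as btrfs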
Next I deactivated the mdadm array and wiped the partition tables of the last two disks.
[root@z ~]# mdadm --stop /dev/md0
[root@z ~]# sed -i '/^ARRAY \/dev\/md0/ s_^_#_' /etc/mdadm.conf
[root@z ~]# dd if=/dev/zero of=/dev/sda bs=1M count=50
[root@z ~]# dd if=/dev/zero of=/dev/sdc bs=1M count=50
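For what it's worth, mdadm can also clear its own metadata directly; run before the dd step, while the partitions still exist, something like this wipes any leftover raid superblocks:
[root@z ~]# mdadm --zero-superblock /dev/sda1 /dev/sdc1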
Here is what the btrfs filesystem looked like with two disks.
[root@z ~]# btrfs fi show /dev/sdb
Label: DATA uuid: cc2893b9-4494-4e95-9394-9ec27158e596
Total devices 2 FS bytes used 3.38TiB
devid 1 size 2.73TiB used 1.69TiB path /dev/sdb
devid 2 size 2.73TiB used 1.69TiB path /dev/sdd
Btrfs v3.12
Adding the remaining drives was very simple.
[root@z ~]# btrfs device add /dev/sda /srv
[root@z ~]# btrfs device add /dev/sdc /srv
The new devices were added immediately, but contained zero data (so far).
[root@z ~]# btrfs fi show /dev/sdb
Label: DATA uuid: cc2893b9-4494-4e95-9394-9ec27158e596
Total devices 4 FS bytes used 3.38TiB
devid 1 size 2.73TiB used 1.69TiB path /dev/sdb
devid 2 size 2.73TiB used 1.69TiB path /dev/sdd
devid 3 size 2.73TiB used 0.00 path /dev/sda
devid 4 size 2.73TiB used 0.00 path /dev/sdc
Btrfs v3.12
To finalize this migration, I needed to balance things out to a raid-10 layout. I knew that this would take a while, so I started the process in a screen session.
[root@z ~]# btrfs balance start -dconvert=raid10 -mconvert=raid10 /srv
This command blocks until it completes, so I detached from the screen session in order to check the status.
[root@z ~]# btrfs fi show /dev/sda
Label: DATA uuid: cc2893b9-4494-4e95-9394-9ec27158e596
Total devices 4 FS bytes used 3.38TiB
devid 1 size 2.73TiB used 1.70TiB path /dev/sdb
devid 2 size 2.73TiB used 1.70TiB path /dev/sdd
devid 3 size 2.73TiB used 17.00GiB path /dev/sda
devid 4 size 2.73TiB used 17.00GiB path /dev/sdc
Btrfs v3.12
[root@z ~]# btrfs fi df /srv
Data, RAID10: total=34.00GiB, used=29.94GiB
Data, RAID0: total=3.35TiB, used=3.35TiB
Data, single: total=8.00MiB, used=7.00MiB
System, RAID0: total=16.00MiB, used=272.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID0: total=5.00GiB, used=3.89GiB
Metadata, single: total=8.00MiB, used=16.00KiB
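The balance progress can also be queried directly; a quick check from another shell looks like this:
[root@z ~]# btrfs balance status /srv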
Fast forward in time, and the balance appeared complete.
[root@z ~]# btrfs fi show /dev/sda
Label: DATA uuid: cc2893b9-4494-4e95-9394-9ec27158e596
Total devices 4 FS bytes used 3.38TiB
devid 1 size 2.73TiB used 1.70TiB path /dev/sdb
devid 2 size 2.73TiB used 1.70TiB path /dev/sdd
devid 3 size 2.73TiB used 1.70TiB path /dev/sda
devid 4 size 2.73TiB used 1.70TiB path /dev/sdc
Btrfs v3.12
[root@z ~]# btrfs fi df /srv
Data, RAID10: total=3.39TiB, used=3.38TiB
System, RAID10: total=128.00MiB, used=384.00KiB
Metadata, RAID10: total=6.00GiB, used=3.66GiB
I connected back to my screen session to see the final output.
[root@z ~]# btrfs balance start -dconvert=raid10 -mconvert=raid10 /srv
Done, had to relocate 1741 out of 1741 chunks
And there you have it, a full transformation from mdadm to btrfs, without having to remove a single physical drive. The btrfs wiki and the Arch wiki were critical to my research for this project, so I must give them credit.