After reading this Ars Technica article, I decided that I wanted to start using btrfs on my home fileserver. It had been running for a few years with an mdadm raid-10 array, formatted with ext4, holding about 3.4 TB of data. I figured I would take advantage of some of the special capabilities of btrfs to perform the conversion in place. After some research, I formed my basic plan.
- backup data to external drives
- remove two of the drives from the mdadm raid-10
- configure those two drives with a btrfs raid-0 filesystem
- copy the data from the degraded mdadm raid-10 to the new btrfs raid-0
- completely deactivate the mdadm raid-10
- add the last two drives to the btrfs raid-0 filesystem
- convert the btrfs filesystem to a raid-10
I knew that if I removed the wrong two drives from the raid-10, I would lose everything, so I took the time to perform a full backup. I plugged some old 2TB hard drives into an external dock and backed up all of my data.
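In rough terms, that step is just mounting each external drive in turn and rsyncing a slice of the data over; a minimal sketch, with hypothetical device and directory names:
[root@z ~]# mkdir -p /mnt/backup
[root@z ~]# mount /dev/sde1 /mnt/backup                        # /dev/sde1 stands in for whichever external drive is docked
[root@z ~]# rsync -a --stats /srv/photos/ /mnt/backup/photos/  # repeat with different directories for each 2TB drive
[root@z ~]# umount /mnt/backup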
Just to be clear, when I say remove from an array, I'm talking about using the --fail / --remove flags of the mdadm command, not physically removing the drives from the server. After some research, I found this section of the mdadm man page:
For RAID10 arrays where the number of copies evenly divides the number of devices, the
devices can be conceptually divided into sets where each set contains a single complete
copy of the data on the array. Sometimes a RAID10 array will be configured so that
these sets are on separate controllers. In this case all the devices in one set can be
failed by giving a name like set-A or set-B to --fail. The appropriate set names are
reported by --detail.
OK, that seems simple enough.
[root@z ~]# mdadm --detail /dev/md0 | egrep 'Level|State|Devices|active|removed'
Raid Level : raid10
Raid Devices : 4
Total Devices : 4
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 8 1 0 active sync set-A /dev/sda1
1 8 17 1 active sync set-B /dev/sdb1
2 8 33 2 active sync set-A /dev/sdc1
3 8 49 3 active sync set-B /dev/sdd1
I saw that set-B consisted of sdb1 and sdd1. Target acquired.
[root@z ~]# mdadm /dev/md0 --fail set-B
mdadm: set 8:17 faulty in /dev/md0
mdadm: set 8:49 faulty in /dev/md0
[root@z ~]# mdadm /dev/md0 --remove failed
mdadm: hot removed 8:17 from /dev/md0
mdadm: hot removed 8:49 from /dev/md0
At this point, the state was clean (and degraded), but not failed.
[root@z ~]# mdadm --detail /dev/md0 | egrep 'Level|State|Devices|active|removed'
Raid Level : raid10
Raid Devices : 4
Total Devices : 2
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 8 1 0 active sync set-A /dev/sda1
2 0 0 2 removed
2 8 33 2 active sync set-A /dev/sdc1
6 0 0 6 removed
I ensured that I could still read from and write to the array. So far, so good.
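A quick sanity check is enough for that; something along these lines, using a throwaway file name:
[root@z ~]# echo "degraded but alive" > /srv/write-test.txt   # write to the degraded array
[root@z ~]# cat /srv/write-test.txt                           # read it back
[root@z ~]# rm /srv/write-test.txt                            # clean up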
For the new layout, I decided to get rid of partitions altogether and just access the raw devices. I wiped out the existing partition tables using dd.
[root@z ~]# dd if=/dev/zero of=/dev/sdb bs=1M count=50
[root@z ~]# dd if=/dev/zero of=/dev/sdd bs=1M count=50
Then I just pointed mkfs.btrfs to the drives with the necessary flags.
[root@z ~]# mkfs.btrfs --data raid0 --metadata raid0 --force --label DATA /dev/sd[bd]
WARNING! - Btrfs v3.12 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
adding device /dev/sdd id 2
fs created label DATA on /dev/sdb
nodesize 16384 leafsize 16384 sectorsize 4096 size 5.46TiB
Btrfs v3.12
A multi-device btrfs filesystem doesn't create a single block device like mdadm does. Instead, you can mount any of the member devices, and the filesystem is smart enough to access all of the devices properly. I mounted it and started copying over my data.
[root@z ~]# mount /dev/sdb /mnt
[root@z ~]# rsync -a --stats --info=progress2 /srv/ /mnt/
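Any of the member devices works as the mount source, and so does the filesystem label; a couple of equivalent invocations (on some systems, a device scan is needed first so the kernel knows about all of the members):
[root@z ~]# btrfs device scan          # register all btrfs member devices with the kernel
[root@z ~]# mount /dev/sdd /mnt        # equivalent: mount via the other member device
[root@z ~]# mount LABEL=DATA /mnt      # equivalent: mount by the label set at mkfs time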
After the data transfer completed, I unmounted the old array, and remounted the btrfs filesystem at the original mount point.
[root@z ~]# umount /srv
[root@z ~]# umount /mnt
[root@z ~]# mount /dev/sdb /srv
At this point I updated /etc/fstab to remove the old array and add the new filesystem.
[root@z ~]# grep srv /etc/fstab
#UUID=3720ec23-3309-446e-a80b-c2914c96993d /srv ext4 defaults,noatime 0 2
LABEL=DATA /srv btrfs defaults,noatime 0 0
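To confirm that the new entry resolves correctly before the next reboot, a quick round trip through fstab is a reasonable check:
[root@z ~]# umount /srv && mount /srv   # re-mount using the fstab entry rather than an explicit device
[root@z ~]# findmnt /srv                # verify it came back as btrfs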
Next I deactivated the mdadm array and wiped the partition tables of the last two disks.
[root@z ~]# mdadm --stop /dev/md0
[root@z ~]# sed -i '/^ARRAY \/dev\/md0/ s_^_#_' /etc/mdadm.conf
[root@z ~]# dd if=/dev/zero of=/dev/sda bs=1M count=50
[root@z ~]# dd if=/dev/zero of=/dev/sdc bs=1M count=50
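For what it's worth, mdadm can also clear its own metadata directly; run before the dd step, while the partitions still exist, something like this wipes any leftover raid superblocks:
[root@z ~]# mdadm --zero-superblock /dev/sda1 /dev/sdc1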
Here is what the btrfs filesystem looked like with two disks.
[root@z ~]# btrfs fi show /dev/sdb
Label: DATA uuid: cc2893b9-4494-4e95-9394-9ec27158e596
Total devices 2 FS bytes used 3.38TiB
devid 1 size 2.73TiB used 1.69TiB path /dev/sdb
devid 2 size 2.73TiB used 1.69TiB path /dev/sdd
Btrfs v3.12
Adding the remaining drives was very simple.
[root@z ~]# btrfs device add /dev/sda /srv
[root@z ~]# btrfs device add /dev/sdc /srv
The new devices were added immediately, but contained zero data (so far).
[root@z ~]# btrfs fi show /dev/sdb
Label: DATA uuid: cc2893b9-4494-4e95-9394-9ec27158e596
Total devices 4 FS bytes used 3.38TiB
devid 1 size 2.73TiB used 1.69TiB path /dev/sdb
devid 2 size 2.73TiB used 1.69TiB path /dev/sdd
devid 3 size 2.73TiB used 0.00 path /dev/sda
devid 4 size 2.73TiB used 0.00 path /dev/sdc
Btrfs v3.12
To finalize this migration, I needed to balance things out to a raid-10 layout. I knew that this would take a while, so I started the process in a screen session.
[root@z ~]# btrfs balance start -dconvert=raid10 -mconvert=raid10 /srv
This command blocks until it completes, so I detached from the screen session in order to check the status.
[root@z ~]# btrfs fi show /dev/sda
Label: DATA uuid: cc2893b9-4494-4e95-9394-9ec27158e596
Total devices 4 FS bytes used 3.38TiB
devid 1 size 2.73TiB used 1.70TiB path /dev/sdb
devid 2 size 2.73TiB used 1.70TiB path /dev/sdd
devid 3 size 2.73TiB used 17.00GiB path /dev/sda
devid 4 size 2.73TiB used 17.00GiB path /dev/sdc
Btrfs v3.12
[root@z ~]# btrfs fi df /srv
Data, RAID10: total=34.00GiB, used=29.94GiB
Data, RAID0: total=3.35TiB, used=3.35TiB
Data, single: total=8.00MiB, used=7.00MiB
System, RAID0: total=16.00MiB, used=272.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID0: total=5.00GiB, used=3.89GiB
Metadata, single: total=8.00MiB, used=16.00KiB
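The balance progress can also be queried directly; a quick check from another shell looks like this:
[root@z ~]# btrfs balance status /srv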
Fast forward in time, and the balance appeared complete.
[root@z ~]# btrfs fi show /dev/sda
Label: DATA uuid: cc2893b9-4494-4e95-9394-9ec27158e596
Total devices 4 FS bytes used 3.38TiB
devid 1 size 2.73TiB used 1.70TiB path /dev/sdb
devid 2 size 2.73TiB used 1.70TiB path /dev/sdd
devid 3 size 2.73TiB used 1.70TiB path /dev/sda
devid 4 size 2.73TiB used 1.70TiB path /dev/sdc
Btrfs v3.12
[root@z ~]# btrfs fi df /srv
Data, RAID10: total=3.39TiB, used=3.38TiB
System, RAID10: total=128.00MiB, used=384.00KiB
Metadata, RAID10: total=6.00GiB, used=3.66GiB
I connected back to my screen session to see the final output.
[root@z ~]# btrfs balance start -dconvert=raid10 -mconvert=raid10 /srv
Done, had to relocate 1741 out of 1741 chunks
And there you have it, a full transformation from mdadm to btrfs, without having to remove a single physical drive. The btrfs wiki and the Arch wiki were critical to my research for this project, so I must give them credit.