Transferring 1.7TB of data from an HFS+ USB2 drive to Windows 10 NTFS

Bulk File Transfer From HFS+ USB Drive to Windows 10 NTFS Disk

Why not just copy via SMB?

First I tried simply copying the contents from an SMB share served by the macOS side. That was super slow, and with no way to resume an interrupted copy or to know what had actually been transferred, I did not think that was the route to go.

Next I naively ran rsync from the macOS side against the Windows drive mounted over SMB. The problem is that rsync's delta algorithm gives no benefit over a network mount: both ends look local to rsync, so all the data has to be transferred to the same process for comparison. Even with the delta algorithm skipped, the transfer speed was really low.

I have previously fought with ssh to get it to transfer files at full speed and decided to skip that battle this time.

rsync protocol to the rescue

So let's go with the native rsync protocol, as it is well suited for transferring files over a LAN.

Windows 10 side

I ended up doing the transfer by setting up an rsync daemon on Ubuntu 18 running under WSL 1. The rsyncd.conf looks like this:

secrets file = /home/username/rsyncd/secrets
log file = /home/username/rsyncd/rsyncd.log
port = 8730
use chroot = false
read only = false
charset = utf-8

[Destination]
  path = /mnt/d/Destination
  comment = Contents of Destination
  auth users = rsyncusername

... while the secrets file looks like this:

rsyncusername:rsyncpassword
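
One detail worth checking before starting the daemon: unless `strict modes = false` is set in rsyncd.conf, rsync refuses to use a secrets file that other users can access. A minimal sketch of tightening the permissions follows; the helper name `lock_down_secrets` is made up for illustration and was not part of the original setup.

```shell
# Hypothetical helper (not part of the original setup): restrict a secrets
# file to owner read/write so the daemon's default "strict modes" check passes.
lock_down_secrets() {
  secrets_path="$1"
  chmod 600 "$secrets_path"
}

# Example call, using the path from the rsyncd.conf above:
# lock_down_secrets /home/username/rsyncd/secrets
```

This assumes the secrets file lives on the WSL file system (as /home/username does), where chmod behaves as it does on Linux.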

Then, in the WSL shell, I ran rsync in daemon mode with:

rsync --daemon --config=/home/username/rsyncd/rsyncd.conf --no-detach
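
A quick way to confirm that the daemon is reachable from the other machine is to ask it for its module list, which needs no password. The sketch below wraps that in a small function; the function name and the example host name are assumptions, and only the port comes from the rsyncd.conf above.

```shell
# Hypothetical helper: ask an rsync daemon for its module list.
# The host is a placeholder; the default port matches the rsyncd.conf above.
list_rsync_modules() {
  host="$1"
  port="${2:-8730}"
  rsync "rsync://${host}:${port}/"
}

# Example call from the macOS side (the host name is an assumption):
# list_rsync_modules windows-pc.local
```

If everything is wired up, the listing should include the Destination module together with its comment. If the connection fails instead, the Windows firewall is one place to look.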

I'm not sure exactly which change sped things up, but in the Virus & threat protection settings I temporarily disabled Real-time protection and also added process exclusions: one for the rsync process that handles the network, and one for each of the unnamed rsync-related processes that only have some numbers and characters in their name. Those are easy to identify in Task Manager, as they write to disk the same data that the rsync process gets from the network. Finally, just to be sure, I added an exclusion for the Destination folder.

Without the exclusions, things would really slow down (to 6 MB/s or so) after some data had been transferred, and I could not identify other reasons for the slowdown.

macOS side

First I installed rsync from Homebrew to get version 3.2.3 instead of the 2.something that ships with macOS. The newer version has iconv support, which is needed to get proper UTF-8 file names on both sides.

The final version of the transfer command is:

rsync --archive --verbose --delete --inplace --no-compress \
  --whole-file --info=progress2 --iconv=utf-8-mac,utf-8 \
  --exclude=".DS_Store" \
  /Volumes/Source/ rsync://[email protected].:8730/Destination

Now it's happily churning away at 40 MB/s, with 25% of the 1.7 TB done. That speed is close enough to what the USB 2 connection of my Source volume can deliver (480 Mbit/s, or 60 MB/s, on paper, and less in practice). This transfer is also easy and fast to resume: rerunning the same command skips the files that are already in place.
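
For a sense of scale, here is a rough estimate of the total run time, assuming decimal units and that the 40 MB/s rate holds for the whole run (which, as the Problems section notes, it may not). The `eta` function is just an illustration of the arithmetic.

```shell
# Estimate transfer time from a size in MB and a rate in MB/s
# (decimal units, integer arithmetic; rounded down to the minute).
eta() {
  size_mb="$1"
  rate_mb_per_s="$2"
  seconds=$(( size_mb / rate_mb_per_s ))
  printf '%dh %dm\n' $(( seconds / 3600 )) $(( seconds % 3600 / 60 ))
}

eta 1700000 40   # the whole 1.7 TB
eta 1275000 40   # the remaining 75%
```

That works out to about 11 hours 48 minutes for the full volume and 8 hours 51 minutes for what is left at the 25% mark.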

Problems

Still, after the transfer has been running for some time, the speed drops to 5 MB/s; it recovers if the client side is stopped and restarted. The Windows System process consumes one full CPU core, but I have not yet found out why.

Reading the USB disk is not the bottleneck when that happens: while the transfer is slowed down, another process reading from the same disk on the macOS side can still use all the USB bandwidth.
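
Since restarting the client is cheap, the restart can at least be automated for the cases where rsync actually exits with an error. The wrapper below is a sketch of that idea, not something used in the original transfer; `retry_until_ok` and `RETRY_DELAY` are made-up names. Note that it only reacts to a non-zero exit (for example a dropped connection, or an I/O timeout when rsync is run with `--timeout`). It does not detect a slowdown by itself.

```shell
# Hypothetical wrapper: rerun a command until it exits with status 0,
# giving up after MAX_ATTEMPTS. RETRY_DELAY (seconds, default 10) is the
# pause between attempts.
retry_until_ok() {
  max_attempts="$1"
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    attempt=$(( attempt + 1 ))
    # Pause only if another attempt will follow.
    [ "$attempt" -gt "$max_attempts" ] || sleep "${RETRY_DELAY:-10}"
  done
  return 1
}
```

It would be called as `retry_until_ok 20 rsync ...` with the same arguments as the transfer command above. For unattended retries the password has to come from the `RSYNC_PASSWORD` environment variable or from `--password-file`, otherwise every attempt prompts for it.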
