This is a guide to set up ZFS push replication to an unprivileged user account.
Please read the content for more information.
ARCHIVED: This was definitely a learning experience. Not using sync snaps unfortunately breaks very easily if either the receiving or sending host, or the link between them, goes down for an extended period. For backups between servers that is not an issue; for desktop/laptop PCs, residential home servers, and Pis, it is quite a problem and required repeated manual intervention.
While I am still using sanoid and syncoid, I now use a different Linux distro (NixOS) and sync snaps, but still in a multi-level, append-only push configuration.
This guide can be used to replicate the ZFS filesystems of one or more source hosts to one or more backup target hosts.
The sources maintain their own sets of periodic snapshots using sanoid. Those are then replicated to an essentially write-only user on the targets using syncoid, where again sanoid is used to limit the number of old snaps kept.
This allows keeping more frequent snaps on the source systems than on the remote targets, and limits external access to the targets to receiving only.
While the system should work completely automatically once set up, it does require a number of manual steps to get there. Usually, a more automated setup would be preferable, but given the variable number of, and potential variation in, the hosts involved in a backup system, that seems like more effort than it's worth in this case.
This guide assumes that all involved hosts run Ubuntu 20.04. With minor modifications, it should also work on any other system that is supported by (Open)ZFS and Sanoid/Syncoid; just make sure to actually understand all the instructions in that case.
On all involved machines (source and target) run:
add-apt-repository universe && apt-get install --yes sanoid
mkdir -p /etc/sanoid && touch /etc/sanoid/sanoid.conf
Besides installing the binaries, this creates sanoid.timer, which every 15 minutes triggers sanoid.service, after which sanoid-prune.service runs.
The latter two run sanoid --take-snapshots and sanoid --prune-snapshots respectively (which won't do anything while /etc/sanoid/sanoid.conf is empty).
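To verify that the units exist and the timer is armed, something like this should work (standard systemd commands, shown here as a quick check rather than a required step):
systemctl list-timers sanoid.timer # shows the last and next activation
systemctl status sanoid.service sanoid-prune.service # oneshot services, inactive between runs
journalctl -u sanoid.service --since today # inspect what the last run did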
NOTE: At the time of writing, the pull request adding the --accept-snap option to Syncoid is still pending, so sanoid has to be installed from source instead:
# { (. <(cat << "#EOF" # copy from after the first #
set -ex
cd /tmp/ # TODO: don't do this in cwd
apt install --yes build-essential dpkg-dev debhelper
git clone https://github.com/NiklasGollenstede/sanoid.git && cd sanoid
git checkout syncoid-fiter-sync-snap && ln -s packages/debian .
dpkg-buildpackage -uc -us && apt install --yes --reinstall --allow-downgrades ../sanoid_*_all.deb
apt-mark hold sanoid
mkdir -p /etc/sanoid && touch /etc/sanoid/sanoid.conf
systemctl enable sanoid.timer; systemctl enable sanoid.service; systemctl enable sanoid-prune.service;
systemctl start sanoid.timer; systemctl start sanoid.service; systemctl start sanoid-prune.service;
#EOF
)); }
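To confirm that the patched package is installed and pinned, a quick check (not part of the original steps) could be:
dpkg -s sanoid | grep -i '^Version' # should show the locally built version
apt-mark showhold # should list sanoid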
The first step is to maintain a set of reasonably recent and old snapshots on the source (assuming it is the system actually producing new data).
After installing sanoid, simply edit /etc/sanoid/sanoid.conf, which could look something like this:
## file systems
# TODO: remove any that are not present on the system!
[bpool]
# boot pool
use_template = production
recursive = yes
[rpool]
# system pool
use_template = production
recursive = yes
[rpool/system/var/lib/docker]
# exclude docker
use_template = disabled
recursive = yes
[app]
# app pool, if not within rpool
use_template = production
recursive = yes
## templates
[template_disabled]
autoprune = no
autosnap = 0
monitor = no
[template_production]
# make "frequently" = 15 minutes:
frequent_period = 15
frequently = 2
hourly = 36
daily = 21
weekly = 6
monthly = 3
# yearly for manual deletion:
yearly = 100
# for consistency, take each snap at the beginning of the respective period
# months and days are apparently 1-indexed, while hours and mins are 0-indexed
hourly_min = 0
daily_hour = 0
daily_min = 0
weekly_wday = 1
weekly_hour = 0
weekly_min = 0
monthly_mday = 1
monthly_hour = 0
monthly_min = 0
yearly_mon = 1
yearly_mday = 1
yearly_hour = 0
yearly_min = 0
To test the config before it runs automatically, call sanoid --cron --force-update --verbose.
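Afterwards, the snapshots sanoid created can be listed with plain ZFS commands; for example (names will vary with the templates configured above):
zfs list -t snapshot -o name,creation -s creation | grep autosnap_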
The replication is done through a systemd oneshot service triggered by a daily timer.
Set source= to a short (alphanumeric) name for the host to be backed up and datasets=() to the datasets to be backed up (as a space-separated list in parentheses); then, for every target= as SSH host name (optionally with a targetDataset=), run:
# { (. <(cat << "#EOF" # copy from after the first #
set -eux
SYNCOID_OPTS='--recursive --no-sync-snap --create-bookmark --accept-snap=^autosnap_[0-9:_-]+_(?:daily|weekly|monthly|yearly)$ --no-rollback --no-privilege-elevation --sshkey /root/.ssh/syncoid_ed25519' # avoid backslashes '\'
SYNCOID_SEND_OPTS='w' # --raw: send as encrypted
SYNCOID_RECV_OPTS='u o canmount=off' # don't mount now, don't mount later
if [[ ! -e /root/.ssh/syncoid_ed25519 ]]; then
ssh-keygen -N '' -f /root/.ssh/syncoid_ed25519 -q -t ed25519 -C "zfs backup from root@$(hostname)"
fi
# Type=oneshot and Restart=on-failure don't work together in older versions of systemd
restart=''; if (($(systemd --version | grep -Po 'systemd \K\d+') < 245)); then restart='#'; fi
cat << EOC > /etc/systemd/system/syncoid@${target}.timer
[Unit]
Description=syncoid daily timer to ${target}
Requires=syncoid@${target}.service
[Timer]
OnCalendar=00:15 UTC
Persistent=true
[Install]
WantedBy=timers.target
EOC
cat << EOC > /etc/systemd/system/syncoid@${target}.service
[Unit]
Description=syncoid ${datasets[@]} to zfs-from-${source}@${target}:${targetDataset:-backup}/${source}/
Requires=zfs.target
After=zfs.target
[Install]
WantedBy=syncoid.target
[Service]
Environment=TZ=UTC
Environment='SYNCOID_OPTS=${SYNCOID_OPTS}'
Environment='SYNCOID_SEND_OPTS=${SYNCOID_SEND_OPTS}'
Environment='SYNCOID_RECV_OPTS=${SYNCOID_RECV_OPTS}'
Type=oneshot
${restart}Restart=on-failure
${restart}RestartSec=120
EOC
for dataset in "${datasets[@]}"; do
printf '%s\n' "ExecStart=$(which syncoid) \$SYNCOID_OPTS --sendoptions=\${SYNCOID_SEND_OPTS} --recvoptions=\${SYNCOID_RECV_OPTS} ${dataset} zfs-from-${source}@${target}:${targetDataset:-backup}/${source}/${dataset}" >> /etc/systemd/system/syncoid@${target}.service
done
systemctl daemon-reload
systemctl enable syncoid@${target}.service
systemctl enable syncoid@${target}.timer
systemctl start syncoid@${target}.timer
set +x; echo 'Try to manually run this (for the first dataset):'
printf '# %s\n' "( SYNCOID_OPTS='${SYNCOID_OPTS}'; SYNCOID_SEND_OPTS='${SYNCOID_SEND_OPTS}'; SYNCOID_RECV_OPTS='${SYNCOID_RECV_OPTS}'; $(which syncoid) \$SYNCOID_OPTS --sendoptions=\"\${SYNCOID_SEND_OPTS}\" --recvoptions=\"\${SYNCOID_RECV_OPTS}\" ${datasets} zfs-from-${source}@${target}:${targetDataset:-backup}/${source}/${datasets} --debug )"
#EOF
)); }
The service will fail until the target is actually set up for receiving.
Depending on the format of target, it may be necessary to edit /root/.ssh/config and add a Host ${target} entry with a HostName <ip> line, as sketched below.
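A minimal sketch of such an entry, where »backup1« and the address are placeholders:
# in /root/.ssh/config
Host backup1
    HostName 192.0.2.10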
Note: Older versions of systemd, e.g. on Ubuntu before 20.04, don't support restarting oneshot services. On such systems, the script above therefore disables restarting/reattempting failed syncs, which is generally quite bad.
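Once the target is set up, the state of the timer and service can be inspected like this:
systemctl list-timers 'syncoid@*' # when did/will the sync run
journalctl -u syncoid@${target}.service -e # output of the last run(s)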
Unless one already exists, create a pool (referred to by backupDataset=backup) on each target system:
apt install zfsutils-linux # if ZFS is not installed already
zpool create -o ashift=12 \
-O acltype=posixacl -O compression=lz4 \
-O dnodesize=auto -O normalization=formD -O relatime=on -O xattr=sa \
-O mountpoint=none -O canmount=off \
${backupDataset:-backup} /dev/... # TODO: specify data files/blocks
Alternatively, create the backupDataset in an existing pool:
zfs create \
-o mountpoint=none -o canmount=off \
${backupDataset:-rpool/backup}
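Either way, the dataset should now exist and be neither mounted nor mountable; to double-check:
zfs get -o name,property,value mountpoint,canmount ${backupDataset:-backup}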
For any combination of source and target, on the target, set the configuration:
publicKey='' # TODO: contents of `cat /root/.ssh/syncoid_ed25519.pub` on source
source= # TODO: same name for source as used in the push source definition
userName=zfs-from-${source}
backupDataset=backup # TODO: same as »targetDataset« set on source
And run this to set up an unprivileged system user and a filesystem in the backupDataset pool to receive the syncs:
# { (. <(cat << "#EOF" # copy from after the first #
set -eux
# add user
if ! id -u "${userName}" &>/dev/null ; then
echo "creating user ${userName}"
# the user needs to be able to log in via SSH and run (only) the commands syncoid issues, so it gets a shell, but no password
adduser --system "${userName}" --gecos '' --group --disabled-password --shell /bin/sh
fi
# add user's sshd_config
configFile=/etc/ssh/sshd_config.d/user-${userName}.conf
echo "adding/replacing configuration for ${userName} in ${configFile}"
printf '%s' "Match User ${userName}
GatewayPorts no
PermitListen none
# a random unused address, i.e. effectively permit nothing (sshd_config does not allow trailing comments)
PermitOpen 83.35.163.94:36937
AllowTcpForwarding no
AllowAgentForwarding no
X11Forwarding no
PermitTunnel no
#ForceCommand # TODO: see note under 'Not Implemented' below
" > ${configFile} # for older versions of SSHd, which don't support `Include` yet, this needs to be _appended_ to `/etc/ssh/sshd_config` instead (but `/etc/ssh/sshd_config.d` likely won't exist then at all)
systemctl reload ssh
# add public key for user
mkdir -p /home/${userName}/.ssh/ && chown ${userName}: /home/${userName}/.ssh && chmod 700 /home/${userName}/.ssh
touch /home/${userName}/.ssh/authorized_keys && chown ${userName}: /home/${userName}/.ssh/authorized_keys && chmod 600 /home/${userName}/.ssh/authorized_keys
echo "adding public key to /home/${userName}/.ssh/authorized_keys"
printf '%s\n' "${publicKey}" >> /home/${userName}/.ssh/authorized_keys
# give user ZFS permissions
zfs create -o mountpoint=none -o canmount=off ${backupDataset}/${source}
zfs allow -u "${userName}" create,receive,mount,canmount,userprop ${backupDataset}/${source} # `mount` is required for `receive`
#EOF
)); }
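With that in place, the connection can be tested from the source host (using the variables and key set up there; the unprivileged user only needs to log in and list/receive):
ssh -i /root/.ssh/syncoid_ed25519 zfs-from-${source}@${target} zfs list -r ${targetDataset:-backup}/${source}
On the target itself, zfs allow ${backupDataset}/${source} shows the delegated permissions.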
To prune the snapshots received from the source, use sanoid on the target as well, but disable autosnap.
With the backup pool created, edit /etc/sanoid/sanoid.conf, which could look something like this:
## file systems
[backup]
use_template = backup
recursive = yes
## templates
[template_backup]
# do NOT create any snapshots here, but receive them from the source
autosnap = 0
frequently = 0
hourly = 0
daily = 21
weekly = 6
monthly = 6
# yearly for manual deletion:
yearly = 100
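As on the source, the config can be tested manually before the timer runs it; sanoid's --readonly flag simulates the run first:
sanoid --prune-snapshots --verbose --readonly # simulate, show what would be pruned
sanoid --prune-snapshots --verbose # actually prune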
Open issues:
- Does the pruning only keep the configured number of e.g. *_daily snaps, or does it also check that they are actually older than 90 days before deleting them? If not the latter, then an attacker could push lots of new empty snaps to have the actual backup snaps pruned out, thus deleting the data.
- If syncoid gets stuck, it leaves the service "activating" indefinitely -> add a startup timeout.
One probably very much wants to exclude Docker's ZFS layer, image, and anonymous volume store from the backups (if the source uses Docker with ZFS). Each image layer is saved as its own ZFS file system, which would get replicated, but then not removed from the targets once it is (docker) pruned from the source host. The images (and anonymous volumes) should be ephemeral/reproducible already.
Use e.g.: zfs set syncoid:sync=false .../var/lib/docker.
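Whether the property is set and inherited as intended can be checked like this (the path is the one from the example config above):
zfs get -r -o name,value,source syncoid:sync rpool/system/var/lib/docker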
On a system that receives datasets which it is then supposed to send on further, the send should (also) happen after the receive; otherwise, the received updates may only be pushed with the next send, a day later.
syncoid sends each dataset individually, breaking (at least) encryptionroot inheritance. To fix this after restoring, run for each actual encryptionroot: zfs list -rHo name ${encryptionroot} | tail -n +2 | xargs -n1 zfs change-key -i -l.
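To verify the result, each child should afterwards report the same encryptionroot as its parent; e.g. (pool name is a placeholder):
zfs get -r -t filesystem -o name,value encryptionroot rpool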
For the above to work, each dataset must have its key loadable or loaded. That works automatically with key files, but has to be done manually for passwords, and even with load-key -r, ZFS prompts again for each dataset. The most convenient solution I have found is ( while true; do cat <<< "$password"; done ) | zfs load-key -r rpool/USERDATA/private_zl2i8e, then Ctrl+C that once it says "X/Y keys loaded".
When restoring, zfs send's --backup option can be used to drop the canmount=off override again.
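For example, a single dataset could be restored from a target like this (a sketch; the dataset and snapshot names are placeholders, and -b/--backup requires a reasonably recent OpenZFS):
zfs send -w -b backup/myhost/rpool/data@autosnap_2021-01-01_00:00:00_daily | zfs receive rpool/data # -w keeps the raw/encrypted stream; pipe through ssh to restore to another host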
Could look into this: sanoid --monitor-snapshots: "This option is designed to be run by a Nagios monitoring system. It reports on the health of your snapshots."
Not implemented:
- syncoid could build the receive command on the remote host, so that the receiving user's sshd_config ForceCommand can be set to $(which syncoid).
- sanoid could prune snaps it did not create itself (esp. compat with zsys). E.g. every foreign snap older than XXX could be converted to a bookmark.