@cstockton
Created May 30, 2016 21:54
#!/bin/bash
echo "If you're reading this I hope you are in a vm or I probably just saved your ass!"
exit
## Warning:
#
# This script could break the crap out of your machine; it's only for illustration. That
# said, how you manage your data is your business, but if you're unsure of best practices I'll
# share the one rule I have that is unwavering:
#
# 1) Never have all your data on one machine in any way, shape or form. Period.
#    - Do not have your backup drives physically connected; doing so exposes them to your
#      system as block devices, making them vulnerable to human error.
#
# Of my colleagues, friends, Internet trolls and Internet randoms, those who
# have suffered true loss of valuable data have always violated this rule. The ways
# you can destroy everything on your machine are numerous. If you are a Linux novice
# then you are, in my opinion, less likely to massacre your drives than me or anyone
# who has over a decade (or decades and decades) of experience. Comfort tends to
# be negligence's instigator. I've nuked the shit out of my drives more than once,
# and what saved my ass each time was that I didn't compromise on my main rule.
#
#
## Crypto
# A couple of notes on crypto: I encrypt my root partition as my front line of defense
# against physical access, and then protect a second zone of data strictly for protection
# in case someone illegally gained access to my machine. This second encrypted
# zone provides no protection against a warrant. As our government continues to
# overreach you can be certain that you will not have plausible deniability for
# them. If you have data which needs protecting from law enforcement and extortion
# ^ (not mutually exclusive)
# you will need to educate yourself more deeply on LUKS containers and crypto. That
# is out of scope for some comments in a bash file, and people have different positions
# on the subject. Some are passionately against hidden containers, but I think they
# can be useful as long as you are methodical and properly implement and use them. I do
# not believe hard disk drives allow detection of hidden regions of data even in
# a well funded lab setting. I don't know enough about SSDs to say the same for
# them.. they are complex under the hood and I don't know if that complexity
# can leak usage patterns. I doubt it, but you should research it if it's a concern.
#
# Hidden volumes are not a direct feature of LUKS, but the specification is robust
# enough to do so, as it supports the LUKS header being decoupled from the device.
# This means you may fill a disk with random data and have different regions of
# the device map to different volumes. A few tips to get you a head start:
#
#   - Be cautious of your access patterns and the applications you use; be careful not
#     to leak information into system logs and such. Do your research on secure data
#     access, or having secure data is pointless.
#   - See the LUKS documentation for --header, --align-payload and --offset for alignment.
#   - Make sure to do your research on your chosen FS's block allocation
#     characteristics for your workload on the primary partition, and after each
#     session of writes assert the checksum at your offset matches your backup. i.e.:
#       dd if=./luks.img bs=1M skip=<your offset> iflag=skip_bytes | sha1sum -b &
#       dd if=./luks-backup.img bs=1M skip=<your offset> iflag=skip_bytes | sha1sum -b &
#       wait
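# The checksum comparison above can be wrapped in a small helper. A minimal
# sketch, assuming GNU dd (iflag=skip_bytes/count_bytes); the file names and
# offsets are placeholders, and it works on any regular file so you can try it
# without touching a real device:

```shell
#!/bin/bash
# Hash a byte region of an image; paths/offsets here are examples only.
region_sha1() {
  local img="$1" offset="$2" length="$3"
  # skip=/count= are interpreted as byte counts thanks to iflag, so the
  # offset doesn't need to be a multiple of the 1M block size.
  dd if="${img}" bs=1M skip="${offset}" count="${length}" \
     iflag=skip_bytes,count_bytes 2>/dev/null | sha1sum | awk '{print $1}'
}

# Compare the same region of two images, e.g. luks.img vs luks-backup.img.
verify_region() {
  local a b
  a=$(region_sha1 "$1" "$3" "$4")
  b=$(region_sha1 "$2" "$3" "$4")
  [ "${a}" = "${b}" ] && echo "match" || echo "MISMATCH"
}
```

# Run after each write session, e.g.: verify_region ./luks.img ./luks-backup.img <offset> <length>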
#
#
## Disk Alignment throughout md, lvm, luks
#
# I've left out everything related to alignment to prevent anyone from copy / pasta-ing
# values that would be worse than the defaults (which end up pretty good). It's not worth
# worrying about unless you have a specific reason to; the benefits are negligible under
# normal load.
#
#
## A little bit about what backup directory structure has worked for me.
#
# This isn't how you set up data retention for a business, and you should always
# take particular care to know precisely what you are doing. This is just how I
# manage my home.
#
# Each device/machine has its own /storage/ directory. This is where I.. store data. The
# origin of /storage/ isn't important. It may be from my file server or it may be local.
# My workstation has its own RAID array for /storage/, as does my NUC and mostly anything
# with a keyboard. I have an NFS server for other devices and the very little data
# I need access to from all machines, mostly my password safes and my `one` config dir.
# Those are stored within a LUKS container within a regular file on the NFS share that
# is mounted as a block device via losetup on each machine.
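# Attaching such a file-backed container looks roughly like the sketch below.
# The container path, mapper name and mount point are all made up, and RUN
# defaults to `echo` so it dry-runs without root (set RUN= to actually execute):

```shell
#!/bin/bash
# Sketch: attach a LUKS container stored as a regular file on an NFS mount.
# Everything here is a hypothetical example; verify against your own setup.
RUN="${RUN-echo}"
CONTAINER="/mnt/nfs/safes.luks"   # regular file on the NFS export
NAME="safes_crypt"
MNT="/safes"

attach_container() {
  local dev
  if [ "${RUN}" = "echo" ]; then
    dev="/dev/loopN"              # stand-in device in dry-run mode
    echo "losetup --show -f ${CONTAINER}"
  else
    # losetup --show prints the allocated loop device, e.g. /dev/loop3
    dev=$(losetup --show -f "${CONTAINER}") || return 1
  fi
  ${RUN} cryptsetup luksOpen "${dev}" "${NAME}"
  ${RUN} mount "/dev/mapper/${NAME}" "${MNT}"
}
```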
#
# For backups I follow a few conventions, but for me simplicity is key. I know
# myself well enough to know that if managing my data responsibly consumes too
# much time, it won't be done.
#
# I have 3 types of backups; I use the words:
# ├─local - Lives on the machine / device
# ├─external - Lives outside of the machine / device
# └─remote - Lives outside of my home
#
# Local backups don't have any requirements, they serve the local machine and so
# I don't follow any rules. They may be lvm snapshots, rsync, etc.
#
# I follow a simple set of conventions shared between my external and remote
# backups. They have an identical directory structure, but external and remote
# hold different subsets of my data. Remote holds only the things I don't want
# to lose in a worst case scenario, which is everything I have except data I can
# recover from other sources. This all fits well within the limits of what can be
# managed via Dropbox, Google Drive, a remote server, etc these days.
#
# The structure is simple, the commands for illustration:
#
#   mkdir -p backup-ext/{live,snap,remote,storage,devices}
#   mkdir -p backup-ext/live/hostname0{1,2,3}
#   touch backup-ext/snap/hostname0{1,2}-date0{1,2}.tgz
#   mkdir -p backup-ext/remote/provider01/{mnt,luks/{img,bin}}
#   mkdir -p backup-ext/devices/{nexus{5,7},iphone}
#
# The result:
#   tree | awk '{print "# " $0}'
# .
# └── backup-ext
#     ├── storage
#     ├── devices
#     │   ├── iphone
#     │   ├── nexus5
#     │   └── nexus7
#     ├── live
#     │   ├── hostname01
#     │   ├── hostname02
#     │   └── hostname03
#     ├── remote
#     │   └── provider01
#     │       ├── luks
#     │       │   ├── bin
#     │       │   └── img
#     │       └── mnt
#     └── snap
#         ├── hostname01-date01.tgz
#         ├── hostname01-date02.tgz
#         ├── hostname02-date01.tgz
#         └── hostname02-date02.tgz
#
# Storage is all my data. It's a bit of an ambiguous term since each machine has a
# local /storage, but in this context I mean my one global and organized
# collection of all data. Anything that is important is within it, and it must
# be in order to be backed up externally or remotely. Local machine storage, by
# contrast, may just have some config to run services etc.
#
# Live contains the most up to date system snapshots; it contains everything needed to
# restore a box, including its system dirs, to recover from breaking a machine. It
# includes the /storage directory for the machine. My workstation excludes /storage
# because it is my global /storage and where I manage it.
#
# Snaps are a subset of what is in /live, taken each time I perform a live
# backup. The snapshot varies by device, but in general /etc /storage /root
# /home go into a tar file. Sometimes I cp -a live folders to snapshots for one
# reason or another. It's not a big deal; whatever works for my use case.
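# A minimal sketch of how such a snapshot might be taken; the directory list
# and output path are examples, not my exact invocation:

```shell
#!/bin/bash
# Sketch: create a dated snapshot tarball of a few directories.
# $1 = output directory, remaining args = directories to archive.
snap() {
  local out="$1"; shift
  local name
  name="$(hostname)-$(date +%Y%m%d).tgz"
  # tar strips the leading / from absolute member names (warns on stderr)
  tar -czf "${out}/${name}" "$@" 2>/dev/null
  echo "${out}/${name}"
}
```

# e.g.: snap /backup-ext/snap /etc /root /home /storage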
#
# Devices is just a staging area / live backup. I usually go in there and pluck
# the data out and put it in storage to be immortalized. Pictures of my dog, mostly. :)
#
# Remote I separate by provider. Dropbox is what I use currently, as they
# provide remote differential synchronization, which is a must for my current
# backup paradigm. How you choose to do this is up to you, but I personally would
# never store unencrypted data on Dropbox or any service provider. Seriously. It's
# easy enough to encrypt your data via LUKS, and it's usable across all platforms
# using VirtualBox or Docker. I have a pair of Dockerfiles; if anyone has any
# interest I can share them.
#
# All I do is create an initial set of files for my data in chunks. These are
# just regular blobs of urandom data. I then set them up as block devices using
# losetup, then use the Linux device mapper to map them into a single block
# device suitable for LUKS formatting. You don't need great performance because
# it's out of band, but I will take a moment to commend the device mapper: having
# tested hundreds of mapped files, the cost is very low. This means you can split
# your remote data into small chunks depending on the amount of data you have,
# and the synchronization process is much less painful for any endpoint which
# supports differential sync. It's also nice because you may grow the storage
# anytime by simply adding some more files. This means you can start at 80%
# capacity today without having to stress about growing later.
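# The sector arithmetic for mapping chunk files into one device looks like the
# sketch below. Building the table only needs the file sizes (512-byte sectors),
# so this part runs without root; feeding it to dmsetup does not. The chunk file
# names are hypothetical:

```shell
#!/bin/bash
# Sketch: emit a device-mapper "linear" table for a list of chunk files.
# In the real setup each target would name a loop device from `losetup --show -f`;
# here we print the backing file name as a stand-in.
linear_table() {
  local cur_sector=0 size_bytes sectors f
  for f in "$@"; do
    size_bytes=$(stat -c %s "${f}")     # GNU stat; file size in bytes
    sectors=$((size_bytes / 512))       # dm tables count 512-byte sectors
    echo "${cur_sector} ${sectors} linear ${f} 0"
    cur_sector=$((cur_sector + sectors))
  done
}
```

# Growing later means adding a chunk file, regenerating the table, then roughly
# dmsetup suspend/reload/resume, cryptsetup resize, resize2fs - a sketch only;
# verify the exact sequence against your own setup first.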
#
# Also, for formatting: ext4's block allocation fits these access patterns well,
# and it's what I suggest using. I use a second key file kept on my storage
# device to encourage more regular backups through automation. Just don't forget
# to have a primary keyslot that is a mental secret. I hope you never have a
# life event that forces you to use your remote backup, but if you do, it's
# likely you will no longer have access to your storage key file.
#
# Below is a slightly modified example script from my Dockerfile that I use for
# mounting; it would work for testing this if you were curious. It's already set
# up to provide a 10GB image composed of 100 100MB files.
#
# LOOP_LABEL=backup_remote
# LOOP_DEVICES=$(losetup -anO NAME,BACK-FILE)
# CUR_SECTOR=0
#
# { for PART_NUM in $(seq -f "%02g" 0 99); do
#     IMG_FILE="./${LOOP_LABEL}.img${PART_NUM}"
#     IMG_DEVICE=$(echo "${LOOP_DEVICES}" | grep "${IMG_FILE}" | awk '{print $1}')
#
#     # You should exit instead of fallocate here if not testing; also, if using
#     # this for actual backups it doesn't hurt to create the files from urandom.
#     # It's not strictly necessary in theory.. it just feels better.
#     [ -f "${IMG_FILE}" ] || fallocate -l 100M "${IMG_FILE}"
#     [ -n "${IMG_DEVICE}" ] || IMG_DEVICE=$(losetup --show -f "${IMG_FILE}")
#     CUR_SECTOR_SIZE=$(blockdev --getsz "${IMG_DEVICE}")
#     echo "${CUR_SECTOR} ${CUR_SECTOR_SIZE} linear ${IMG_DEVICE} 0"
#
#     CUR_SECTOR=$((CUR_SECTOR + CUR_SECTOR_SIZE))
# done } | dmsetup create "${LOOP_LABEL}"
#
# [ -e "/dev/mapper/${LOOP_LABEL}_crypt" ] || cryptsetup luksOpen --key-file "${LOOP_LABEL}.key" "/dev/mapper/${LOOP_LABEL}" "${LOOP_LABEL}_crypt"
# mountpoint -q /your/path/backup-ext/dropbox/mnt || mount "/dev/mapper/${LOOP_LABEL}_crypt" "/your/path/backup-ext/dropbox/mnt"
# rsync -a /your/ /stuff/
#
#
# Closing is easy (luksClose removes the _crypt mapping, dmsetup removes the
# linear device underneath it):
#   umount "/your/path/backup-ext/dropbox/mnt"
#   cryptsetup luksClose "${LOOP_LABEL}_crypt"
#   dmsetup remove "${LOOP_LABEL}"
#
#   for x in $(losetup -al | grep "${LOOP_LABEL}.img" | awk '{print $1}'); do
#     losetup -d "${x}"
#   done
#
# Below is my provision script, slightly edited. I wouldn't run it.. it's just
# to share how I configure my machine.
#
DISK_BLKS="${@:-$(lsblk -dpo NAME,SIZE|grep /dev/sd|grep 1.8T|awk '{print $1}'|xargs echo -n)}"
echo "================"
echo "Confirm:"
echo " Disks to be used in raid arrays: ${DISK_BLKS}"
echo
echo "OK? Y/N"
while read -r -n 1 -s CONFIRM; do
  if [[ $CONFIRM = [YyNn] ]]; then
    [[ $CONFIRM = [Nn] ]] && echo "Exiting.." && exit 1
    break
  fi
done
# Clear out all old raids (--write-mostly is a create/add flag and doesn't
# belong with --zero-superblock)
mdadm --zero-superblock $(for x in ${DISK_BLKS}; do echo "${x}*"; done) || {
  printf >&2 "[error] %s: could not zero superblocks for %s\n" "${0}" "${DISK_BLKS}"
  exit 1
}
wipefs --all --force $(for x in ${DISK_BLKS}; do echo "${x}*"; done)
# Create disks
# DISK_BLKS already holds full /dev/sdX paths (lsblk -p), so use ${x} directly
for x in ${DISK_BLKS}; do
  sfdisk "${x}" <<SFDISK-CONFIG
label: gpt
device: ${x}
unit: sectors
first-lba: 2048
last-lba: 3907029134
${x}1 : start=2048, size=3907027087, type=0FC63DAF-8483-4772-8E79-3D69D8477DE4, name="$(basename "${x}")p1"
SFDISK-CONFIG
done
# Create the six disk raid 10 array; the only non-default is layout = far, see md(4) for details
mdadm --verbose --create /dev/md/storage --assume-clean --level=10 --layout=f2 \
  --raid-devices=6 $(for x in ${DISK_BLKS}; do echo "${x}1"; done) || {
  printf >&2 "[error] %s: could not create raid 10 array with %s\n" "${0}" "${DISK_BLKS}"
  exit 1
}
# I have redundancy, bitmaps aren't needed for me
mdadm --grow --bitmap=none /dev/md/storage
# Update conf
grep --quiet /dev/md/storage /etc/mdadm/mdadm.conf || mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# Luks format
cryptsetup luksFormat --verbose --verify-passphrase --key-size=512 --align-payload=1024 /dev/md/storage || {
  printf >&2 "[error] %s: could not luks format, bad pass phrase?\n" "${0}"
  exit 1
}
# Make luks mount key
[ -f "/root/md-storage-key" ] || dd if=/dev/urandom of=/root/md-storage-key bs=1024 count=4
# Add the key for local mounting
cryptsetup luksAddKey /dev/md/storage /root/md-storage-key || {
  printf >&2 "[error] %s: luksAddKey failed\n" "${0}"
  exit 1
}
# Backup headers in root (this is for illustration don't lose these, seriously)
[ -f "/root/md-storage-header" ] || cryptsetup luksHeaderBackup --header-backup-file /root/md-storage-header /dev/md/storage
# Open luks container
cryptsetup luksOpen --key-file /root/md-storage-key /dev/md/storage storage_crypt || {
  printf >&2 "[error] %s: luksOpen failed\n" "${0}"
  exit 1
}
# We have the following `lsblk | awk '{print "# " $0}'` with raid + luks set up:
# ----
# sd*                 8:80    0   1.8T  0 disk
# └─sd**              8:81    0   1.8T  0 part
#   └─storage         9:127   0   5.5T  0 raid10
#     └─storage_crypt 252:5   0   5.5T  0 crypt
# Create lvm physical volume on top of luks container
pvcreate /dev/mapper/storage_crypt
# Create main volume group
vgcreate storage /dev/mapper/storage_crypt
# Create logical volumes
# - Unencrypted and auto mounted at boot (the underlying physical volume is encrypted, remember)
#   - base - general storage that is not categorized, serves as root mount point
#   - one - my main workstation folder, dev, scripts, workstation stuff
#   - media - non-personal media collection, music, movies, etc, not backed up remotely
#   - remote - contains.. wait for it.. backups!
lvcreate --size 100G --name base storage
lvcreate -L 500G -n one storage
lvcreate -L 1T -n media storage
lvcreate -L 1T -n remote storage
# - Encrypted and not auto mounted
#   - chris - all my personal stuff, financial data, pictures of friends, family, etc
#   - vault - data that I don't want to lose, but don't need regular access to.
#   - archive - where data goes to die, a recycle bin that is not often recycled.
#
lvcreate -L 300G -n chris storage
lvcreate -L 500G -n vault storage
lvcreate -L 2T -n archive storage
# Now `lsblk | awk '{print "# " $0}'` with our volumes:
# ----
# sd*                     8:80    0   1.8T  0 disk
# └─sd**                  8:81    0   1.8T  0 part
#   └─storage             9:127   0   5.5T  0 raid10
#     └─storage_crypt     252:5   0   5.5T  0 crypt
#       ├─storage-one     252:6   0   500G  0 lvm
#       ├─storage-media   252:7   0     1T  0 lvm
#       ├─storage-chris   252:8   0   300G  0 lvm
#       ├─storage-vault   252:9   0   500G  0 lvm
#       ├─storage-archive 252:10  0     2T  0 lvm
#       └─storage-base    252:11  0   100G  0 lvm
# Format the unencrypted logical volumes
# -q --> quiet
mkfs.ext4 -q /dev/storage/base
mkfs.ext4 -q /dev/storage/one
mkfs.ext4 -q /dev/storage/media
mkfs.ext4 -q /dev/storage/remote
# Create the mount points, -p creates parents if they don't exist. Mount the
# base volume at /storage first, then create the remaining mount points inside
# it (creating them before mounting would leave them hidden under the mount).
mkdir --parents /one /storage
mount /dev/storage/base /storage
mkdir -p /storage/{one,media,remote,chris,vault,archive}
# Mount our public fs's to start syncing data from backups etc
mount /dev/storage/one /storage/one
mount /dev/storage/media /storage/media
mount /dev/storage/remote /storage/remote
# My one folder I like to bind-mount at the root. This is because some programs (Atom IDE
# for example) seem to behave poorly with symlinked project roots, likely due to fs notify
# and such. I prefer this because it simplifies backups tremendously, i.e. rsync /storage/ /backup/
# (bind mounts take the mounted directory, not the block device)
mount -o bind /storage/one /one
# Now `df -h` with our volumes mounted:
# ----
# /dev/mapper/storage-base 99G 60M 94G 1% /storage
# /dev/mapper/storage-remote 1008G 72M 957G 1% /storage/remote
# /dev/mapper/storage-media 1008G 72M 957G 1% /storage/media
# /dev/mapper/storage-one 493G 2.4G 468G 1% /one
# Luks format storage: chris
cryptsetup luksFormat -vy -s 512 /dev/storage/chris || {
  printf >&2 "[error] %s: could not luks format, bad pass phrase?\n" "${0}"
  exit 1
}
# Luks format storage: vault
cryptsetup luksFormat -vy -s 512 /dev/storage/vault || {
  printf >&2 "[error] %s: could not luks format, bad pass phrase?\n" "${0}"
  exit 1
}
# Luks format storage: archive
cryptsetup luksFormat -vy -s 512 /dev/storage/archive || {
  printf >&2 "[error] %s: could not luks format, bad pass phrase?\n" "${0}"
  exit 1
}
# Handle encrypted vols
for vol_short in chris archive vault; do
  backup_file="/root/lvm-storage-${vol_short}-header"
  key_file="/root/lvm-storage-${vol_short}-key"
  vol_name="/dev/storage/${vol_short}"
  crypt_name="storage_${vol_short}_crypt"
  [ -f "${key_file}" ] || dd if=/dev/urandom of="${key_file}" bs=1024 count=4
  cryptsetup luksDump "${vol_name}" | grep -qs "Slot 1: ENABLED" || {
    cryptsetup luksAddKey "${vol_name}" "${key_file}" || {
      printf >&2 "[error] %s: luksAddKey '%s' '%s' failed\n" "${0}" "${vol_name}" "${key_file}"
      exit 1
    }
  }
  [ -f "${backup_file}" ] || \
    cryptsetup luksHeaderBackup --header-backup-file "${backup_file}" "${vol_name}"
  cryptsetup status "${crypt_name}" > /dev/null 2>&1 || {
    cryptsetup luksOpen -d "${key_file}" "${vol_name}" "${crypt_name}" || {
      printf >&2 "[error] %s: luksOpen -d '%s' '%s' failed\n" "${0}" "${vol_name}" "${key_file}"
      exit 1
    }
  }
done
# Format and mount the encrypted logical volumes
for vol_short in chris archive vault; do
  mkfs.ext4 -q "/dev/mapper/storage_${vol_short}_crypt" && \
    mount "/dev/mapper/storage_${vol_short}_crypt" "/storage/${vol_short}"
done
# Now `df -h` with our volumes mounted:
# ----
# /dev/mapper/storage-base 99G 60M 94G 1% /storage
# /dev/mapper/storage-one 493G 2.4G 468G 1% /one
# /dev/mapper/storage-media 1008G 557G 400G 59% /storage/media
# /dev/mapper/storage-remote 1008G 72M 957G 1% /storage/remote
# /dev/mapper/storage_chris_crypt 296G 63M 281G 1% /storage/chris
# /dev/mapper/storage_archive_crypt 2.0T 71M 1.9T 1% /storage/archive
# /dev/mapper/storage_vault_crypt 493G 70M 467G 1% /storage/vault
# Now `lsblk | awk '{print "# " $0}'` with our volumes:
# ----
# sd*                           8:80    0   1.8T  0 disk
# └─sd**                        8:81    0   1.8T  0 part
#   └─storage                   9:127   0   5.5T  0 raid10
#     └─storage_crypt           252:5   0   5.5T  0 crypt
#       ├─storage-one           252:7   0   500G  0 lvm    /one
#       ├─storage-remote        252:8   0     1T  0 lvm    /storage/remote
#       ├─storage-media         252:9   0     1T  0 lvm    /storage/media
#       ├─storage-chris         252:10  0   300G  0 lvm
#       │ └─storage_chris_crypt 252:12  0   300G  0 crypt  /storage/chris
#       ├─storage-vault         252:11  0   500G  0 lvm
#       │ └─storage_vault_crypt 252:16  0   500G  0 crypt  /storage/vault
#       ├─storage-base          252:13  0   100G  0 lvm    /storage
#       └─storage-archive       252:14  0     2T  0 lvm
#         └─storage_archive_crypt 252:15 0    2T  0 crypt  /storage/archive
# Is it slow having a few levels of abstraction before your IO? Not really.
# $ time dd if=/dev/zero of=./8gb.img bs=1M count=8192 conv=fdatasync
# > 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 11.0654 s, 776 MB/s
# $ sync && echo 3 > /proc/sys/vm/drop_caches && free -h
# > total used free shared buff/cache available
# > Mem: 251G 1.7G 249G 80M 470M 249G
# > Swap: 6.4G 0B 6.4G
# $ time dd if=./8gb.img of=/dev/null bs=1M
# > 8192+0 records in
# > 8192+0 records out
# > 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 3.26732 s, 2.6 GB/s
# >
# > real 0m3.270s
# > user 0m0.016s
# > sys 0m2.248s
#
# Your mileage will vary of course; I'm running a Supermicro X10DRH-C with six SSDs
# in JBOD via an LSI 3108 ROC.