
@seanorama
Last active November 20, 2018 11:11
partitioning

Partitioning Hadoop disks.

This will vary greatly depending on the host and its disk layout.

Considerations:

  • We mount Hadoop data disks at /hadoopfs/fs1, fs2, ...
    • Before mounting, we make each mount directory inaccessible and immutable (chmod 0000, chattr +i), so that if a mount fails, nothing can be written into the bare directory and / won't fill up.
  • When there is no dedicated Hadoop disk, we create /var/lib/hadoop and symlink /hadoop to it.
  • To make sure nothing ends up on /, we symlink /hadoop (and, on workers, /var/lib/hadoop) to the Hadoop data location.

Edge and Masters

If there are no disks dedicated to Hadoop, use /var instead:

sudo mkdir /var/lib/hadoop
sudo ln -s /var/lib/hadoop /hadoop
sudo mkdir /hadoopfs
sudo ln -s /var/lib/hadoop /hadoopfs/fs1
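Re-running the four commands above is not safe: mkdir fails on an existing directory, and a repeated `ln -s` whose destination already resolves to a directory creates a stray link inside that directory instead of replacing it. A minimal idempotent sketch of the same layout, written as a hypothetical function `setup_hadoop_on_var` (not part of the original notes) that takes an optional root prefix so it can be tried in a scratch directory first; it assumes a shell with `local` support, such as bash or dash.

```shell
# Hypothetical helper, not from the original notes: build the edge/master
# layout (/var/lib/hadoop, /hadoop -> /var/lib/hadoop,
# /hadoopfs/fs1 -> /var/lib/hadoop) under ROOT; the default "" means "/".
# Safe to re-run: existing symlinks are left alone.
setup_hadoop_on_var() {
  local root="${1:-}"
  local link
  mkdir -p "${root}/var/lib/hadoop" "${root}/hadoopfs" || return 1
  for link in /hadoop /hadoopfs/fs1; do
    if [ -L "${root}${link}" ]; then
      continue                                   # already a symlink: skip
    elif [ -e "${root}${link}" ]; then
      echo "refusing: ${root}${link} exists and is not a symlink" >&2
      return 1
    fi
    # absolute target, as in the manual steps above
    ln -s /var/lib/hadoop "${root}${link}" || return 1
  done
}
```

On a real host it would be run as root with no argument; passing a temporary directory shows what it would create without touching /.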

Workers

With disk(s) dedicated to Hadoop

## Hadoop disk is /dev/nvme1n1
disk=/dev/nvme1n1

####
## create file system and mount
## (-f overwrites any existing file system signature on ${disk})
mkfs_opts="-s size=$(sudo blockdev --getbsz ${disk})"
sudo mkfs.xfs -f ${mkfs_opts} ${disk}

sudo mkdir -p /hadoopfs/fs1
sudo chmod 0000 /hadoopfs/fs1
sudo chattr +i /hadoopfs/fs1

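## reference the file system by UUID: NVMe device names such as /dev/nvme1n1 are not guaranteed to stay the same across reboots
uuid=$(sudo blkid -s UUID -o value ${disk})
echo "UUID=${uuid} /hadoopfs/fs1 xfs defaults,noatime 0 0" | sudo tee -a /etc/fstab >/dev/null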
sudo mount /hadoopfs/fs1
####

## Repeat for additional disks, at /hadoopfs/fs2, fs3, ...

## Symlinks to catch anything that might slip through
sudo ln -s /hadoopfs/fs1 /hadoop
sudo ln -s /hadoopfs/fs1 /var/lib/hadoop
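For the "repeat for additional disks" step, a hypothetical dry-run helper `plan_hadoop_disks` could print the per-disk commands for every disk given, numbering the mount points fs1, fs2, ... in argument order. It only prints; nothing is executed. It writes the fstab entry by file system UUID (looked up with `blkid` after `mkfs`) rather than by device name, because NVMe device names are not guaranteed to be stable across reboots. The device names in the example call are placeholders.

```shell
# Hypothetical dry-run helper: print, but do not run, the per-disk steps
# above for each DISK argument, mounting them at /hadoopfs/fs1, fs2, ...
plan_hadoop_disks() {
  local n=0 disk mp
  for disk in "$@"; do
    n=$((n + 1))
    mp="/hadoopfs/fs${n}"
    # \$ and \" keep the command substitutions and quotes literal in the
    # printed text, so they are only evaluated when the plan is executed
    printf '%s\n' \
      "sudo mkfs.xfs -f -s size=\$(sudo blockdev --getbsz ${disk}) ${disk}" \
      "sudo mkdir -p ${mp}" \
      "sudo chmod 0000 ${mp}" \
      "sudo chattr +i ${mp}" \
      "echo \"UUID=\$(sudo blkid -s UUID -o value ${disk}) ${mp} xfs defaults,noatime 0 0\" | sudo tee -a /etc/fstab >/dev/null" \
      "sudo mount ${mp}"
  done
}

# Example (placeholder device names): review the plan first
plan_hadoop_disks /dev/nvme1n1 /dev/nvme2n1
```

After reviewing the output, it could be executed with `plan_hadoop_disks /dev/nvme1n1 /dev/nvme2n1 | sh`. The /hadoop and /var/lib/hadoop symlinks still have to be created separately.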

Move /var and /var/log to their own partitions on an already running system

Required in AWS environments.

  • Move /var and /var/log to their own partitions
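## run the rest as root; once tmux is installed below, consider working inside it so the session survives a dropped SSH connection
sudo su -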

yum -y install psmisc tmux lsof lvm2 xfsprogs

## --------------------------------------------
## Create LVM

## Confirm the disk to use in the output of: 
lsblk

## Choose the disk for the PV
disk=/dev/xvdj
#disk=/dev/nvme1n1

## Create the PV
sudo pvcreate ${disk}

## Choose name for VG
vg="vg_data01"

## Create the VG
sudo vgcreate ${vg} ${disk}

## Create LVs (45% + 50% of the VG; the remaining 5% stays unallocated)
lvcreate -n lv_var_log  -l 45%VG ${vg}
lvcreate -n lv_var      -l 50%VG ${vg}

## make file systems
mkfs -t xfs /dev/${vg}/lv_var
mkfs -t xfs /dev/${vg}/lv_var_log

## prepare temporary dirs
mkdir -p /mnt/tmp-var
mount /dev/${vg}/lv_var /mnt/tmp-var

mkdir -p /mnt/tmp-var-log
mount /dev/${vg}/lv_var_log /mnt/tmp-var-log

## list each command that has files open under /var, with the number of matching open files
lsof / | grep /var | awk '{print $1}' | sort | uniq -c
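`grep /var` matches any line that contains /var anywhere, including unrelated paths. A slightly stricter sketch uses lsof's field output mode (`-F`), in which lines starting with `c` carry the command name and lines starting with `n` carry the file name, and prints each command that has a file at or below /var open. The function name `var_holders` is hypothetical.

```shell
# Hypothetical filter: read `lsof -Fcn` output on stdin and print, once
# each, the commands holding a file at /var or below it open.
var_holders() {
  awk '
    /^c/ { cmd = substr($0, 2) }            # c<command name>
    /^n/ {                                  # n<file name>
      name = substr($0, 2)
      if (name == "/var" || index(name, "/var/") == 1) print cmd
    }
  ' | sort -u
}

# Usage on the host (as root):
#   lsof -Fcn / 2>/dev/null | var_holders
```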

## manually stop any services that have files open in /var
##   - example list below; adjust it to what lsof reports on your host
services="amazon-ssm-agent awslogs ds_agent monit postfix tuned codedeploy-agent rsyslog lwsmd sshd auditd NetworkManager"
for service in ${services}; do
  systemctl stop ${service} &
done
## wait for all the background stops to finish
wait

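## auditd refuses a manual 'systemctl stop' on RHEL/CentOS, so disable auditing and kill it directly
auditctl -e0
killall auditd
## dhclient keeps its lease file under /var open
killall dhclient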
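The restart loop at the end starts every unit in `${services}`, including ones that were never running or do not exist on this host. A hypothetical variant records which units were active and were actually stopped, so that only those get restarted. It relies on `systemctl is-active --quiet`; the `SYSTEMCTL` override exists only so the function can be exercised with a stub.

```shell
# Hypothetical helper: stop each named unit that is currently active and
# print the names it stopped successfully, one per line.
# SYSTEMCTL defaults to systemctl; override it only for testing.
stop_active_services() {
  local svc
  for svc in "$@"; do
    if ${SYSTEMCTL:-systemctl} is-active --quiet "${svc}"; then
      ${SYSTEMCTL:-systemctl} stop "${svc}" && printf '%s\n' "${svc}"
    fi
  done
}

# Usage (as root):
#   stopped=$(stop_active_services ${services})
#   ... move /var and /var/log ...
#   for service in ${stopped}; do systemctl start "${service}"; done
```

With this approach auditd, whose manual stop is refused and which is killed separately above, would not appear in the list and would need its own `systemctl start auditd`.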

## only continue once the lsof check above shows nothing is using /var, other than things that won't matter, such as the dhclient lease

## READ ABOVE!!!

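## move files to the new file systems, via their temporary mount points
## (dotglob makes * also match hidden files)
shopt -s dotglob
mv /var/log/* /mnt/tmp-var-log/
## this also moves the now-empty /var/log directory, which becomes the /var/log mount point inside the new /var
mv /var/* /mnt/tmp-var/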

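## recreate an empty /var/log on the root file system; it is hidden once the new /var is mounted
mkdir /var/log

## lock the empty directories on / so applications can't write to the root partition if a mount is unavailable
chmod 0000 /var/log
chattr +i /var/log
chmod 0000 /var
chattr +i /var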
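Before mounting over /var, it is worth confirming that the old /var on the root file system really is empty apart from the log directory: anything a failed or partial `mv` left behind would be hidden under the new mount while still using space on /. A hypothetical check, `contains_only`, succeeds only if a directory holds nothing but the named entries, hidden files included.

```shell
# Hypothetical check: succeed only if DIR contains no entries other than
# the names given after it. Hidden entries are checked too.
contains_only() {
  local dir="$1" entry name ok allowed
  shift
  # the three globs cover normal names, .name and ..name; a glob that
  # matches nothing stays literal and is skipped by the existence test
  for entry in "${dir}"/* "${dir}"/.[!.]* "${dir}"/..?*; do
    [ -e "${entry}" ] || [ -L "${entry}" ] || continue
    name="${entry##*/}"
    allowed=no
    for ok in "$@"; do
      [ "${name}" = "${ok}" ] && allowed=yes
    done
    if [ "${allowed}" != yes ]; then
      echo "unexpected entry: ${entry}" >&2
      return 1
    fi
  done
  return 0
}

# Usage, after the chattr lines above and before mounting the new /var:
#   contains_only /var log || echo "STOP: /var still has files on /" >&2
```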

## remount new var to /var
umount /mnt/tmp-var
echo "/dev/${vg}/lv_var    /var xfs defaults 0 0" | sudo tee -a /etc/fstab >/dev/null
mount /var

## now lock the empty /var/log inside the new /var the same way
chmod 0000 /var/log
chattr +i /var/log

## remount new log to /var/log
umount /mnt/tmp-var-log
echo "/dev/${vg}/lv_var_log    /var/log xfs defaults 0 0" | sudo tee -a /etc/fstab >/dev/null
mount /var/log
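The three `tee -a /etc/fstab` steps in these notes append unconditionally, so running a step twice leaves duplicate entries for the same mount point. A hypothetical idempotent alternative, `add_fstab_entry`, appends a line only when its mount point (the second fstab field) is not already used by an uncommented entry. It must run as root for the real /etc/fstab; the optional second argument exists so it can be tried on a copy first.

```shell
# Hypothetical helper: append LINE to FSTAB (default /etc/fstab) unless an
# uncommented entry already uses the same mount point (second field).
add_fstab_entry() {
  local line="$1" fstab="${2:-/etc/fstab}" target
  target=$(printf '%s\n' "${line}" | awk '{ print $2 }')
  if [ -f "${fstab}" ] &&
     awk -v t="${target}" '$1 !~ /^#/ && $2 == t { found = 1 } END { exit !found }' "${fstab}"; then
    echo "already in ${fstab}: ${target}" >&2
    return 0
  fi
  printf '%s\n' "${line}" >> "${fstab}"
}

# Usage (as root), instead of the echo | tee line above:
#   add_fstab_entry "/dev/${vg}/lv_var_log    /var/log xfs defaults 0 0"
```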

## confirm the mounts
df -h -t xfs
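`df -h -t xfs` is a visual check. A scripted guard could refuse to restart services unless /var and /var/log are really separate mounts (the same check applies to /hadoopfs/fs1 and friends on workers). A minimal hypothetical sketch reads /proc/mounts, whose second field is the mount point; paths containing spaces appear escaped there (e.g. `\040`), which does not matter for these paths.

```shell
# Hypothetical helper: succeed if MOUNTPOINT is listed as a mount target in
# MOUNTS_FILE (default /proc/mounts; field 2 is the mount point).
is_mounted() {
  awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Usage: only restart services once both mounts are in place
#   for mp in /var /var/log; do
#     is_mounted "${mp}" || echo "STOP: ${mp} is not mounted" >&2
#   done
```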

## restart services
for service in ${services}; do
  systemctl start ${service}
done

## Reboot to ensure all services are started up properly
#shutdown -r +1