@jeongho
Last active November 2, 2021 22:55
hdfs tmp folder cleanup
#!/usr/bin/env bash
# One-line version: remove files under /tmp older than $days days.
# File ages come from the date and time columns of `hadoop fs -ls -R`.
#days=5; for f in $(cutoff=$(echo $(date +%s)"-$days*24*60*60" | bc); hadoop fs -ls -R /tmp 2>/dev/null|grep ^- |awk '{ print "echo $(date -d \""$6,$7"\" +%s)" , $8}'| bash | awk -v cutoff=$cutoff '$1 < cutoff'| sort -n | cut -f2 -d" "); do hadoop fs -rm $f; done
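# Remove files under /tmp on HDFS that were last modified more than $days days ago.
days=5
cutoff=$(( $(date +%s) - days * 24 * 60 * 60 ))
# Keep regular files only (listing lines starting with "-"); fields 6 and 7 are the date and time.
hadoop fs -ls -R /tmp 2>/dev/null | grep '^-' | \
while read -r _ _ _ _ _ fdate ftime path; do
    mtime=$(date -d "$fdate $ftime" +%s) || continue
    if [ "$mtime" -lt "$cutoff" ]; then
        # -skipTrash deletes permanently; stdin is redirected so the loop keeps its input.
        sudo -u hdfs hadoop fs -rm -skipTrash "$path" < /dev/null
    fi
done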
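
The age filter can be tried without touching a cluster. The sketch below is not part of the original gist: `older_than` and `sample_listing` are hypothetical names, the sample lines are made up, and it assumes GNU `date` (for `-d`) plus the usual eight-column `hadoop fs -ls -R` layout (permissions, replication, owner, group, size, date, time, path). It only prints the paths that would be removed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original gist): reads `hadoop fs -ls -R`
# style lines on stdin and prints the paths of regular files whose
# modification time is older than the cutoff, given as epoch seconds.
older_than() {
    local cutoff=$1 perms fdate ftime path mtime
    while read -r perms _ _ _ _ fdate ftime path; do
        # Regular files start with "-"; directories start with "d".
        [[ $perms == -* ]] || continue
        # Skip lines whose date and time cannot be parsed.
        mtime=$(date -d "$fdate $ftime" +%s 2>/dev/null) || continue
        if (( mtime < cutoff )); then
            printf '%s\n' "$path"
        fi
    done
}

# Made-up sample listing standing in for real `hadoop fs -ls -R /tmp` output.
sample_listing() {
    cat <<'EOF'
drwxrwxrwt   - hdfs supergroup          0 2021-10-01 09:00 /tmp/old_dir
-rw-r--r--   3 hdfs supergroup       1024 2021-10-01 09:05 /tmp/old_dir/part-00000
-rw-r--r--   3 hdfs supergroup       2048 2021-11-02 22:00 /tmp/new file.txt
EOF
}

# Dry run: list what would be removed with a cutoff of 2021-10-15.
sample_listing | older_than "$(date -d '2021-10-15 00:00' +%s)"
```

Against a real cluster, the same function could be fed with `hadoop fs -ls -R /tmp 2>/dev/null | older_than "$cutoff"` to review the candidates before running any `-rm -skipTrash`. Reading the path as the last `read` variable keeps file names that contain spaces intact.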