Created
June 21, 2013 21:13
Clean up Hadoop jobcache files
#!/bin/bash
# Remove jobcache directories that are more than 60 minutes old and contain no attempt_ entries.
for DIR in $(find /mnt/mapred/local/taskTracker/*/jobcache/* -maxdepth 0 -type d -mmin +60); do
    if ! find "$DIR" | grep -q attempt; then
        rm -rf "$DIR"
    fi
done
# There is also another bug that causes jobcache directories to be duplicated
# inside the attempt_ directories we filter out above. These directories never go away, so jobs with
# this problem are never cleaned up. The command below handles this case. Here is what it does:
#
# 1. Find all subdirectories below the attempt_ dirs that are more than 7 days old,
#    e.g. /mnt/mapred/local/taskTracker/root/jobcache/job_201305091555_517165/attempt_201305091555_517165_m_000015_0/taskTracker
#    (We use a 7-day filter because we have no way of knowing whether these jobs are still running. This should be safe enough.)
# 2. Keep only those we care about, i.e. those with 2 taskTracker parts in the path
# 3. Strip off everything after the job ID, so we are left with only the top-level directory that we will delete,
#    e.g. /mnt/mapred/local/taskTracker/root/jobcache/job_201305091555_517165/
# 4. We likely have multiple attempt_ dirs per top-level directory, so uniq them
# 5. Recursively remove the results
find /mnt/mapred/local/taskTracker/*/jobcache/*/attempt*/ -maxdepth 1 -type d -mtime +7 -name "taskTracker" \
    | sed -e 's/attempt.*//' \
    | uniq \
    | xargs rm -rf
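
Because the last pipeline deletes recursively, it can help to preview its targets first. The following is a minimal dry-run sketch of steps 1 to 4 above, assuming GNU find and sed. The function name `list_stale_jobs` and its two parameters (root directory and age in days) are hypothetical and not part of the original script. It prints the job directories that would be removed and deletes nothing.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print the top-level job
# directories that the cleanup pipeline would delete, without deleting them.
#   $1 = taskTracker local root (assumed default: /mnt/mapred/local/taskTracker)
#   $2 = minimum age in days    (assumed default: 7, matching the script)
list_stale_jobs() {
    local root="${1:-/mnt/mapred/local/taskTracker}"
    local days="${2:-7}"
    # Steps 1-2: find old "taskTracker" dirs nested directly under attempt_ dirs.
    # Step 3: strip everything from "attempt" onward, leaving the job directory.
    # Step 4: de-duplicate (sort -u also copes with non-adjacent duplicates).
    find "$root"/*/jobcache/*/attempt*/ -maxdepth 1 -type d \
            -mtime +"$days" -name "taskTracker" 2>/dev/null \
        | sed -e 's/attempt.*//' \
        | sort -u
}
```

Running `list_stale_jobs /mnt/mapred/local/taskTracker 7` and reviewing the output before running the real cleanup is one way to confirm that only the intended job directories are affected.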