Created March 23, 2019 00:43
Script to find large files in git history
#!/bin/bash
# This code is based on the awesome answer by @torek from StackOverflow:
# I have only made a shell for his code, added some options, added some colors
# and voilà!
# This script can be used to find large files inside a git repository
# and its whole history. It lists files larger than a given threshold
# and displays them in a colored, human-readable way.
# usage examples:
# - find files larger than 10MB:
# ./ -sz 10MB
# - show one file per line:
# ./ -sz 10kb -sl
BIG=100KB # default threshold: 100KB
# parse command-line options; unknown arguments are skipped
while [[ $# -gt 0 ]]
do
  key="$1"
  case $key in
    -sl|--single-line) SINGLE_LINE=TRUE; shift;;
    -sz|--size) BIG=$2; shift; shift;;
    *) shift;;
  esac
done
# convert a size such as 10MB into bytes, using binary (power-of-1024) units
if [[ "$BIG" =~ ^[0-9]+[KMGTEkmgte][Bb]$ ]]; then
  LETTER=$(echo "$BIG" | sed -r 's ^[0-9]+([KMGTE])B$ \1 gI')
  BIG=$(echo "$BIG" | sed -r 's ^([0-9]+)[KMGTE]B$ \1 gI')
  if [ "${LETTER^^}" = "K" ]; then let "BIG=$BIG*1024"; fi
  if [ "${LETTER^^}" = "M" ]; then let "BIG=$BIG*1024**2"; fi
  if [ "${LETTER^^}" = "G" ]; then let "BIG=$BIG*1024**3"; fi
  if [ "${LETTER^^}" = "T" ]; then let "BIG=$BIG*1024**4"; fi
  # E (exa) is 1024^6; only single-digit values fit in 64-bit arithmetic
  if [ "${LETTER^^}" = "E" ]; then let "BIG=$BIG*1024**6"; fi
fi
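# ANSI colors: assumed values, since the original definitions are not shown here
blue=$'\e[94m'; red=$'\e[91m'; green=$'\e[92m'; yellow=$'\e[93m'
white=$'\e[97m'; dkgray=$'\e[90m'; cdef=$'\e[0m'
git log --pretty="%H %s" --topo-order | while read -r commithash message; do
  git diff-tree -r --name-only --diff-filter=AMT "$commithash" |
  tail -n +2 | (_iter=0; while read -r path; do
    # skip entries whose size cannot be read (e.g. submodule links)
    objsize=$(git cat-file -s "$commithash:$path" 2>/dev/null) || continue
    [ "$objsize" -lt "$BIG" ] && continue
    if [ -z "$SINGLE_LINE" ]; then
      # grouped output: print the commit header once, then one line per file
      [ $_iter -eq 0 ] && echo -e "\n$blue$commithash$cdef\n$red$message$cdef"
      echo "$dkgray$(dirname "$path")$cdef/$white$(basename "$path") $yellow$objsize$cdef"
    else
      # single-line output: repeat the commit hash on every line
      [ $_iter -eq 0 ] && _color="$blue" || _color="$dkgray"
      echo "$_color$commithash$cdef/$green$(dirname "$path")$cdef/$white$(basename "$path") $yellow$objsize$cdef"
    fi
    let "_iter++"
  done)
done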
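
The size-suffix handling in the script above can also be sketched as a standalone function, which makes it easier to check in isolation. This is only a sketch under assumptions: `to_bytes` is a hypothetical helper name that does not appear in the gist, it treats every unit as a power of 1024, and it stops at terabytes to stay well inside 64-bit arithmetic.

```shell
#!/bin/bash
# to_bytes is a hypothetical helper, not part of the original script.
# It converts sizes such as 10KB, 5mb or 2GB to bytes (powers of 1024).
to_bytes() {
  local input=$1 number unit
  if [[ $input =~ ^([0-9]+)([KkMmGgTt])[Bb]$ ]]; then
    number=${BASH_REMATCH[1]}
    unit=${BASH_REMATCH[2]}
    # 10# forces base 10, so a value like 08 is not read as octal
    case $unit in
      [Kk]) echo $(( 10#$number * 1024 )) ;;
      [Mm]) echo $(( 10#$number * 1024 ** 2 )) ;;
      [Gg]) echo $(( 10#$number * 1024 ** 3 )) ;;
      [Tt]) echo $(( 10#$number * 1024 ** 4 )) ;;
    esac
  elif [[ $input =~ ^[0-9]+$ ]]; then
    # plain byte count without a suffix
    echo $(( 10#$input ))
  else
    echo "invalid size: $input" >&2
    return 1
  fi
}

to_bytes 10MB   # prints 10485760
```

Unlike the script's option parsing, this sketch rejects malformed sizes with a non-zero exit status instead of passing them on to the comparison.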

cdr9042 commented Jun 26, 2023

While color coding makes the output easy to read in a terminal, it complicates logging the output to a file. The escape sequence "\e[90m" used for color coding is not interpreted in a text file, and I don't know of any text viewer that can parse "\e[90m".
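
One possible way around the logging problem described above is to strip the color sequences before the output reaches the file. The following is a minimal sketch, assuming the script only emits simple color sequences of the form ESC `[` ... `m`; `strip_ansi` is a hypothetical helper name, not something the gist provides.

```shell
#!/bin/bash
# strip_ansi is a hypothetical helper, not part of the original script.
# It removes ANSI color sequences (ESC [ digits/semicolons m) from stdin.
strip_ansi() {
  local esc=$'\033'
  sed "s/${esc}\[[0-9;]*m//g"
}

# Example: a colored line as the script might print it
printf '\033[90msrc\033[0m/\033[97mbig.bin\033[0m \033[93m123456\033[0m\n' | strip_ansi
# prints: src/big.bin 123456
```

Piping the script's output through `strip_ansi` before redirecting it to a log file keeps the terminal behavior unchanged. An alternative design would be to leave the color variables empty whenever `[ -t 1 ]` reports that standard output is not a terminal.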
