Last active
June 16, 2020 22:32
horrific but working way to summarize file sizes in a directory for evaluating what's tracked by git lfs
$ find . -type f -name '*.*' |
    sed 's/.*\.//' |
    sort -u |
    while read ext; do
      (
        gfind . -name "*$ext" -printf "%s\n" |
          sed 's/$/+/' |
          tr -d '\n'
        echo 0
      ) | bc | tr -d '\n'
      echo -e "\t$ext"
    done
1003744 .exe
911328 .zip
806448 .dc
623008 .pck
564880 .mp4
497016 .d
351848 .efz
323112 .g
...
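For comparison, here is a minimal single-pass sketch of the same idea. It is not the gist author's method: the function name `sum_by_ext` and the `FIND` override are made up for illustration, it assumes GNU find (for `-printf`), and it will miscount file names that contain tabs or newlines.

```shell
# Sum file sizes (bytes) per extension in one pass over the tree.
# Assumes GNU find; on macOS run with FIND=gfind, as in the snippet above.
sum_by_ext() {
  "${FIND:-find}" "${1:-.}" -type f -name '*.*' -printf '%s\t%f\n' |
    awk -F'\t' '{
      n = split($2, parts, ".")        # extension = text after the last dot
      total["." parts[n]] += $1
    }
    END { for (ext in total) printf "%.0f\t%s\n", total[ext], ext }' |
    sort -rn
}
```

Calling `sum_by_ext .` prints one `size<TAB>.ext` line per extension, largest first, and walks the tree once instead of once per extension.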
You can condense this part of your bash pipeline into a single sed command and avoid all the character translation (tr). Note, some extra complexity was required to avoid embedding a literal tab, which I know you hate. 😉
from
... | tr -d '[ \t]total' | tr '\n' ',' | sed -n -E -e 's!([^,]+),([^,]+),!\2%\1@!gp' | tr '%' '\t' | tr '@' '\n' | ...
to
... | sed -n -E "/^\\..+/ {
N
s/([^[:space:]]*)[[:space:]]*\\n([^[:space:]]*)[[:space:]]*total/\\2$(echo -ne '\t')\\1/
p
}" | ...
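To see what the `N` command is doing there, the sed script can be tried on a couple of sample lines. The input shape below (an extension line followed by a `<size> total` line with no leading whitespace) is an assumption inferred from the regex, since the upstream part of the pipeline is elided; `swap_pairs` is a hypothetical wrapper name, and `printf` stands in for `echo -ne` to produce the tab.

```shell
# Join ".ext" / "<size> total" line pairs into "<size><TAB>.ext".
# The input format is an assumption; the real upstream command is not shown.
swap_pairs() {
  tab=$(printf '\t')   # keeps a literal tab out of the script text
  sed -n -E "/^\\..+/ {
    N
    s/([^[:space:]]*)[[:space:]]*\\n([^[:space:]]*)[[:space:]]*total/\\2${tab}\\1/
    p
  }"
}

# Two sample pairs, made up for illustration.
printf '.exe\n1003744 total\n.zip\n911328 total\n' | swap_pairs
```

`N` appends the next input line to the pattern space, so the substitution sees both lines at once and can swap them around the embedded newline.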
Credit for a WORKING solution to @kenthoward
Notes:
- Switch to iname if case should be insensitive for extensions
- But for Git LFS, case matters, so either lowercase all extensions, or make case-insensitive LFS matches (.[eE][xX][eE]), or:
- sort by size, find a cutoff where files are too small to bother
- swap the file size with the file name using `s/(.*) (.*)/$2 $1/g`
- sort naturally
- visually find all related line endings and add each case to .gitattributes
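If you go the case-insensitive route, the bracket patterns can be generated rather than typed by hand. A rough sketch, assuming plain ASCII extensions; `ci_glob` is a made-up helper name, not part of Git or Git LFS.

```shell
# Turn an extension such as "exe" into a glob like "*.[eE][xX][eE]".
ci_glob() {
  ext=$1
  out='*.'
  while [ -n "$ext" ]; do
    rest=${ext#?}              # everything after the first character
    c=${ext%"$rest"}           # the first character
    lower=$(printf '%s' "$c" | tr '[:upper:]' '[:lower:]')
    upper=$(printf '%s' "$c" | tr '[:lower:]' '[:upper:]')
    if [ "$lower" = "$upper" ]; then
      out="$out$c"             # digits and punctuation pass through
    else
      out="$out[$lower$upper]"
    fi
    ext=$rest
  done
  printf '%s\n' "$out"
}
```

For example, `printf '%s filter=lfs diff=lfs merge=lfs -text\n' "$(ci_glob exe)"` produces a line in the form that `git lfs track` writes to .gitattributes.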
One little bug which probably didn't affect your results: when simplifying the extension extraction to just sed 's/.*\.//', the gfind -name test should have been updated to include a literal period.
gfind . -name "*.$ext" -printf "%s\n"
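A quick way to see the difference is a throwaway directory. This is only a sketch: the file names are invented and `count_matches` is a hypothetical helper. It uses plain `find`, since only `-name` is needed here.

```shell
# count_matches DIR GLOB: number of regular files under DIR whose name matches GLOB
count_matches() {
  find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

demo=$(mktemp -d)
touch "$demo/a.d" "$demo/b.md" "$demo/third"
ext=d
count_matches "$demo" "*$ext"     # counts every name ending in "d"
count_matches "$demo" "*.$ext"    # counts only names ending in ".d"
rm -rf "$demo"
```

Without the period, short extensions such as `d` or `g` also pick up the sizes of any file whose name merely ends in that letter.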
tr '[ \t\n]' ',' | sed -n -E -e 's!([^,]+),([^,]+),([^,]+),!\2%\1@!gp' | tr '%' '\t' | tr '@' '\n' | sort -rn
is my way of working around sed's lack of... everything. The equivalent regex replace in a modern IDE is: s@^(\..*)\n(.*)\s+total$@$2\t$1