This should work generally, but I use it to track the number of words changed in a (LaTeX) paper whose version history is kept in git (which Overleaf uses by default).
This is a tricky thing to deal with for many reasons.
Show the added words, deleted words, and words on duplicate lines for every commit in the last day (here, since 6am), in bash:
# Prints one line per commit: added, deleted, duplicated
for sha in $(git rev-list --since="6am" master | sed -e '$ d'); do
    echo $(git diff --word-diff=porcelain $sha~1..$sha | grep -e"^+[^+]" | wc -w | xargs),\
         $(git diff --word-diff=porcelain $sha~1..$sha | grep -e"^-[^-]" | wc -w | xargs),\
         $(git diff $sha~1..$sha | grep -e"^+[^+]" -e"^-[^-]" | sed -e's/.//' | sort | uniq -d | wc -w | xargs)
done
Since we sometimes move massive amounts of text, counting the words on duplicate lines can flag words that come only from moving things around. If the number of words on duplicate lines rivals the number of words added and removed, it's probably just a move commit.
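To see what the duplicate-line count picks up, here is a small sketch run on a made-up diff. The helper name count_duplicate_words and the sample text are only for illustration; the pipeline inside it is the same one used in the loop.

```bash
# Hypothetical helper: count the words on lines that appear more than once
# among the added/removed lines of a unified diff read from stdin.
count_duplicate_words() {
  grep -e '^+[^+]' -e '^-[^-]' | sed -e 's/.//' | sort | uniq -d | wc -w | xargs
}

# A made-up diff: one sentence is moved, one sentence is genuinely new.
sample_diff='--- a/main.tex
+++ b/main.tex
-We moved this sentence.
+A brand new sentence.
+We moved this sentence.'

printf '%s\n' "$sample_diff" | count_duplicate_words   # prints 4
```

The moved sentence appears once as a removed line and once as an added line, so its 4 words are counted as duplicates, while the new sentence is not.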
Assuming that in a "move commit" 80%+ of the added words sit on duplicate lines, the following code should show you the total number of edited words in a day. Edit the --since option at the top to get it for different ranges (e.g., --since="10 days ago").
total=0
# sed drops the oldest commit in the range
for sha in $(git rev-list --since="6am" master | sed -e '$ d'); do
    added=$(git diff --word-diff=porcelain $sha~1..$sha | grep -e"^+[^+]" | wc -w | xargs)
    deleted=$(git diff --word-diff=porcelain $sha~1..$sha | grep -e"^-[^-]" | wc -w | xargs)
    duplicated=$(git diff $sha~1..$sha | grep -e"^+[^+]" -e"^-[^-]" | sed -e's/.//' | sort | uniq -d | wc -w | xargs)
    if [ "$added" -eq 0 ]; then
        # Nothing added: count only deletions (also avoids dividing by zero below)
        changed=$deleted
    elif [ "$(echo "$duplicated/$added > 0.8" | bc -l)" -eq 1 ]; then
        # Mostly duplicate lines: treat as a move commit and count nothing
        changed=0
    else
        changed=$((added+deleted))
    fi
    total=$((total+changed))
    echo "added: $added, deleted: $deleted, duplicated: $duplicated, changes counted: $changed"
done
echo "Total changed: $total"
If you are using Overleaf, it should auto-commit frequently enough that this works.
Outside of Overleaf, you should commit before and after you move large amounts of text so that the word count changes in a file are tracked properly.
@MilesCranmer thank you for putting this together! Pardon my ignorance, but I got a high word count, which I think may mean the search includes things like modifying .bib files. Is there a way to ensure it only reports changes within the main.tex file and not the whole repository?
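One possible way to do this, offered as a sketch rather than a tested answer from the author: git diff accepts a pathspec after `--`, so appending `-- main.tex` to each git diff call in the loops above should limit the counts to that file. The helper name added_words_in_file below is hypothetical and covers only the added-words count.

```bash
# Hypothetical helper: words added to a single file in one commit.
#   $1 = commit to inspect
#   $2 = path to restrict the diff to (e.g. main.tex)
added_words_in_file() {
  git diff --word-diff=porcelain "$1~1..$1" -- "$2" | grep -e '^+[^+]' | wc -w | xargs
}

# Example call (run inside the repository):
#   added_words_in_file HEAD main.tex
```

The same `-- main.tex` suffix can be added to the deleted and duplicated pipelines, so changes to .bib files no longer contribute to any of the three numbers.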