This should work in general, but I use it to track the number of words changed in a (LaTeX) paper whose version history is kept in git (which Overleaf uses by default).
This is a tricky thing to deal with for many reasons.
Show the added words, deleted words, and words on duplicate lines for every commit since 6 a.m. (bash):
for sha in $(git rev-list --since="6am" master | sed -e '$ d'); do
    added=$(git diff --word-diff=porcelain "$sha~1..$sha" | grep -e "^+[^+]" | wc -w | xargs)
    deleted=$(git diff --word-diff=porcelain "$sha~1..$sha" | grep -e "^-[^-]" | wc -w | xargs)
    duplicated=$(git diff "$sha~1..$sha" | grep -e "^+[^+]" -e "^-[^-]" | sed -e 's/.//' | sort | uniq -d | wc -w | xargs)
    # Print: added,deleted,duplicated
    echo "$added,$deleted,$duplicated"
done
Since we sometimes move large amounts of text, counting the words on duplicate lines can flag words that come only from moving things around. If the number of words on duplicate lines rivals the number of added and removed words, it's probably just a move commit.
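To make the duplicate-line idea concrete, here is a minimal sketch that runs the same grep/sed/sort/uniq pipeline on a made-up diff instead of a real commit. The function name dup_words and the sample diff are hypothetical and only for illustration.

```shell
# Hypothetical helper: read a unified diff on stdin and count the words on
# lines that appear more than once among the added/removed lines.
# Steps: keep changed lines (skipping the +++/--- headers), strip the
# leading + or - marker, keep one copy of each repeated line, count words.
dup_words() {
    grep -e '^+[^+]' -e '^-[^-]' |
        sed -e 's/.//' |
        sort | uniq -d |
        wc -w | xargs
}

# Made-up diff: one sentence is moved, one sentence is new.
sample_diff='--- a/paper.tex
+++ b/paper.tex
@@ -1,3 +1,4 @@
-We thank the reviewers for their comments.
 \section{Introduction}
+This sentence is new.
 Some unchanged text.
+We thank the reviewers for their comments.'

printf '%s\n' "$sample_diff" | dup_words   # prints 7
```

The moved sentence shows up once as a removal and once as an addition, so its 7 words are reported as duplicates, while the new sentence is not.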
Assuming that in a "move commit" 80%+ of the lines are duplicates (measured here as duplicated words divided by added words), the following code should show you the total number of edited words in a day. Edit the --since option at the top to get it for different ranges (e.g., --since="10 days ago").
total=0
for sha in $(git rev-list --since="6am" master | sed -e '$ d'); do
    added=$(git diff --word-diff=porcelain "$sha~1..$sha" | grep -e "^+[^+]" | wc -w | xargs)
    deleted=$(git diff --word-diff=porcelain "$sha~1..$sha" | grep -e "^-[^-]" | wc -w | xargs)
    duplicated=$(git diff "$sha~1..$sha" | grep -e "^+[^+]" -e "^-[^-]" | sed -e 's/.//' | sort | uniq -d | wc -w | xargs)
    if [ "$added" -eq 0 ]; then
        # Nothing added: count only the deletions (this also avoids dividing by zero below).
        changed=$deleted
    elif [ "$(echo "$duplicated/$added > 0.8" | bc -l)" -eq 1 ]; then
        # More than 80% duplicated: treat it as a move commit and count nothing.
        changed=0
    else
        changed=$((added + deleted))
    fi
    total=$((total + changed))
    echo "added: $added, deleted: $deleted, duplicated: $duplicated, changes counted: $changed"
done
echo "Total changed: $total"
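The per-commit decision can also be looked at on its own. The sketch below isolates it in a function so the 80% rule is easy to try with hand-picked numbers. The name counted_changes is hypothetical, and the integer comparison 5*duplicated > 4*added is my substitute for the bc call (it expresses the same duplicated/added > 0.8 test without floating point), not part of the script above.

```shell
# Hypothetical helper: echo the number of words to count for one commit.
# Arguments: added, deleted, and duplicated word counts (non-negative integers).
counted_changes() {
    local added=$1 deleted=$2 duplicated=$3
    if [ "$added" -eq 0 ]; then
        # Nothing added: count the deletions and skip the ratio test.
        echo "$deleted"
    elif [ $((5 * duplicated)) -gt $((4 * added)) ]; then
        # duplicated/added > 0.8, rewritten as 5*duplicated > 4*added.
        echo 0
    else
        echo $((added + deleted))
    fi
}

counted_changes 0 12 0      # deletions only: prints 12
counted_changes 100 95 90   # 90/100 > 0.8, treated as a move: prints 0
counted_changes 100 20 10   # ordinary edit: prints 120
```

Changing the 5 and 4 changes the threshold; for example, 10*duplicated > 9*added would require more than 90% duplicates before a commit is ignored.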
If you are using Overleaf, it should auto-commit frequently enough that this works.
Outside of Overleaf, you should commit before and after you move large amounts of text, so that the move lands in its own commit and the word count changes are tracked properly.
Very creative. I'm working on a book/paper compiler using GitHub Actions and was looking for a word count alternative based on commits. I think this may be the best example of how it's done. It doesn't fit my use case, since I'm only counting identified directories within a repo, but it does a fabulous job for the use case it targets.
Best of luck on the PhD.