git ls-files -z | xargs -0n1 git blame -w | perl -n -e '/^.*\((.*?)\s*[\d]{4}/; print $1,"\n"' | sort -f | uniq -c | sort -n
You can replace the perl filter with grep and avoid xargs entirely (Bourne-ish shell; tested in dash and bash):
git ls-files | while read f; do git blame --line-porcelain $f | grep '^author '; done | sort -f | uniq -ic | sort -n
For authorship statistics it also makes sense to have the blame command ignore whitespace changes (-w) and follow lines moved or copied within and between files (-M, -C):
git ls-files | while read f; do git blame -w -M -C -C --line-porcelain $f | grep '^author '; done | sort -f | uniq -ic | sort -n
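Note that passing -C twice is important here: the second -C makes blame also look for copies from other files in the commit that created the file.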
git ls-files | while read f; do git blame -w -M -C -C --line-porcelain "$f" | grep '^author '; done | sort -f | uniq -ic | sort -n
Make sure to quote $f, otherwise this'll break on paths with spaces in them.
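Quoting "$f" covers spaces in the middle of a path, but a plain `read f` still trims leading and trailing whitespace and interprets backslashes. A minimal sketch of a stricter loop, assuming a POSIX shell; the helper name for_each_path is made up for illustration. It still cannot handle newlines in paths, and git ls-files may quote unusual paths unless core.quotePath is turned off.

```shell
# Hypothetical helper: read one path per line from stdin and run the
# given command once per path, passing the path as the last argument.
for_each_path() {
  # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
  while IFS= read -r path; do
    # Redirect stdin so the command cannot swallow the remaining path list.
    "$@" "$path" < /dev/null
  done
}

# Usage (inside a git work tree):
#   git ls-files | for_each_path git blame -w --line-porcelain -- \
#     | grep -I '^author ' | sort -f | uniq -ic | sort -n
```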
To get rid of the occasional "Binary file (standard input) matches" line, add the -I option to grep:
git ls-files | while read f; do git blame -w -M -C -C --line-porcelain "$f" | grep -I '^author '; done | sort -f | uniq -ic | sort -n
To list the stats on a single file (and I like the order in reverse):
git blame -w -M -C -C --line-porcelain FILENAMEGOESHERE | grep -I '^author ' | sort -f | uniq -ic | sort -nr
From time to time, a user's name may change (casing, first-last vs last-first, etc.), and in some limited cases counting by email rather than by name may be more indicative:
git ls-files | \
while read f; do \
git blame -w -M -C -C --line-porcelain "$f" | \
grep -I '^author-mail '; \
done | cut -f2 -d'<' | cut -f1 -d'>' | sort -f | uniq -ic | sort -n
The change is to look not for "^author " but for "^author-mail ", with two extra cuts that strip the angle brackets and make the output easier to use later.
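The grep and the two cuts can also be folded into a single sed. A minimal sketch, assuming the input is `git blame --line-porcelain` output on stdin; the function name count_by_mail is hypothetical, and lowercasing the addresses is my own assumption (so that differently cased copies of one address are merged), not something the commands above do.

```shell
# Hypothetical helper: count blamed lines per author email.
# Reads `git blame --line-porcelain` output on stdin.
count_by_mail() {
  # Keep only the address between the angle brackets of author-mail lines,
  # lowercase it (assumption: case differences are not meaningful),
  # then count occurrences and sort ascending by count.
  sed -n 's/^author-mail <\(.*\)>$/\1/p' |
    tr '[:upper:]' '[:lower:]' |
    sort | uniq -c | sort -n
}

# Usage (inside a git work tree):
#   git ls-files | while IFS= read -r f; do
#     git blame -w -M -C -C --line-porcelain -- "$f"
#   done | count_by_mail
```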
None of these solutions really worked for me so I made my own. Tested on MacOS and working. You can try it here.
I've used this a few times now and it works perfectly, although I'm wondering how to refactor it so that you can query changes within a given date range. Any help would be much appreciated!
git ls-files | while read f; do git blame -w -M -C -C --line-porcelain "$f" | grep -I '^author '; done | sort -f | uniq -ic | sort -n
Is the version I'm using. Nice to see everyone improving on the previous commands in each comment!
Another iteration:
git ls-files | while read f; do git blame -w --line-porcelain -- "$f" | grep -I '^author '; done | sort -f | uniq -ic | sort -n
- Counts the current state of the repository rather than all commits in the past (no need for -M or -C -C).
- Avoids the "Binary file (standard input) matches" message.
- Works on OSX, which required a -- to separate the filename.
Another variation: first, export author=somebody, then:
git ls-files | while read f; do git blame -w --line-porcelain -- "$f" | grep -I '^author ' | sed s_^_"$f"" "_; done | grep "$author" | awk '{print $1}' | sort -f | uniq -ic | sort -n
For a given $author, report the number of lines per file. Useful for exploring unusual cases in more depth.
Line counts are a nice supplement to counting the number of commits, but ALL must be taken with a grain of salt.
The first one won't work if there are underscores in the name. This one will:
git ls-files | while read f; do replaceEscaped=$(sed 's/[&/\]/\\&/g' <<<"$f"); git blame -w --line-porcelain -- "$f" | grep -I '^author ' | sed s/^/"$replaceEscaped"" "/; done | grep "$author" | awk '{print $1}' | sort -f | uniq -ic | sort -n
Source for the sed magic: https://stackoverflow.com/a/29613573/1048186
However, this one is reaching the limit of what could even remotely be considered one line!
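The per-file count can also be done without pasting the filename into every line, which sidesteps the sed escaping altogether. A minimal sketch, assuming porcelain blame output on stdin; count_author_lines is a hypothetical name. Unlike grep "$author", it matches the author name exactly, so a filename or a longer name containing the same text is not counted.

```shell
# Hypothetical helper: count lines attributed to exactly the given author.
# $1: author name as it appears after "author " in porcelain output.
# Note: awk -v interprets backslash escapes in the value.
count_author_lines() {
  awk -v who="$1" '$0 == "author " who { n++ } END { print n + 0 }'
}

# Usage (inside a git work tree, with $author exported as above):
#   git ls-files | while IFS= read -r f; do
#     printf '%s %s\n' \
#       "$(git blame -w --line-porcelain -- "$f" | count_author_lines "$author")" "$f"
#   done | sort -n
```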
I made a CLI to make this process easier; it shows a file tree with all the corresponding code owners.
Here's a variation on the earlier responses that parallelizes the blame. This can result in a significant speedup if you have multiple cores. This version also supports filenames that may be quoted by 'git ls-files' (tabs, newlines, backslashes, quotes, UTF-8, etc.) or that begin with a "-":
git ls-files -z |
xargs -0 -r -n 1 -P "$(nproc)" sh -c 'git blame -w -M -C -C --line-porcelain -- "$1" | grep -I --line-buffered "^author "' sh |
sort -f |
uniq -ic |
sort -n
Hello,
Is there any way to get month-wise data, e.g. how many lines of code each user contributed to the repository from January to March?
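One possible approach, as a rough sketch rather than a definitive answer: blame only reflects lines that survive in the current tree, so for a date range it may be easier to sum the lines added per author from `git log --numstat`. This measures lines added during the range, not lines still present today. The function name sum_added_by_author, the "author:" prefix, and the dates in the usage comment are all assumptions made for illustration.

```shell
# Hypothetical helper: sum added lines per author.
# Reads `git log --numstat --pretty=format:'author:%an'` output on stdin.
sum_added_by_author() {
  awk -F'\t' '
    # Header line emitted by the pretty format: remember the current author.
    /^author:/ { author = substr($0, 8); next }
    # numstat line: added<TAB>deleted<TAB>path (binary files show "-" and are skipped).
    NF == 3 && $1 ~ /^[0-9]+$/ { added[author] += $1 }
    END { for (a in added) printf "%7d %s\n", added[a], a }
  ' | sort -rn
}

# Usage (inside a git work tree; the dates are placeholders):
#   git log --since=2024-01-01 --until=2024-03-31 --numstat \
#     --pretty=format:'author:%an' | sum_added_by_author
```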
The other commands here took hours for our project. Here is a faster method:
- Remember to use blame.ignoreRevsFile to ignore mass-edits (like code style fixes).
- Use exclude pathspecs such as git ls-files -- ':!*.pdf' ':!*.xml' to filter out files (-x only skips untracked files).
git ls-files | while read i; do git blame -- "$i" | sed -e 's/^[^(]*(//' -e 's/^\([^[:digit:]]*\)[[:space:]]\+[[:digit:]].*/\1/' -e 's/[[:blank:]]*$//'; done | sort -f | uniq -ic | sort -rn
Counting only activity from the last two years:
git ls-files | while read i; do git blame --since=2.years -- "$i" | grep -v '^\^' | sed -e 's/^[^(]*(//' -e 's/^\([^[:digit:]]*\)[[:space:]]\+[[:digit:]].*/\1/' -e 's/[[:blank:]]*$//'; done | sort -f | uniq -ic | sort -rn
Solution modified from: https://stackoverflow.com/a/2788077
here's my one-liner:
function gitfilecontributors() { local perfile="false" ; if [[ $1 = "-f" ]]; then perfile="true" ; shift ; fi ; if [[ $# -eq 0 ]]; then echo "no files given!" >&2 ; return 1 ; else local f ; { for f in "$@"; do echo "$f" ; git blame --show-email "$f" | sed -nE 's/^[^ ]* *.<([^>]*)>.*$/: \1/p' | sort | uniq -c | sort -r -nk1 ; done } | if [[ "$perfile" = "true" ]]; then tee /tmp/gitblamestats.txt ; else tee /tmp/gitblamestats.txt >/dev/null ; fi ; echo ; echo "total:" ; awk -v FS=' *: *' '/^ *[0-9]/{sums[$2] += $1} END { for (i in sums) printf("%7s : %s\n", sums[i], i)}' /tmp/gitblamestats.txt | sort -r -nk1 ; fi ; }
or with line breaks:
gitfilecontributors ()
{
    local perfile="false";
    if [[ $1 = "-f" ]]; then
        perfile="true";
        shift;
    fi;
    if [[ $# -eq 0 ]]; then
        echo "no files given!" 1>&2;
        return 1;
    else
        local f;
        {
            for f in "$@";
            do
                echo "$f";
                git blame --show-email "$f" | sed -nE 's/^[^ ]* *.<([^>]*)>.*$/: \1/p' | sort | uniq -c | sort -r -nk1;
            done
        } | if [[ "$perfile" = "true" ]]; then
            tee /tmp/gitblamestats.txt;
        else
            tee /tmp/gitblamestats.txt > /dev/null;
        fi;
        echo;
        echo "total:";
        awk -v FS=' *: *' '/^ *[0-9]/{sums[$2] += $1} END { for (i in sums) printf("%7s : %s\n", sums[i], i)}' /tmp/gitblamestats.txt | sort -r -nk1;
    fi
}
Usage is possible for the folder(s) of your choice.
Pass -f to show per-file counts; otherwise only totals are shown:
$ gitfilecontributors $(fd --type f '.*' source)
total:
139 : [email protected]
29 : [email protected]
9 : [email protected]
$ gitfilecontributors -f $(fd --type f '.*' source)
source/040_InitialSetup.md
80 : [email protected]
29 : [email protected]
6 : [email protected]
README.md
59 : [email protected]
5 : [email protected]
3 : [email protected]
total:
139 : [email protected]
29 : [email protected]
9 : [email protected]
5 : [email protected]
this is exactly what i was looking for :)
this one does not need perl
get it here: http://www.commandlinefu.com/commands/view/3889/prints-per-line-contribution-per-author-for-a-git-repository