You can use the "pickaxe" functions of Git to look for commits where a certain string was added, deleted or moved.
It is supported by git log, git show and git diff, as well as the plumbing commands git diff-files, git diff-index and git diff-tree.
It goes like this:
git log -S 'string' # shows commits that change the number of occurrences of 'string' (i.e. where it was added or deleted)
git log -G 'string' # shows commits where a line containing 'string' was added, deleted or moved
You can also use a regex instead of a plain string:
git log -S 'regex' --pickaxe-regex # shows commits where a line matching 'regex' was added or deleted
git log -G 'regex' # shows commits where a line matching 'regex' was added, deleted or moved (-G defaults to a regex)
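To see the difference between -S and -G on a moved line, here is a minimal sketch. It assumes git is installed and builds a throwaway repository under a mktemp directory; the file name, commit messages and the "Demo" identity are all made up for the example. One commit adds the line "needle", the next one only moves it.

```shell
#!/bin/bash
# Throwaway repository: the first commit adds "needle", the second only moves it.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'alpha\nneedle\nbeta\n' > file.txt
git add file.txt
git commit -q -m "add needle"

# Moving the line leaves the number of occurrences of "needle" unchanged.
printf 'alpha\nbeta\nneedle\n' > file.txt
git commit -q -am "move needle"

# -S compares occurrence counts; -G greps the added/removed lines of each diff.
s_hits=$(git log --format=%s -S needle)
g_hits=$(git log --format=%s -G needle)
printf '%s\n' "-S matches:" "$s_hits" "-G matches:" "$g_hits"
```

Under these assumptions, -S should only list "add needle", while -G lists both commits, because the move shows up in the diff as one removed and one added "needle" line.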
You can of course use all other git log options as well, like showing the full patch or the diffstat of the relevant commits:
git log -S 'string' --stat # also shows the diffstat of the files where a line containing 'string' was added or deleted
git log -S 'string' -p # also shows the full patch of the files where a line containing 'string' was added or deleted
By default, the above commands limit the diff to the files whose hunks match the given string/regex.
To show the full diff of each commit, you can add the --pickaxe-all option:
git log -S 'string' --stat --pickaxe-all # shows the full diffstat of the commits where a line containing 'string' was added or deleted
git log -S 'string' -p --pickaxe-all # shows the full diff of the commits where a line containing 'string' was added or deleted
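The effect of --pickaxe-all is easiest to see with --name-only. This is a small sketch, again assuming a throwaway mktemp repository with made-up file names: a single commit touches two files, but only one.txt contains the search string.

```shell
#!/bin/bash
# Throwaway repository with one commit touching two files; only one.txt has "needle".
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "needle" > one.txt
echo "hay" > two.txt
git add one.txt two.txt
git commit -q -m "touch two files"

# --format= suppresses the commit header; sed drops any separator blank lines.
default_files=$(git log --format= --name-only -S needle | sed '/^$/d')
all_files=$(git log --format= --name-only -S needle --pickaxe-all | sed '/^$/d')
printf '%s\n' "default:" "$default_files" "with --pickaxe-all:" "$all_files"
```

Without the option only one.txt should be listed; with it, both files changed by the matching commit appear.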
Sometimes the full diff (-p) is too much information, even without --pickaxe-all.
What if you want to see only the hunks that contain the search string or regex?
This is a little bit tricky, but it's possible thanks to Git's flexibility.
The trick is to define and call an external diff driver that will generate the diff patches, but keep only the relevant hunks.
First, we add a script called "pickaxe-diff" somewhere in our $PATH.
This script is where the magic happens, and it makes use of the grepdiff command from the patchutils package.
Here is the gist of my "pickaxe-diff" script:
#!/bin/bash
# pickaxe-diff : external diff driver for Git.
# To be used with the pickaxe options (git [log|show|diff[.*]] [-S|-G])
# to only show hunks containing the searched string/regex.
path=$1
old_file=$2
old_hex=$3
old_mode=$4
new_file=$5
new_hex=$6
new_mode=$7
diff_output=$(git diff --no-color --no-ext-diff --no-index -p "$old_file" "$new_file" || :)
filtered_diff=$( echo "$diff_output" | \
grepdiff "$GREPDIFF_REGEX" --output-matching=hunk | \
\grep -v -e '^--- a/' -e '^+++ b/' | \
\grep -v -e '^diff --git' -e '^index ')
a_path="a/$path"
b_path="b/$path"
echo "diff --git $a_path $b_path"
echo "index $old_hex..$new_hex $old_mode"
echo "--- $a_path"
echo "+++ $b_path"
echo "$filtered_diff"
Note that Git passes 7 arguments to the external diff driver (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode), which are documented in the GIT_EXTERNAL_DIFF section of the main git(1) man page.
We use git diff --no-ext-diff to generate the diff (it's very important to add --no-ext-diff here, since otherwise the script calls itself recursively!), then pipe it to grepdiff to filter the hunks and keep only those containing $GREPDIFF_REGEX.
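To check which values Git actually hands to a driver, a debug driver can simply print them. This is a sketch under assumptions: "show-args" is a hypothetical helper written into a mktemp directory, it skips the two temporary file paths ($2 and $5, which Git removes after the call), and it is exercised on a two-commit throwaway repository.

```shell
#!/bin/bash
# "show-args" is a hypothetical debug driver printing the arguments Git passes.
set -euo pipefail

work=$(mktemp -d)
cat > "$work/show-args" <<'EOF'
#!/bin/bash
# $1=path $2=old-file $3=old-hex $4=old-mode $5=new-file $6=new-hex $7=new-mode
echo "$1 $3 $4 $6 $7"
EOF
chmod +x "$work/show-args"

git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" > file.txt
git commit -q -am "second"

# git diff runs the external driver once per changed path.
driver_output=$(GIT_EXTERNAL_DIFF="$work/show-args" git diff HEAD~1 HEAD)
echo "$driver_output"
```

The hex arguments should be the full blob IDs of the old and new file.txt, which can be cross-checked with git rev-parse HEAD~1:file.txt and git rev-parse HEAD:file.txt.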
Since we can't control which arguments Git passes to our diff driver, the search string has to reach the script through the environment: GREPDIFF_REGEX must be set in the environment of the git process that calls the script.
Then, we need to tell Git to use our external diff driver. This can be done using the GIT_EXTERNAL_DIFF environment variable.
We also need to define a GREPDIFF_REGEX variable so that our pickaxe-diff script can get the search string:
GREPDIFF_REGEX=<string> GIT_EXTERNAL_DIFF=pickaxe-diff bash -c 'git log -p --ext-diff -S "$GREPDIFF_REGEX"'
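Note that we need the --ext-diff option to convince git log to use our custom driver, and that we run git log through bash -c '...' so that the -S flag actually receives our GREPDIFF_REGEX value: in a single command line, the shell expands $GREPDIFF_REGEX before the GREPDIFF_REGEX=<string> prefix assignment takes effect.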
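The expansion-order pitfall can be reproduced without Git at all. In this sketch printf stands in for git log; the variable name is the one used above, and "needle" is an arbitrary search string.

```shell
#!/bin/bash
# Why "VAR=value cmd $VAR" does not pass value to cmd's arguments.
unset GREPDIFF_REGEX

# The argument is expanded by the calling shell before the prefix assignment
# takes effect, so it sees an unset variable.
direct=$(GREPDIFF_REGEX=needle printf '%s' "[$GREPDIFF_REGEX]")

# Inside bash -c, the single-quoted command is expanded by the child shell,
# which does see GREPDIFF_REGEX in its environment.
via_bash_c=$(GREPDIFF_REGEX=needle bash -c 'printf "%s" "[$GREPDIFF_REGEX]"')

echo "direct:      $direct"
echo "via bash -c: $via_bash_c"
```

The first line should show empty brackets and the second "[needle]".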
Another way to do it is to export the variable, optionally in a subshell:
export GREPDIFF_REGEX=<string>; GIT_EXTERNAL_DIFF=pickaxe-diff git log -p --ext-diff -S "$GREPDIFF_REGEX"; unset GREPDIFF_REGEX
# or
(export GREPDIFF_REGEX=<string>; GIT_EXTERNAL_DIFF=pickaxe-diff git log -p --ext-diff -S "$GREPDIFF_REGEX")
As an aside, note that an external diff driver can also be defined using the Git configuration mechanism, namely the diff.external configuration option.
An equivalent invocation to the above would then be:
(export GREPDIFF_REGEX=<string>; git -c diff.external=pickaxe-diff log -p --ext-diff -S "$GREPDIFF_REGEX")
Here we use the -c flag of the git command itself, which sets a configuration value for the duration of the following command only.
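The scope of -c can be checked with git config itself. This sketch assumes only that git is installed; it does not need a repository, and "pickaxe-diff" is just the value being set, not something that has to exist.

```shell
#!/bin/bash
# A value set with git -c is visible to that one git invocation only.
set -euo pipefail

with_c=$(git -c diff.external=pickaxe-diff config --get diff.external)
echo "with -c: $with_c"

# Without -c, the lookup falls back to the regular config files; it prints
# nothing (and exits with status 1) unless diff.external is configured there.
without_c=$(git config --get diff.external || true)
echo "without -c: ${without_c:-<unset>}"
```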
Since it's not that convenient to define the GREPDIFF_REGEX variable in a subshell and use git -c diff.external=pickaxe-diff (or GIT_EXTERNAL_DIFF) every time we want to use the pickaxe options, here are some convenient Git aliases:
# $HOME/.gitconfig
[alias]
# git log -p -S
log-pickaxe-s = "!f() { GREPDIFF_REGEX=\"$1\" git -c diff.external=pickaxe-diff log -p --ext-diff -S \"$@\"; }; f"
# git log -p -G
log-pickaxe-g = "!f() { GREPDIFF_REGEX=\"$1\" git -c diff.external=pickaxe-diff log -p --ext-diff -G \"$@\"; }; f"
# git show -S
show-pickaxe-s = "!f() { GREPDIFF_REGEX=\"$1\" git -c diff.external=pickaxe-diff show -p --ext-diff -S \"$@\"; }; f"
# git show -G
show-pickaxe-g = "!f() { GREPDIFF_REGEX=\"$1\" git -c diff.external=pickaxe-diff show -p --ext-diff -G \"$@\"; }; f"
# git diff -S
diff-pickaxe-s = "!f() { GREPDIFF_REGEX=\"$1\" git -c diff.external=pickaxe-diff diff -p -S \"$@\"; }; f"
# git diff -G
diff-pickaxe-g = "!f() { GREPDIFF_REGEX=\"$1\" git -c diff.external=pickaxe-diff diff -p -G \"$@\"; }; f"
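These make use of the fact that any Git alias starting with ! is interpreted by the shell and not by Git itself (see this post for more alias ideas using this trick!).
Since we are defining and executing shell functions, we don't need a subshell: the search string is taken from the function's first argument ("$1") rather than from $GREPDIFF_REGEX, so the prefix assignment works directly.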
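To see what the alias functions actually pass to git, here is a sketch with the same argument handling as log-pickaxe-s, where printf replaces the real git call (an assumption made only to make the argument list visible). Because -S is immediately followed by "$@", it consumes the first argument, and any remaining arguments become ordinary git log options or paths.

```shell
#!/bin/bash
# Same argument handling as the log-pickaxe-s alias, with git replaced by printf.
f() {
  GREPDIFF_REGEX="$1" bash -c 'echo "GREPDIFF_REGEX=$GREPDIFF_REGEX"'
  printf 'arg: %s\n' log -p --ext-diff -S "$@"
}

out=$(f 'needle' --stat -- src/)
printf '%s\n' "$out"
```

The child process should see GREPDIFF_REGEX=needle, and the argument list should read log, -p, --ext-diff, -S, needle, --stat, --, src/.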
Note also that git diff does not need the --ext-diff option to use our external diff driver.
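The difference between git log and git diff here can be observed with a stand-in driver. In this sketch "announce" is a hypothetical driver that only reports its path argument, and the repository is a throwaway one created with mktemp.

```shell
#!/bin/bash
# "announce" is a hypothetical driver that only reports which path it received.
set -euo pipefail

work=$(mktemp -d)
cat > "$work/announce" <<'EOF'
#!/bin/sh
echo "external driver called for $1"
EOF
chmod +x "$work/announce"

git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" > file.txt
git commit -q -am "second"

export GIT_EXTERNAL_DIFF="$work/announce"
log_default=$(git log -1 -p --format=)          # built-in diff is used
log_ext=$(git log -1 -p --format= --ext-diff)   # external driver is used
diff_default=$(git diff HEAD~1 HEAD)            # external driver, no flag needed
printf '%s\n=====\n' "$log_default" "$log_ext" "$diff_default"
```

Only the plain git log -p call should print a regular "diff --git" patch; the other two should print the driver's message.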
These aliases are defined in git-pickaxe-aliases.gitconfig in the Gist's repo, so after cloning it you can simply include them in your Git config:
# $HOME/.gitconfig
[include]
path = ~/path/to/git-pickaxe-filter-hunks.md/git-pickaxe-aliases.gitconfig
Now we can simply use our aliases to pickaxe with hunk filtering!
git log-pickaxe-s <string> [<git log arguments>]
git log-pickaxe-g <string> [<git log arguments>]
git show-pickaxe-s <string> [<git show arguments>]
git show-pickaxe-g <string> [<git show arguments>]
git diff-pickaxe-s <string> [<git diff arguments>]
git diff-pickaxe-g <string> [<git diff arguments>]
With the pickaxe-diff script above, the hunks are not colorized even if color.ui is set, because the hunks are piped from git diff --no-ext-diff to grepdiff.
Even if we try to add --color=always, grepdiff does not seem to work if it is given colorized input.
But the pickaxe-diff script can easily be modified to colorize its output according to the configured Git colors:
#!/bin/bash
# pickaxe-diff : external diff driver for Git.
# To be used with the pickaxe options (git [log|show|diff[.*]] [-S|-G])
# to only show hunks containing the searched string/regex.
echo_meta () {
echo "${color_meta}$1${color_none}"
}
path=$1
old_file=$2
old_hex=$3
old_mode=$4
new_file=$5
new_hex=$6
new_mode=$7
color_frag=$(git config --get-color color.diff.frag cyan)
color_func=$(git config --get-color color.diff.func '')
color_meta=$(git config --get-color color.diff.meta 'normal bold')
color_new=$(git config --get-color color.diff.new green)
color_old=$(git config --get-color color.diff.old red)
color_none=$(tput sgr0)
diff_output=$(git diff --no-color --no-ext-diff --no-index -p "$old_file" "$new_file" || :)
filtered_diff=$( echo "$diff_output" | \
grepdiff "$GREPDIFF_REGEX" --output-matching=hunk | \
\grep -v -e '^--- a/' -e '^+++ b/' | \
\grep -v -e '^diff --git' -e '^index ' | \
sed -e "s/\(@@ .* @@\)\(.*\)/${color_frag}\1${color_func}\2${color_none}/" | \
sed -e "s/^\(+.*\)/${color_new}\1${color_none}/" | \
sed -e "s/^\(-.*\)/${color_old}\1${color_none}/" )
a_path="a/$path"
b_path="b/$path"
echo_meta "diff --git $a_path $b_path"
echo_meta "index $old_hex..$new_hex $old_mode"
echo_meta "--- $a_path"
echo_meta "+++ $b_path"
echo "$filtered_diff"
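The sed stage can be tried on its own. This sketch assumes fixed ANSI codes in place of the values the script reads with git config --get-color, and a made-up sample hunk; it combines the three sed calls into one sed with three -e expressions, which applies the same substitutions in the same order to each line.

```shell
#!/bin/bash
# The sed colorizing stage applied to a sample hunk, with fixed ANSI codes
# standing in for the values read with git config --get-color.
color_frag=$'\e[36m'
color_func=''
color_new=$'\e[32m'
color_old=$'\e[31m'
color_none=$'\e[m'

hunk=$'@@ -1,2 +1,2 @@ main()\n-old line\n+new line\n context'

colored=$(printf '%s\n' "$hunk" | sed \
  -e "s/\(@@ .* @@\)\(.*\)/${color_frag}\1${color_func}\2${color_none}/" \
  -e "s/^\(+.*\)/${color_new}\1${color_none}/" \
  -e "s/^\(-.*\)/${color_old}\1${color_none}/")

# cat -v makes the escape sequences visible (ESC is shown as ^[).
printf '%s\n' "$colored" | cat -v
```

Once a line has been colored it starts with an escape character, so a later expression anchored on ^+ or ^- cannot match it a second time; context lines are left untouched.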
The Git pickaxe options limit the output to the files whose diff changes the given string/regex, but grepdiff keeps every hunk in which the string/regex appears.
This means that if another hunk in these files also contains the search string/regex without changing it (e.g. it only appears in context lines), that hunk will still be displayed.
This is a limitation of grepdiff before 0.4.0.
A pull request at the patchutils project added an --only-match flag to grepdiff, which provides the needed functionality to correctly filter out these hunks.
We can thus check whether this flag exists in the installed version of grepdiff, and add it to our invocation in that case:
# ...
only_match_flag=""
if grepdiff -h 2>&1 | \grep -q -e '--only-match'; then
only_match_flag="--only-match=mod"
fi
diff_output=$(git diff --no-color --no-ext-diff --no-index -p "$old_file" "$new_file" || :)
filtered_diff=$( echo "$diff_output" | \
grepdiff "$GREPDIFF_REGEX" --output-matching=hunk ${only_match_flag} | \
# ...
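Where only an older grepdiff is available, a rough substitute for the --only-match=mod behavior could be sketched in awk. This is a hypothetical helper, not part of patchutils: it treats the pattern as an awk ERE (grepdiff uses basic regexes by default, and awk -v processes backslash escapes), and it assumes git-style input in which every file starts with a "diff --git" line. It has not been checked against every diff Git can produce (binary files, for instance).

```shell
#!/bin/bash
# Hypothetical alternative to grepdiff --only-match=mod: keep only the hunks
# whose added or removed lines match an awk ERE.
filter_hunks() {
  awk -v re="$1" '
    # Print the buffered hunk if one of its +/- lines matched, then reset.
    function flush() {
      if (keep) printf "%s", hunk
      hunk = ""; keep = 0
    }
    /^diff --git / { flush(); in_hunk = 0; print; next }
    /^@@ /         { flush(); in_hunk = 1; hunk = $0 "\n"; next }
    in_hunk {
      hunk = hunk $0 "\n"
      # Only changed lines count; context lines (leading space) are ignored.
      if ($0 ~ /^[-+]/ && substr($0, 2) ~ re) keep = 1
      next
    }
    { print }   # file headers outside hunks: index, ---, +++
    END { flush() }
  '
}

sample='diff --git a/f.txt b/f.txt
index 1111111..2222222 100644
--- a/f.txt
+++ b/f.txt
@@ -1,3 +1,3 @@
 needle in context only
-old
+new
@@ -10,3 +10,3 @@
 ctx
-needle old
+needle new'

filtered=$(printf '%s\n' "$sample" | filter_hunks 'needle')
printf '%s\n' "$filtered"
```

With this sample, the first hunk (where "needle" only appears in a context line) should be dropped and the second one kept, along with the file headers.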
- The colorized version emits color codes unconditionally, so any redirection (piping or writing the output to a file) will retain them.
See the TODO for more future work.
References:
https://stackoverflow.com/questions/34885397/using-custom-diff-tool-with-git-show/34934452
https://unix.stackexchange.com/questions/216066/display-only-relevant-hunks-of-a-diff-patch-based-on-a-regexp
https://stackoverflow.com/questions/13192594/add-patch-in-git-all-hunks-matching-regex-in-file
https://stackoverflow.com/questions/10856129/setting-an-environment-variable-before-a-command-in-bash-not-working-for-second
https://git-scm.com/docs/git-log
https://git-scm.com/docs/git