@april
Last active November 4, 2024 22:45
pure shell function for git log as JSON
# attempting to be the most robust solution for outputting git log as JSON,
# using only `git` and the standard shell functions, without requiring
# additional software.
# - uses traditional JSON camelCase
# - includes every major field that git log can output, including the body
# - proper sections for author, committer, and signature
# - multiple date formats (one for reading, ISO for parsing)
# - should properly handle (most? all?) body values, even those that contain
#   quotation marks and escaped characters
# - outputs as minimized JSON, can be piped to `jq` for pretty printing
# - can run against the current directory as `git-log-json` or against a file
#   or folder with `git-log-json foo`
# - easily piped into `jq`, e.g. this will get all the commit subjects:
#   $ git-log-json foo | jq -r '.[] | .subject'
# credit to @nsisodiya, @varemenos, @overengineer, and others for the
# original working code:
# https://gist.github.com/varemenos/e95c2e098e657c7688fd
git-log-json() {
IFS='' read -r -d '' FORMAT << 'EOF'
{
^^^^author^^^^: { ^^^^name^^^^: ^^^^%aN^^^^,
^^^^email^^^^: ^^^^%aE^^^^,
^^^^date^^^^: ^^^^%aD^^^^,
^^^^dateISO8601^^^^: ^^^^%aI^^^^},
^^^^body^^^^: ^^^^%b^^^^,
^^^^commitHash^^^^: ^^^^%H^^^^,
^^^^commitHashAbbreviated^^^^: ^^^^%h^^^^,
^^^^committer^^^^: {
^^^^name^^^^: ^^^^%cN^^^^,
^^^^email^^^^: ^^^^%cE^^^^,
^^^^date^^^^: ^^^^%cD^^^^,
^^^^dateISO8601^^^^: ^^^^%cI^^^^},
^^^^encoding^^^^: ^^^^%e^^^^,
^^^^notes^^^^: ^^^^%N^^^^,
^^^^parent^^^^: ^^^^%P^^^^,
^^^^parentAbbreviated^^^^: ^^^^%p^^^^,
^^^^refs^^^^: ^^^^%D^^^^,
^^^^signature^^^^: {
^^^^key^^^^: ^^^^%GK^^^^,
^^^^signer^^^^: ^^^^%GS^^^^,
^^^^verificationFlag^^^^: ^^^^%G?^^^^},
^^^^subject^^^^: ^^^^%s^^^^,
^^^^subjectSanitized^^^^: ^^^^%f^^^^,
^^^^tree^^^^: ^^^^%T^^^^,
^^^^treeAbbreviated^^^^: ^^^^%t^^^^
},
EOF
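# strip the whitespace that was only there to lay out the template above
FORMAT=$(printf '%s' "$FORMAT" | tr -d '\r\n ')
# quote the format so the shell cannot split or glob it; "$@" passes an optional path through
git log --pretty=format:"$FORMAT" "$@" | \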
sed -e ':a' -e 'N' -e '$!ba' -e s'/\^^^^},\n{\^^^^/^^^^},{^^^^/g' \
-e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/\^^^^/"/g' -e '$ s/,$//' | \
sed -e ':a' -e 'N' -e '$!ba' -e 's/\r//g' -e 's/\n/\\n/g' -e 's/\t/\\t/g' | \
awk 'BEGIN { ORS=""; printf("[") } { print($0) } END { printf("]\n") }'
}
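
For readers following the sed pipeline, here is a minimal sketch of the placeholder trick applied to a single line. `escape_placeholders` is a hypothetical helper name, not part of the gist, and it covers only the escaping steps (backslashes, then real quotation marks, then placeholders), not the newline handling.

```shell
# Hypothetical helper, not part of the gist: takes text that uses ^^^^ as a
# stand-in for JSON quotation marks and produces escaped JSON.
escape_placeholders() {
  # 1. double every backslash, 2. escape real quotation marks,
  # 3. only then turn the ^^^^ placeholders into quotation marks
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/\^^^^/"/g'
}

# Stand-in for one line of `git log` output
printf '%s\n' '{^^^^subject^^^^:^^^^say "hi" \o/^^^^}' | escape_placeholders
```

The order matters: if the placeholders were converted first, the quotation marks that delimit JSON strings would be escaped along with the ones inside the commit text.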
@isaacs

isaacs commented Nov 28, 2022

We had a brief conversation about this on Mastodon, where I joked about what happens if you get a PR from Alex Ca^^^^rets and about the difficulty of using null characters in bash and sed. My annoying brain couldn't let go of it, so check this out: it only uses bash builtins and POSIX sed.

git-log-json() {
  local Q=$'\x01'-$RANDOM-$(date +%s)
  local FORMAT=--pretty=format:"{${Q}author${Q}:{${Q}name${Q}:${Q}%aN${Q},${Q}email${Q}:${Q}%aE${Q},${Q}date${Q}:${Q}%aD${Q},${Q}dateISO8601${Q}:${Q}%aI${Q}},${Q}body${Q}:${Q}%b${Q},${Q}commitHash${Q}:${Q}%H${Q},${Q}commitHashAbbreviated${Q}:${Q}%h${Q},${Q}committer${Q}:{${Q}name${Q}:${Q}%cN${Q},${Q}email${Q}:${Q}%cE${Q},${Q}date${Q}:${Q}%cD${Q},${Q}dateISO8601${Q}:${Q}%cI${Q}},${Q}encoding${Q}:${Q}%e${Q},${Q}notes${Q}:${Q}%N${Q},${Q}parent${Q}:${Q}%P${Q},${Q}parentAbbreviated${Q}:${Q}%p${Q},${Q}refs${Q}:${Q}%D${Q},${Q}signature${Q}:{${Q}key${Q}:${Q}%GK${Q},${Q}signer${Q}:${Q}%GS${Q},${Q}verificationFlag${Q}:${Q}%G?${Q}},${Q}subject${Q}:${Q}%s${Q},${Q}subjectSanitized${Q}:${Q}%f${Q},${Q}tree${Q}:${Q}%T${Q},${Q}treeAbbreviated${Q}:${Q}%t${Q}},"

  git log "$FORMAT" $1 | \
    (
      echo -n '['
      sed -e ':a' -e 'N' -e '$!ba' \
        -e 's/,$//' \
        -e 's/'$Q'},\n{'$Q'/'$Q'},{'$Q'/g' \
        -e 's/\\/\\\\/g' \
        -e 's/\r//g' \
        -e 's/\n/\\n/g' \
        -e 's/\t/\\t/g' \
        -e 's/"/\\"/g' \
        -e 's/'$Q'/"/g'
      echo ']'
    )
}

So unless a commit message contains a \x01 "Start of Heading" byte (which, weird) followed by the same random number and timestamp picked when the command runs, it'll work. Which means that if it fails, you can just try again and it'll probably work. (A UUID for $Q would probably be cleaner, but it would violate the constraint of requiring only standard bash and sed.)
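
To make the collision concrete, here is a small sketch under stated assumptions: `with_carets` and `with_random_delimiter` are hypothetical names, the input is a single field rather than real `git log` output, and the delimiter is built with `printf`, `$$`, and `date` so that it also runs in plain POSIX sh instead of relying on `$'\x01'` and `$RANDOM`.

```shell
# Hypothetical demo of the fixed ^^^^ placeholder: carets that are part of
# the data are converted as well.
with_carets() {
  printf '%s\n' "{^^^^name^^^^:^^^^$1^^^^}" | sed -e 's/\^^^^/"/g'
}

# Hypothetical demo of a per-run delimiter: a \001 byte plus the shell PID
# and the current time (an assumption standing in for $RANDOM).
with_random_delimiter() {
  Q=$(printf '\001')-$$-$(date +%s)
  printf '%s\n' "{${Q}name${Q}:${Q}$1${Q}}" | sed -e "s/${Q}/\"/g"
}

with_carets 'Alex Ca^^^^rets'            # the name is split by a stray quote
with_random_delimiter 'Alex Ca^^^^rets'  # the name survives intact
```

The first call produces invalid JSON because the name contains the placeholder; the second leaves the carets alone because nothing in the name matches the per-run delimiter.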

@april
Author

april commented Nov 28, 2022

I wonder if simply doing something like:

local Q=$RANDOM$RANDOM$RANDOM$RANDOM$RANDOM$RANDOM$RANDOM

Would be enough? That's quite a lot of entropy.
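
As a rough upper bound, assuming bash (where each `$RANDOM` expansion is an integer from 0 to 32767, i.e. 15 bits) and treating the draws as independent, which a pseudo-random generator does not strictly guarantee, the arithmetic looks like this. `delimiter_bits` is a hypothetical helper used only for illustration.

```shell
# Hypothetical helper: upper bound on the bits contributed by N concatenated
# $RANDOM expansions, assuming 15 bits per expansion (bash's 0..32767 range).
delimiter_bits() {
  bits_per_draw=15
  echo $(( bits_per_draw * $1 ))
}

delimiter_bits 7   # seven expansions, as in the line above
```

Such a delimiter consists only of decimal digits, so it is its length rather than its alphabet that makes an accidental match in commit text unlikely.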

@gubasso

gubasso commented Dec 20, 2023

@april I'm running into an issue with repos that have only one commit. I've tested this in a few such repositories, using simple commit messages, and got the same error. It seems like the function isn't parsing correctly when there is a single commit.
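
One plausible cause, not confirmed in this thread, is the `:a` / `N` / `$!ba` loop. With a single commit and no body, `git log` emits one line, `N` finds no next line, and sed stops before any substitution runs (GNU sed prints the line unchanged), so the placeholders and the trailing comma survive. A sketch of a guard, using hypothetical function names and stand-in input instead of real `git log` output:

```shell
# Loop as written in the functions above: N runs even on the last line,
# so single-line input never reaches the substitution.
slurp_fragile() {
  sed -e ':a' -e 'N' -e '$!ba' -e 's/,$//'
}

# Guarded loop: append the next line only when one exists, so the
# substitution still runs when the input is a single line.
slurp_guarded() {
  sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/,$//'
}

printf '{},' | slurp_guarded; echo        # one stand-in "commit"
printf '{},\n{},' | slurp_guarded; echo   # two stand-in "commits"
```

In both cases the trailing comma is removed; swapping the guarded loop into either function above should let the remaining substitutions run for one-commit repositories, though that has not been tested against a real repository here.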
