Last active February 11, 2025 12:00
Git log in JSON format

git log --pretty=format:'{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "commiter": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},'

The only fields that aren't fetched are:

  • %B: raw body (unwrapped subject and body)
  • %GG: raw verification message from GPG for a signed commit

The format is applied to each commit, so once you have all the output, you need to remove the trailing comma and wrap everything in an array.
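The post-processing described above can be sketched in a few lines of JavaScript. This is only a minimal sketch: the helper name `wrapGitLogOutput` is hypothetical, and it assumes that no field value contains a double quote, a backslash, or a raw newline (otherwise `JSON.parse` throws).

```javascript
// Hypothetical helper: turns the raw output of the command above into an array.
// Assumes every field value is already JSON-safe (no quotes, backslashes, newlines).
function wrapGitLogOutput(raw) {
  // Each commit is emitted as "{...}," so the output ends with one extra comma.
  const withoutTrailingComma = raw.trimEnd().replace(/,$/, '');
  return JSON.parse('[' + withoutTrailingComma + ']');
}

// Two hand-written records in the same shape the format string produces.
const sample =
  '{\n  "commit": "abc",\n  "subject": "first"\n},\n' +
  '{\n  "commit": "def",\n  "subject": "second"\n},';

const parsed = wrapGitLogOutput(sample);
console.log(parsed.map((c) => c.subject));
```

The newlines that `%n` inserts between keys are plain JSON whitespace, so they do not need to be removed before parsing.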

git log pretty format source:

Here is an example in JavaScript, based on a package I'm working on for Atom:

var format = '{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "commiter": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},';

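var BufferedProcess = require('atom').BufferedProcess;

// Accumulate stdout as a string; chunks are concatenated as they arrive.
var commits = '';

new BufferedProcess({
    command: 'git',
    args: [
        'log',
        '--pretty=format:' + format
    ],
    stdout: function (chunk) { commits += chunk; },
    exit: function (code) {
        if (code === 0) {
            // Drop the trailing comma and wrap the objects in an array.
            var result = JSON.parse('[' + commits.slice(0, -1) + ']');

            console.log(result); // valid JSON array
        }
    }
});

I tried all sorts of commands, but most of them failed. After merging several of them and trying many combinations, this one finally works.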

git log --pretty=format:'{%n  ^^^^commit^^^^: ^^^^%H^^^^,%n  ^^^^abbreviated_commit^^^^: ^^^^%h^^^^,%n  ^^^^tree^^^^: ^^^^%T^^^^,%n  ^^^^abbreviated_tree^^^^: ^^^^%t^^^^,%n  ^^^^parent^^^^: ^^^^%P^^^^,%n  ^^^^abbreviated_parent^^^^: ^^^^%p^^^^,%n  ^^^^refs^^^^: ^^^^%D^^^^,%n  ^^^^encoding^^^^: ^^^^%e^^^^,%n  ^^^^subject^^^^: ^^^^%s^^^^,%n  ^^^^sanitized_subject_line^^^^: ^^^^%f^^^^,%n  ^^^^commit_notes^^^^: ^^^^%N^^^^,%n  ^^^^verification_flag^^^^: ^^^^%G?^^^^,%n  ^^^^signer^^^^: ^^^^%GS^^^^,%n  ^^^^signer_key^^^^: ^^^^%GK^^^^,%n  ^^^^author^^^^: {%n    ^^^^name^^^^: ^^^^%aN^^^^,%n    ^^^^email^^^^: ^^^^%aE^^^^,%n    ^^^^date^^^^: ^^^^%aD^^^^%n  },%n  ^^^^commiter^^^^: {%n    ^^^^name^^^^: ^^^^%cN^^^^,%n    ^^^^email^^^^: ^^^^%cE^^^^,%n    ^^^^date^^^^: ^^^^%cD^^^^%n  }%n},' | sed 's/"/\\"/g' | sed 's/\^^^^/"/g' | sed "$ s/,$//" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'  | awk 'BEGIN { print("[") } { print($0) } END { print("]") }' > git-log.json

Hope it saves many developers some time.

terezka commented Jan 8, 2021

This blob was just what I was looking for- thank you!!

Thank you!!!
Small question:
is there a way to get the plus/minus numbers as well (changed, inserted, deleted) for each of the entries?
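One possible way to get those numbers is `git log --numstat`, which prints one tab-separated `added deleted path` line per changed file (with `-` in place of the counts for binary files). The sketch below parses that output in JavaScript. The helper name `parseNumstat` and the use of `--format=%H` are assumptions made for illustration; they do not come from the commands above.

```javascript
// Parses text produced by: git log --format=%H --numstat
// Hypothetical helper; assumes full 40-character SHA-1 hashes.
function parseNumstat(raw) {
  const commits = [];
  let current = null;
  for (const line of raw.split('\n')) {
    if (/^[0-9a-f]{40}$/.test(line)) {
      // A bare hash line starts a new commit record.
      current = { commit: line, filesChanged: 0, insertions: 0, deletions: 0 };
      commits.push(current);
      continue;
    }
    const stat = /^(\d+|-)\t(\d+|-)\t(.+)$/.exec(line);
    if (stat && current) {
      current.filesChanged += 1;
      // Binary files report "-" instead of a count; count them as 0 lines.
      current.insertions += stat[1] === '-' ? 0 : Number(stat[1]);
      current.deletions += stat[2] === '-' ? 0 : Number(stat[2]);
    }
  }
  return commits;
}

// Hand-written sample in the shape git prints.
const sample = [
  'a'.repeat(40),
  '',
  '3\t1\tsrc/index.js',
  '-\t-\tlogo.png',
  '',
  'b'.repeat(40),
  '',
  '10\t0\tREADME.md',
].join('\n');

console.log(parseNumstat(sample));
```

The resulting objects can be merged into the JSON records by matching on the commit hash.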

panta82 commented Aug 22, 2021

@nsisodiya commiter -> committer

overengineer commented Sep 30, 2021

What about this:
Tested with jq on repository with 1605 commits. No crashes 👍

Sanitizes commit messages and author names.
(escape newline, escape quotes, escape backslash, escape tabs, fix double escapes, delete invalid characters)
I encountered null characters, control characters, quotes in git log. Finally sanitized everything (I hope).

Edit: I tested it on Blender repo. Well, it has some failing edge cases.
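The sanitization steps listed above can be sketched in JavaScript as follows. This is an illustration of the idea, not the script from this comment: the helper name `escapeForJsonString` is hypothetical, and deleting the remaining control characters (rather than encoding them) is an assumed design choice.

```javascript
// Hypothetical helper: makes an arbitrary string safe to place between JSON quotes.
function escapeForJsonString(value) {
  return value
    .replace(/\\/g, '\\\\') // backslashes first, so later escapes are not doubled
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r')
    .replace(/\t/g, '\\t')
    .replace(/[\u0000-\u001f]/g, ''); // drop any remaining control characters
}

// A message with quotes, a tab, a newline, and a lone backslash.
const message = 'say "hi"\tthen\nshrug \\_o_/';
const json = '{"subject": "' + escapeForJsonString(message) + '"}';

console.log(JSON.parse(json).subject === message);
```

The order matters: escaping backslashes after quotes would double the backslashes that the quote escaping just inserted.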

@nsisodiya's recommendation above is the only thing that truly worked for me. Thanks so much, saving this to a bash function for the rest of my life. :)

@april-dbx Most Welcome.

april commented Nov 23, 2022

Don't mean to add another to the pile, but here we go:

This builds upon what @nsisodiya, @varemenos, and @overengineer put together, but with:

  • made into a convenient shell function that can be run against current directory or file/folder
  • uses traditional JSON camelCase
  • includes every major field that git log can output, including the body
  • proper sections for author, committer, and signature
  • multiple date formats (one for reading, ISO8601 for parsing)
  • should properly handle (most? all?) body values, even those that contain newlines, tabs, quotation marks and escaped characters
  • outputs as minimized JSON, can be piped to jq for pretty printing: git-log-json | jq -r '.[] | .subject'

I tested it against a git repository with over a million commits without issues, but there certainly might be some.

@april - Wow, that is awesome.

H-zk commented Mar 28, 2023

Finally, I got the JSON like this without BufferedProcess, but I use #'# as a special placeholder that is replaced with the double quote " afterwards.

const path = require("path")
const fs = require("fs")
const { execSync } = require('child_process');

const startCommitHash = 'f9f457bca14228bad1e00c55d903a3c9f3738fc8'
const endCommitHash = 'HEAD'

// get log rawText
const logTxts = execSync(
  `git log --pretty=format:"{%n  #'#commit#'#: #'#%H#'#,%n  #'#abbreviated_commit#'#: #'#%h#'#,%n  #'#tree#'#: #'#%T#'#,%n  #'#abbreviated_tree#'#: #'#%t#'#,%n  #'#parent#'#: #'#%P#'#,%n  #'#abbreviated_parent#'#: #'#%p#'#,%n  #'#refs#'#: #'#%D#'#,%n  #'#encoding#'#: #'#%e#'#,%n  #'#subject#'#: #'#%s#'#,%n  #'#sanitized_subject_line#'#: #'#%f#'#,%n  #'#body#'#: #'#%b#'#,%n  #'#commit_notes#'#: #'#%N#'#,%n  #'#verification_flag#'#: #'#%G?#'#,%n  #'#signer#'#: #'#%GS#'#,%n  #'#signer_key#'#: #'#%GK#'#,%n  #'#author#'#: {%n    #'#name#'#: #'#%aN#'#,%n    #'#email#'#: #'#%aE#'#,%n    #'#date#'#: #'#%aD#'#%n  },%n  #'#commiter#'#: {%n    #'#name#'#: #'#%cN#'#,%n    #'#email#'#: #'#%cE#'#,%n    #'#date#'#: #'#%cD#'#%n  }%n}," ${startCommitHash}..${endCommitHash}`,
  { encoding: 'utf8' },
);

// transform json
const logJson = `[${logTxts.slice(0, -1).replace(/"/g, "'").replace(/#'#/g, '"')}]`;

fs.writeFileSync(path.join(__dirname, 'log.json'), logJson);

// parse json and do what you need
const subjectLines = JSON.parse(logJson).map(item => item["sanitized_subject_line"])
console.log('subjectLines', subjectLines)

gribok commented Oct 25, 2023

The following commit message breaks every approach in this thread:

Example Commit Message:
Added logic, but it doesn't look nice yet \_o_/

@nsisodiya Any hints?

Validated by jq:

$ bash utilities/  | jq
parse error: Invalid escape at line 2, column 22785
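The `Invalid escape` error is consistent with the lone backslash in that message: `\_` is not a legal JSON escape sequence. One way to sidestep manual escaping is to have git separate fields with control bytes and let `JSON.stringify` do the escaping. The sketch below assumes a format of `%H%x1f%s%x1f%b%x1e` (git's `%x` placeholder prints a byte from a hex code) and assumes commit messages never contain the bytes 0x1f or 0x1e. The helper name `parseDelimitedLog` is hypothetical.

```javascript
// Assumed git invocation (not taken from the thread):
//   git log --pretty=format:%H%x1f%s%x1f%b%x1e
const FIELD_SEP = '\x1f';  // unit separator between fields
const RECORD_SEP = '\x1e'; // record separator after each commit

function parseDelimitedLog(raw) {
  return raw
    .split(RECORD_SEP)
    .map((record) => record.replace(/^\n/, '')) // git puts a newline between records
    .filter((record) => record.length > 0)
    .map((record) => {
      const [commit, subject, body] = record.split(FIELD_SEP);
      return { commit, subject, body: body ?? '' };
    });
}

// Hand-written record using the commit message from the comment above.
const subjectText = "Added logic, but it doesn't look nice yet \\_o_/";
const raw = 'abc123' + FIELD_SEP + subjectText + FIELD_SEP + '' + RECORD_SEP;

// JSON.stringify escapes the backslash and any quotes for us.
const json = JSON.stringify(parseDelimitedLog(raw));
console.log(JSON.parse(json)[0].subject);
```

Because the JSON is produced by a serializer rather than by string substitution, quotes, backslashes, and newlines in the subject or body need no special handling.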

floriankraemer commented Apr 20, 2024

Here is another idea, written in PHP. Put it in a PHP file in the root of a repository and execute it. It iterates over each placeholder separately, for all commits.

Yes, I know this is inefficient, but it lets us get each value separately without parsing a giant string in which we would have to consider every case that could break our parser. So the trade-off is either to get good values more slowly, or to chase a probably never perfect parsing solution just to get all of the values in one pass. A language that supports parallel execution could probably do it faster. For a repository with 13205 commits it runs in just a few seconds on my machine, generating ~15 MB of JSON. I ran this on an NVMe SSD.

❗ This is just a quick draft, feel free to critique or improve it. 😃



<?php

$placeholders = [
    'H' => 'hash',
    'h' => 'abbreviated_hash',
    'P' => 'parent_hash',
    'p' => 'abbreviated_parent_hash',
    'T' => 'tree_hash',
    't' => 'abbreviated_tree_hash',
    'an' => 'author_name',
    'ae' => 'author_email',
    'aD' => 'author_date',
    'at' => 'author_unix_timestamp',
    'cn' => 'committer_name',
    'ce' => 'committer_email',
    'cD' => 'committer_date',
    'ct' => 'committer_unix_timestamp',
    's' => 'subject',
    'b' => 'body',
    'B' => 'raw_body',
    'N' => 'notes',
    'D' => 'branch',
    'd' => 'commit_tag',
    'gD' => 'reflog_selector',
    'gs' => 'reflog_subject',
    'gn' => 'reflog_name',
    'e' => 'encoding',
    'f' => 'sanitized_subject_line',
];

$commits = [];

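foreach ($placeholders as $placeholder => $name) {
    // Prefix every record with "<hash>2>>>>> " (47 characters) so it can be recognised.
    $gitCommand = 'git log --format=\'%H2>>>>> %' . $placeholder . '\'';

    $output = shell_exec($gitCommand);
    $lines = explode("\n", (string) $output);
    $commitId = null;
    foreach ($lines as $line) {
        if (preg_match('/^[0-9a-f]{40}2>>>>> /', $line)) {
            // First line of a record: the 40-character hash, then the 7-character marker.
            $commitId = substr($line, 0, 40);
            $commits[$commitId][$name] = substr($line, 47);
        } elseif ($commitId !== null) {
            // Continuation line of a multi-line value (e.g. the body).
            $commits[$commitId][$name] .= "\n" . $line;
        }
    }
}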

file_put_contents('commits.json', json_encode($commits, JSON_THROW_ON_ERROR | JSON_PRETTY_PRINT | JSON_INVALID_UTF8_SUBSTITUTE));

tugrulates commented Feb 2, 2025

The commit hash itself cannot be found inside the commit (since it is calculated from the commit file). So potentially, it could be used as a delimiter between commit fields. There should not be a risk of collision, unless I am missing something.

In that case, something like git log --format="%H#%s%H%b%H" would be all textual and parsable.

Example implementation:

function parseLog(log: string) {
  const commits: { hash: string; summary: string; body: string }[] = [];

  // Each commit is laid out as: <hash>#<summary><hash><body><hash>
  while (log.length) {
    const delimiter = log.indexOf("#");
    if (delimiter === -1) throw new Error(`Cannot parse commit log`);
    const hash = log.slice(0, delimiter);
    const [summary, body, rest] = log.slice(hash.length + 1).split(hash, 7);
    if (summary === undefined || body === undefined) {
      throw new Error(`Cannot parse commit log`);
    }
    log = rest?.trimStart() ?? "";
    commits.push({ hash, summary, body });
  }

  return commits;
}

Ref names (branches, tags and the like) are not part of the commit, so they could collide with the hash, but it is easier to pick a delimiter for them.

