@varemenos
Last active February 11, 2025 12:00
Git log in JSON format

Get Git log in JSON format

git log --pretty=format:'{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "committer": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},'

The only placeholders that aren't fetched are:

  • %B: raw body (unwrapped subject and body)
  • %GG: raw verification message from GPG for a signed commit

The format is applied to each commit, so the output is a sequence of objects, each followed by a comma. Once you have the whole output, you need to remove the trailing comma and wrap the objects in an array.
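
The post-processing step described above can be sketched as a small function. This is only a minimal illustration: the name parseGitLogJson is hypothetical, and the sketch assumes that no field contains characters that would need JSON escaping (double quotes, backslashes, line breaks), since the format string inserts values verbatim.

```javascript
// Hypothetical helper: turns the raw output of the format above
// into an array of commit objects.
function parseGitLogJson(raw) {
  // Drop surrounding whitespace and the comma left after the last object.
  const trimmed = raw.trim().replace(/,$/, '');
  // Assumption: no field needs JSON escaping; JSON.parse throws otherwise.
  return JSON.parse('[' + trimmed + ']');
}

// Shortened sample output with two commits and only two fields each.
const sample =
  '{\n  "commit": "abc123",\n  "subject": "first"\n},\n' +
  '{\n  "commit": "def456",\n  "subject": "second"\n},';

const parsed = parseGitLogJson(sample);
console.log(parsed.length, parsed[1].subject);
```

If a commit message does contain a double quote or a line break, JSON.parse throws, so wrapping the call in try/catch is probably a good idea in practice.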

git log pretty format source: http://git-scm.com/docs/pretty-formats

Here is an example in JavaScript based on a package I'm working on for Atom:

var format = '{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "committer": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},';

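// BufferedProcess is exported by Atom's built-in 'atom' module.
var BufferedProcess = require('atom').BufferedProcess;

// Accumulate stdout as a string (not an array) so that += appends text.
var commits = '';

new BufferedProcess({
    command: 'git',
    args: [
        'log',
        '--pretty=format:' + format
    ],
    stdout: function (chunk) { commits += chunk; },
    exit: function (code) {
        if (code === 0) {
            // Drop the trailing comma and wrap the objects in an array.
            var result = JSON.parse('[' + commits.slice(0, -1) + ']');

            console.log(result); // array of commit objects
        }
    }
});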

floriankraemer commented Apr 20, 2024

Here is another idea, written in PHP. Put it in a PHP file in the root of a repository and execute it. It runs git log once per placeholder and collects that placeholder's value for every commit.

Yes, I know that this is inefficient, but it yields each value separately, without parsing one giant string and having to consider every case that could break the parser. So the trade-off is either to get reliable values more slowly, or to chase a parsing solution that will probably never be perfect just to get all of the values in a single pass. A language that supports parallel execution could probably do it faster. For a repository with 13205 commits it runs in just a few seconds on my machine, generating ~15 MB of JSON. I ran this on an NVMe SSD.

❗ This is just a quick draft, feel free to offer criticism or improve it. 😃

<?php

// https://git-scm.com/docs/pretty-formats/2.21.0

$placeholders = [
    'H' => 'hash',
    'h' => 'abbreviated_hash',
    'P' => 'parent_hash',
    'p' => 'abbreviated_parent_hash',
    'T' => 'tree_hash',
    't' => 'abbreviated_tree_hash',
    'an' => 'author_name',
    'ae' => 'author_email',
    'aD' => 'author_date',
    'at' => 'author_unix_timestamp',
    'cn' => 'committer_name',
    'ce' => 'committer_email',
    'cD' => 'committer_date',
    'ct' => 'committer_unix_timestamp',
    's' => 'subject',
    'b' => 'body',
    'B' => 'raw_body',
    'N' => 'notes',
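    'D' => 'branch',            // %D: all ref names (branches, tags, HEAD), not only the branch
    'd' => 'commit_tag',        // %d: the same ref names, wrapped in " (...)"
    'gD' => 'reflog_selector',  // %g*: reflog placeholders, empty unless git log runs with -g
    'gs' => 'reflog_subject',
    'gn' => 'reflog_name',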
    'e' => 'encoding',
    'f' => 'sanitized_subject_line',
];

$commits = [];

foreach ($placeholders as $placeholder => $name) {
    // Prefix every record with "<hash>>>>>> " so that its first line can be recognised.
    $gitCommand = 'git log --format=\'%H>>>>> %' . $placeholder . '\'';

    $output = (string) shell_exec($gitCommand);
    $lines = explode("\n", rtrim($output, "\n"));
    $commitId = null;

    foreach ($lines as $line) {
        if (preg_match('/^([0-9a-f]{40,64})>>>>> (.*)$/', $line, $matches)) {
            // First line of a record: start a new value for this commit.
            $commitId = $matches[1];
            $commits[$commitId][$name] = $matches[2];
        } elseif ($commitId !== null) {
            // Continuation of a multi-line value (e.g. the body): keep the line break.
            $commits[$commitId][$name] .= "\n" . $line;
        }
    }
}

file_put_contents('commits.json', json_encode($commits, JSON_THROW_ON_ERROR | JSON_PRETTY_PRINT | JSON_INVALID_UTF8_SUBSTITUTE));
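
The remark above that a language with parallel execution could probably do this faster might look roughly like the following Node.js sketch. It is not a port of the PHP script, only an illustration: the function names are hypothetical, every record is assumed to start with the 40-character hash followed by the marker ">>>>> ", and whether it is actually faster depends on the repository and the machine.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Assumed record marker: every record starts with "<40-char hash>>>>>> ".
const RECORD_START = /^([0-9a-f]{40})>>>>> (.*)$/;

// Runs `git log` for a single placeholder and resolves with the raw output.
async function runGitLog(placeholder) {
  const { stdout } = await execFileAsync(
    'git',
    ['log', '--format=%H>>>>> %' + placeholder],
    { maxBuffer: 256 * 1024 * 1024 } // large repositories produce a lot of output
  );
  return stdout;
}

// Merges the output of one run into `commits`, keyed by commit hash.
function mergeField(commits, name, output) {
  let current = null;
  for (const line of output.replace(/\n+$/, '').split('\n')) {
    const match = RECORD_START.exec(line);
    if (match) {
      current = match[1];
      commits[current] = commits[current] || {};
      commits[current][name] = match[2];
    } else if (current !== null) {
      // Continuation of a multi-line value such as the body.
      commits[current][name] += '\n' + line;
    }
  }
  return commits;
}

// Starts all `git log` runs at once and merges the results when they finish.
// `run` can be replaced with a fake for testing without a repository.
async function collectCommits(placeholders, run = runGitLog) {
  const entries = Object.entries(placeholders);
  const outputs = await Promise.all(entries.map(([placeholder]) => run(placeholder)));
  const commits = {};
  entries.forEach(([, name], index) => mergeField(commits, name, outputs[index]));
  return commits;
}
```

Inside a repository, something like collectCommits({ s: 'subject', b: 'body' }).then(console.log) should print an object keyed by commit hash.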


tugrulates commented Feb 2, 2025

The commit hash itself cannot be found inside the commit (since it is calculated from the contents of the commit object). So potentially, it could be used as a delimiter between commit fields. There should not be a risk of collision, unless I am missing something.

In that case, something like git log --format="%H#%s%H%b%H" would be all textual and parsable: each record is the hash, a #, the subject, the hash again, the body, and the hash once more.

Example implementation:

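function parseLog(log: string) {
  const commits: { hash: string; summary: string; body: string }[] = [];

  while (log.length) {
    // Each record looks like "<hash>#<summary><hash><body><hash>".
    const delimiter = log.indexOf("#");
    if (delimiter <= 0) throw new Error(`Cannot parse commit log`);
    const hash = log.slice(0, delimiter);
    // Find the next two occurrences of this commit's own hash rather than
    // splitting the whole remaining log on it, so that another mention of the
    // hash further down the log cannot truncate the remaining input.
    const summaryEnd = log.indexOf(hash, delimiter + 1);
    const bodyEnd =
      summaryEnd === -1 ? -1 : log.indexOf(hash, summaryEnd + hash.length);
    if (bodyEnd === -1) throw new Error(`Cannot parse commit log`);
    commits.push({
      hash,
      summary: log.slice(delimiter + 1, summaryEnd),
      body: log.slice(summaryEnd + hash.length, bodyEnd),
    });
    log = log.slice(bodyEnd + hash.length).trimStart();
  }

  return commits;
}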

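Ref names (branches, tags and the like) are not part of the commit, so they could collide with a hash used as a delimiter, but it is easier to pick a delimiter for them, because Git forbids some characters in ref names (spaces, ~, ^ and :, for example).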
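
As a rough sketch of that last point: the %D placeholder prints ref names separated by a comma and a space, for example "HEAD -> main, tag: v1.0, origin/main", and a ref name cannot contain a space, so splitting on ", " should be safe. The function name parseRefs and the shape of the returned objects are made up for illustration.

```javascript
// Hypothetical helper: splits the output of the %D placeholder into ref entries.
function parseRefs(decoration) {
  if (decoration.trim() === '') return [];
  return decoration.split(', ').map((raw) => {
    const entry = raw.trim();
    // Tags are printed with a "tag: " prefix.
    if (entry.startsWith('tag: ')) {
      return { type: 'tag', name: entry.slice('tag: '.length) };
    }
    // "HEAD -> main" means HEAD currently points at the branch main.
    const arrow = entry.indexOf(' -> ');
    if (arrow !== -1) {
      return {
        type: 'head',
        pointer: entry.slice(0, arrow),
        name: entry.slice(arrow + ' -> '.length),
      };
    }
    // Anything else: a local or remote-tracking branch, or a detached HEAD.
    return { type: 'ref', name: entry };
  });
}

const refs = parseRefs('HEAD -> main, tag: v1.0, origin/main');
console.log(refs.map((ref) => ref.type + ':' + ref.name).join(' '));
```

The sketch relies on the default decoration format; options that change how decorations are printed would need different parsing.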
