# tformat: terminates every hash with a newline, so read also sees the last commit
git log --reverse --pretty=tformat:'%H' | while IFS= read -r commit_hash
do
    # Extract required commit information; escape backslashes, then double quotes, for JSON
    commit_author=$(git show -s --format='%an' "$commit_hash" 2>/dev/null | sed 's/\\/\\\\/g; s/"/\\"/g')
    if [ -z "$commit_author" ]; then
        echo "Skipping invalid commit hash: $commit_hash"
        continue
    fi
    commit_author_email=$(git show -s --format='%ae' "$commit_hash")
    commit_date=$(git show -s --format='%cI' "$commit_hash") # Committer date, strict ISO 8601 format
    # First line of commit message as title, escaped the same way as the author name
    commit_title=$(git show -s --format='%s' "$commit_hash" | sed 's/\\/\\\\/g; s/"/\\"/g')
    # Get the diff against the first parent and encode it in base64
    # (a root commit has no parent, so its diff is empty and it is skipped below)
    diffs=$(git diff "$commit_hash^" "$commit_hash" 2>/dev/null | base64 | tr -d '\n') # Remove newlines after encoding
    if [ -z "$diffs" ]; then
        echo "No diffs found for commit: $commit_hash"
        continue
    fi
    # Construct JSON object
    json_object="{\"hash\": \"$commit_hash\", \"title\": \"$commit_title\", \"date\": \"$commit_date\", \"author\": \"$commit_author\", \"mail\": \"$commit_author_email\", \"diffs\": \"$diffs\"}"
    # Append the JSON object to a file; printf with a quoted argument writes it verbatim
    printf '%s\n' "$json_object" >> changes.jsonl
done
This version of the script base64-encodes each diff, so the entire diff can be stored without escaping newlines or quotes. Because the encoded diff is a single-line string, every record stays on one line and the JSON Lines format remains valid.
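As a rough illustration of why the encoded diff is safe to embed, the sketch below round-trips a small diff-like sample through the same base64 | tr pipeline. The helper name encode_single_line and the sample text are made up for this example and are not part of the script.

```shell
# Hypothetical helper mirroring the script's encoding step:
# base64-encode stdin and strip the line wraps that base64 may insert.
encode_single_line() {
    base64 | tr -d '\n'
}

# Made-up sample containing newlines, double quotes, and a tab.
sample=$(printf 'diff --git a/f b/f\n-old "line"\n+new\tline\n')
encoded=$(printf '%s\n' "$sample" | encode_single_line)

# The encoded text uses only A-Z, a-z, 0-9, +, / and =, so it needs no JSON escaping.
printf '%s\n' "$encoded"

# Decoding restores the original text, newlines and quotes included.
printf '%s' "$encoded" | base64 --decode
```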
To decode the diff:
To view or process a diff after extracting it from the JSON, decode it from base64. You can do this with command-line tools or programmatically in most programming languages. For example, using the base64 command-line tool:
echo "encoded_diff" | base64 --decode
Replace "encoded_diff" with the actual base64-encoded diff string extracted from your JSONL file.
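If you would rather not copy the encoded string by hand, something like the following sketch could pull it out of the file directly. The function name decode_commit_diff is hypothetical, and the sed pattern assumes each record has exactly the layout written by the script above, with "diffs" as the last field on the line. A real JSON parser would be more robust.

```shell
# Hypothetical helper: print the decoded diff of the first record whose hash
# starts with the given prefix. Assumes the record layout of the script above.
decode_commit_diff() {
    hash_prefix=$1
    jsonl_file=${2:-changes.jsonl}
    grep -F "\"hash\": \"$hash_prefix" "$jsonl_file" |
        head -n 1 |
        sed -n 's|.*"diffs": "\([A-Za-z0-9+/=]*\)"}$|\1|p' |
        base64 --decode
}
```

For example, decode_commit_diff 1a2b3c would print the diff for the first commit in changes.jsonl whose hash begins with 1a2b3c (a placeholder prefix).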
Note: Base64 encoding increases the size of the diff data in the JSONL file by roughly 33%, because every 3 bytes of input become 4 characters of output. In return, newlines and other special characters in diffs are preserved without breaking the JSON Lines format.
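To get a feel for the overhead, a quick check along these lines compares raw and encoded sizes. The 300-byte input is an arbitrary choice for illustration. It uses head -c, which is widely available but not strictly POSIX.

```shell
# Compare raw and base64-encoded sizes for 300 bytes of input.
# The $(( )) wrapper strips the padding some wc implementations add.
raw_bytes=$(( $(head -c 300 /dev/zero | wc -c) ))
encoded_bytes=$(( $(head -c 300 /dev/zero | base64 | tr -d '\n' | wc -c) ))
echo "raw: $raw_bytes bytes, encoded: $encoded_bytes bytes"
```

With 300 input bytes the encoded form is 400 characters, which matches the 4-for-3 ratio.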