@lcatlett
Created June 17, 2025 14:21
Git Analysis Scripts - Tools for analyzing repository sizes, branch differences, and identifying large files
#!/bin/bash
# audit-branch-sizes.sh: audit and compare branch sizes
echo "=== Branch Size Audit ==="
# Function to get branch size
get_branch_size() {
local branch=$1
echo "Analyzing branch: $branch"
# Get total size of all files in the branch
# Quote the branch name and test the checkout result directly
if git checkout "$branch" 2>/dev/null; then
echo " Working directory size (includes .git):"
du -sh . 2>/dev/null | head -1
echo " Git object size:"
git count-objects -vH | grep -E '^size(-pack)?:'
echo " Largest files in branch:"
find . -type f -not -path './.git/*' -exec du -h {} + 2>/dev/null | sort -hr | head -20
echo " File count by type:"
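# Print only real extensions; files without one are skipped instead of showing up as "/name"
find . -type f -not -path './.git/*' | sed -n 's/.*\.\([^./]*\)$/\1/p' | sort | uniq -c | sort -nr | head -10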
echo ""
else
echo " Error: Could not checkout branch $branch"
fi
}
# Check if we're in a git repository
if ! git rev-parse --git-dir > /dev/null 2>&1; then
echo "Error: Not in a git repository"
exit 1
fi
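# Store current branch (fall back to the commit hash when HEAD is detached)
current_branch=$(git branch --show-current)
[ -n "$current_branch" ] || current_branch=$(git rev-parse HEAD)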
echo "Current branch: $current_branch"
echo ""
# Analyze both branches
get_branch_size "pr-1"
get_branch_size "preprod"
# Return to original branch
git checkout "$current_branch" 2>/dev/null
echo "=== Comparison Complete ==="
#!/bin/bash
# find-large-files.sh: find large files in git branches
echo "=== Large Files Audit ==="
# Function to find large files in a branch
find_large_files() {
local branch=$1
echo "=== Large files in branch: $branch ==="
# List all files in the branch with their sizes
# Sort on the size column only; -t is dropped because tree entries have no size
git ls-tree -r -l "$branch" | sort -k4,4nr | head -20 | while read -r mode type hash size path; do
if [ "$size" != "-" ]; then
# Convert size to human readable
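# Attach the unit inside each branch so the bc fallback does not print "bytesMB"
if [ "$size" -gt 1048576 ]; then
size_mb=$(echo "scale=2; $size/1048576" | bc -l 2>/dev/null) && size_mb="${size_mb}MB" || size_mb="${size} bytes"
echo " ${size_mb} - $path"
elif [ "$size" -gt 1024 ]; then
size_kb=$(echo "scale=2; $size/1024" | bc -l 2>/dev/null) && size_kb="${size_kb}KB" || size_kb="${size} bytes"
echo " ${size_kb} - $path"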
else
echo " ${size}B - $path"
fi
fi
done
echo ""
}
# Function to compare file lists between branches
compare_branches() {
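echo "=== Files only in pr-1 (not in preprod) ==="
# --diff-filter=A keeps paths that exist only on the pr-1 side; --no-renames stops renames from hiding adds and deletes
git diff --name-only --no-renames --diff-filter=A preprod pr-1 | head -20
echo ""
echo "=== Files only in preprod (not in pr-1) ==="
# --diff-filter=D keeps paths that exist only on the preprod side
git diff --name-only --no-renames --diff-filter=D preprod pr-1 | head -20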
echo ""
echo "=== Files that differ between branches ==="
git diff --name-status preprod pr-1 | head -20
echo ""
}
# Check if we're in a git repository
if ! git rev-parse --git-dir > /dev/null 2>&1; then
echo "Error: Not in a git repository"
exit 1
fi
# Check if branches exist
if ! git show-ref --verify --quiet refs/heads/pr-1; then
echo "Error: Branch 'pr-1' does not exist"
exit 1
fi
if ! git show-ref --verify --quiet refs/heads/preprod; then
echo "Error: Branch 'preprod' does not exist"
exit 1
fi
# Find large files in each branch
find_large_files "pr-1"
find_large_files "preprod"
# Compare branches
compare_branches
echo "=== Summary ==="
echo "Total objects in pr-1:"
git rev-list --objects pr-1 | wc -l
echo "Total objects in preprod:"
git rev-list --objects preprod | wc -l
echo ""
echo "Repository size breakdown:"
git count-objects -vH

Git Analysis Scripts Documentation

A collection of bash scripts for analyzing git repository sizes, branch differences, and identifying large files that may be causing slow builds or repository bloat.

Scripts Overview

1. git-branch-analysis.sh

Purpose: Comprehensive analysis of git branches with detailed statistics and comparisons.

Features:

  • Git objects analysis (repository size, pack files, largest objects)
  • Branch-specific statistics (commit count, file types, directory analysis)
  • Branch difference analysis (added, deleted, modified files)
  • Size difference calculations for modified files

Usage:

./git-branch-analysis.sh

Requirements:

  • Must be run from the root of a git repository (the script reads .git and .git/objects/pack by relative path)
  • Requires bc command for calculations
  • Analyzes pr-1 and preprod branches by default

Output Sections:

  • Repository size and pack file information
  • Largest objects in the repository (>100KB)
  • Statistics for both pr-1 and preprod branches
  • File type distribution and directory analysis
  • Detailed comparison between branches

2. audit-branch-sizes.sh

Purpose: Audit and compare the actual file system sizes of different git branches.

Features:

  • Checks out branches to analyze actual file sizes
  • Provides git object size information
  • Lists largest files in each branch
  • Shows file count by type
  • Automatically returns to original branch

Usage:

./audit-branch-sizes.sh

Requirements:

  • Must be run from within a git repository
  • Temporarily checks out other branches, so commit or stash local changes first
  • Requires pr-1 and preprod branches to exist

Output Sections:

  • Total file system size for each branch
  • Git object size information
  • Top 20 largest files in each branch
  • File count distribution by file type
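
The file-type count can be sketched as a small filter. This is a hypothetical helper, not part of the original scripts: count_by_extension is an assumed name, and it groups files without an extension (and dotfiles such as .gitignore) under "(none)" rather than printing their full names.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): reads file paths on stdin
# and prints "count extension" pairs, most common first.
count_by_extension() {
    awk -F/ '
        {
            name = $NF                      # last path component
            n = split(name, parts, ".")
            # No dot at all, or only a leading dot: treat as having no extension
            if (n < 2 || (n == 2 && parts[1] == "")) ext = "(none)"
            else ext = parts[n]
            counts[ext]++
        }
        END { for (e in counts) print counts[e], e }
    ' | sort -k1,1nr -k2,2
}

# Possible usage, assuming a branch named pr-1 exists:
#   git ls-tree -r --name-only pr-1 | count_by_extension | head -10
```

Reading paths from stdin lets the same filter work on find output (checked-out files) or git ls-tree output (no checkout needed).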

3. find-large-files.sh

Purpose: Identify large files within git branches without checking them out.

Features:

  • Lists largest files in each branch using git objects
  • Compares file lists between branches
  • Shows files unique to each branch
  • Provides repository size breakdown
  • Human-readable file size formatting
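
The size formatting could also be done without bc. The sketch below is an assumption about how one might do it, not the gist's implementation: human_size is a hypothetical name, and it uses awk for the arithmetic so the unit is always attached to the right number.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): format a byte count
# as B, KB, or MB using awk instead of bc.
human_size() {
    awk -v bytes="$1" 'BEGIN {
        if (bytes >= 1048576)   printf "%.2fMB\n", bytes / 1048576
        else if (bytes >= 1024) printf "%.2fKB\n", bytes / 1024
        else                    printf "%dB\n", bytes
    }'
}

# Possible usage inside the listing loop:
#   echo " $(human_size "$size") - $path"
```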

Usage:

./find-large-files.sh

Requirements:

  • Must be run from within a git repository
  • Requires bc command for size calculations
  • Both pr-1 and preprod branches must exist

Output Sections:

  • Top 20 largest files in each branch
  • Files only in pr-1 (not in preprod)
  • Files only in preprod (not in pr-1)
  • Files that differ between branches
  • Total object counts and repository size summary

Common Use Cases

Investigating Slow Builds

  1. Run git-branch-analysis.sh to get an overview of repository size and branch differences
  2. Use find-large-files.sh to identify specific large files that might be slowing down builds
  3. Run audit-branch-sizes.sh to see actual file system impact

Repository Cleanup

  1. Use find-large-files.sh to identify candidates for removal or .gitignore
  2. Check git-branch-analysis.sh output for file types that shouldn't be in the repository
  3. Use the branch comparison features to understand what's been added recently

Branch Comparison

All scripts provide different perspectives on branch differences:

  • git-branch-analysis.sh: Focuses on git object differences and statistics
  • audit-branch-sizes.sh: Shows actual file system size impact
  • find-large-files.sh: Identifies specific large files and their distribution

Prerequisites

System Requirements

  • Bash shell
  • Git (audit-branch-sizes.sh uses git branch --show-current, which needs Git 2.22 or later)
  • bc command (for mathematical calculations)
  • Standard Unix utilities: du, find, sort, uniq, wc, sed, awk, grep, head, cut

Repository Requirements

  • Must be run from within a git repository
  • Branches pr-1 and preprod must exist
  • Repository should have commit history

Customization

Changing Target Branches

To analyze different branches, modify the branch names in the scripts:

  • In git-branch-analysis.sh: Lines 55-56, 88, 90
  • In audit-branch-sizes.sh: Lines 45-46
  • In find-large-files.sh: Lines 51-58, 62-63
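
As an alternative to editing the hard-coded names, the branches could be taken from the command line. This is a minimal sketch under assumptions: set_branches, BRANCH_A, and BRANCH_B are hypothetical names that do not appear in the original scripts, and the defaults simply mirror the gist's pr-1 and preprod.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): take the two branch names
# from positional arguments, falling back to the gist's defaults.
set_branches() {
    BRANCH_A="${1:-pr-1}"
    BRANCH_B="${2:-preprod}"
}

# Possible usage at the top of a script:
#   set_branches "$@"
#   find_large_files "$BRANCH_A"
#   find_large_files "$BRANCH_B"
```

With this in place, a call such as ./find-large-files.sh feature main would compare those two branches, provided the rest of the script also reads the two variables instead of the literal names.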

Adjusting Output Limits

  • Change head -20 to head -N for different numbers of results
  • Modify size thresholds in the scripts (e.g., 100000 bytes in git-branch-analysis.sh)
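
The result limit could be made configurable rather than edited in place. In the sketch below, LIMIT is an assumed environment variable and top_n a hypothetical wrapper; neither exists in the original scripts.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): print the first LIMIT lines
# of stdin, defaulting to 20 as in the original head -20 calls.
top_n() {
    head -n "${LIMIT:-20}"
}

# Possible usage, replacing "| head -20":
#   git diff --name-status preprod pr-1 | top_n
```

Running a script as LIMIT=50 ./find-large-files.sh would then widen every listing that goes through top_n.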

Adding New Analysis

Each script is modular with separate functions that can be extended or modified independently.

Troubleshooting

Common Issues

  1. "Not in a git repository": Ensure you're running the script from within a git repository
  2. "Branch does not exist": Create the required branches or modify the script to use existing branch names
  3. "bc: command not found": Install the bc calculator package
  4. Permission denied: Make scripts executable with chmod +x *.sh
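
Several of these failures can be caught before any analysis runs. The following preflight check is a sketch, with require_cmds as a hypothetical function name that is not part of the original scripts; it relies only on the POSIX command -v builtin.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): report every missing
# command on stderr and return non-zero if any is absent.
require_cmds() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Error: required command not found: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage near the top of a script:
#   require_cmds git bc du find sort uniq wc || exit 1
```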

Performance Considerations

  • audit-branch-sizes.sh is the slowest as it checks out branches
  • Large repositories may take significant time to analyze
  • Consider running scripts on smaller test repositories first
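
When only the size of tracked content is needed, the checkout can be avoided by summing blob sizes from git ls-tree output. This is a sketch under that assumption, with sum_blob_bytes as a hypothetical name; it counts tracked blobs only, so untracked or ignored files that du would see are not included.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): reads "git ls-tree -r -l"
# output on stdin (mode, type, object, size, path) and prints the total
# size of all blobs in bytes. Submodule entries (type "commit") are skipped.
sum_blob_bytes() {
    awk '$2 == "blob" { total += $4 } END { printf "%.0f\n", total }'
}

# Possible usage, assuming a branch named pr-1 exists:
#   git ls-tree -r -l pr-1 | sum_blob_bytes
```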
#!/bin/bash
# git-branch-analysis.sh: comprehensive git branch analysis
echo "=== Git Branch Analysis ==="
# Function to analyze git objects and sizes
analyze_git_objects() {
echo "=== Git Objects Analysis ==="
echo "Repository size:"
du -sh .git
echo ""
echo "Pack file sizes:"
find .git/objects/pack -name "*.pack" -exec du -sh {} \; 2>/dev/null
echo ""
echo "Largest objects in repository:"
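# Keep only object lines (hash, type, size, ...) so verify-pack summary lines are skipped
git verify-pack -v .git/objects/pack/*.idx 2>/dev/null | grep -E '^[0-9a-f]{40,} ' | sort -k3,3nr | head -20 | while read -r sha1 type size compressed offset depth base; do
if [ -n "$size" ] && [ "$size" -gt 100000 ]; then
obj_name=$(git rev-list --objects --all | grep "^$sha1" | cut -d' ' -f2-)
# Attach the unit here so the bc fallback does not print "BMB"
size_mb=$(echo "scale=2; $size/1048576" | bc -l 2>/dev/null) && size_mb="${size_mb}MB" || size_mb="${size}B"
echo " ${size_mb} - $obj_name"
fi
done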
}
# Function to show branch-specific statistics
branch_stats() {
local branch=$1
echo "=== Branch Statistics: $branch ==="
echo "Commit count:"
git rev-list --count "$branch"
echo ""
echo "Branch size (approximate):"
git ls-tree -r -l "$branch" | awk '$2 == "blob" {sum+=$4} END {print "Total size: " sum/1048576 " MB"}'
echo ""
echo "File types and counts:"
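# Print only real extensions; files without one are skipped instead of being listed by full path
git ls-tree -r --name-only "$branch" | sed -n 's/.*\.\([^./]*\)$/\1/p' | sort | uniq -c | sort -nr | head -10
echo ""
echo "Directories with most files:"
git ls-tree -r --name-only "$branch" | sed 's/\/[^\/]*$//' | sort | uniq -c | sort -nr | head -10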
echo ""
}
# Function to find what's different between branches
find_differences() {
echo "=== Branch Differences ==="
echo "Files added in pr-1 (not in preprod):"
git diff --name-only --diff-filter=A preprod pr-1 | head -20
echo ""
echo "Files deleted in pr-1 (present in preprod):"
git diff --name-only --diff-filter=D preprod pr-1 | head -20
echo ""
echo "Files modified between branches:"
git diff --name-only --diff-filter=M preprod pr-1 | head -20
echo ""
echo "Size difference of modified files:"
git diff --name-only --diff-filter=M preprod pr-1 | head -10 | while read -r file; do
# Read blob sizes straight from each branch so the result does not depend on the checked-out working tree
pr1_size=$(git cat-file -s "pr-1:$file" 2>/dev/null)
preprod_size=$(git cat-file -s "preprod:$file" 2>/dev/null)
if [ -n "$pr1_size" ] && [ -n "$preprod_size" ]; then
diff_size=$((pr1_size - preprod_size))
if [ "$diff_size" -ne 0 ]; then
echo " $file: ${diff_size} bytes difference"
fi
fi
done
}
# Main execution
if ! git rev-parse --git-dir > /dev/null 2>&1; then
echo "Error: Not in a git repository"
exit 1
fi
analyze_git_objects
echo ""
branch_stats "pr-1"
echo ""
branch_stats "preprod"
echo ""
find_differences