A collection of bash scripts for analyzing git repository sizes, comparing branches, and identifying large files that may be causing slow builds or repository bloat.
git-branch-analysis.sh
Purpose: Comprehensive analysis of git branches with detailed statistics and comparisons.
Features:
- Git objects analysis (repository size, pack files, largest objects)
- Branch-specific statistics (commit count, file types, directory analysis)
- Branch difference analysis (added, deleted, modified files)
- Size difference calculations for modified files
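The branch difference counts can be approximated with `git diff --name-status`. The following is a minimal sketch, not the code in git-branch-analysis.sh; the function name `branch_diff_summary` is hypothetical, and rename detection is switched off so that every change is reported as added, deleted, or modified.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from git-branch-analysis.sh).
# Counts files added, deleted, and modified when going from branch $1 to branch $2.
branch_diff_summary() {
  # --no-renames keeps renames from being reported as a separate "R" status.
  git diff --no-renames --name-status "$1" "$2" |
    awk '{ n[$1]++ }
         END { printf "added=%d deleted=%d modified=%d\n", n["A"], n["D"], n["M"] }'
}
```

For example, `branch_diff_summary pr-1 preprod` prints a single summary line for the two default branches.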
Usage:
./git-branch-analysis.sh
Requirements:
- Must be run from within a git repository
- Requires the bc command for calculations
- Analyzes the pr-1 and preprod branches by default
Output Sections:
- Repository size and pack file information
- Largest objects in the repository (>100KB)
- Statistics for both pr-1 and preprod branches
- File type distribution and directory analysis
- Detailed comparison between branches
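As an illustration of the "largest objects" section, blobs above a size threshold can be listed by combining `git rev-list --objects` with `git cat-file --batch-check`. This is a sketch under assumptions: the function name `list_large_objects` is hypothetical, and the script itself may gather the data differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from git-branch-analysis.sh).
# Prints "<size-in-bytes> <path>" for every blob larger than the threshold,
# largest first. Paths containing newlines are not handled.
list_large_objects() {
  local threshold="${1:-100000}"   # bytes; default mirrors the 100000-byte threshold
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    # Keep blobs only, then drop the leading "blob " (5 characters).
    awk -v min="$threshold" '$1 == "blob" && $2 + 0 > min + 0 { print substr($0, 6) }' |
    sort -rn
}
```

Because the sizes come from git objects, a file that was deleted long ago but still lives in history will show up here as well.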
audit-branch-sizes.sh
Purpose: Audit and compare the actual file system sizes of different git branches.
Features:
- Checks out branches to analyze actual file sizes
- Provides git object size information
- Lists largest files in each branch
- Shows file count by type
- Automatically returns to original branch
Usage:
./audit-branch-sizes.sh
Requirements:
- Must be run from within a git repository
- Will temporarily check out different branches
- Requires pr-1 and preprod branches to exist
Output Sections:
- Total file system size for each branch
- Git object size information
- Top 20 largest files in each branch
- File count distribution by file type
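The checkout-and-measure approach can be sketched as follows. This is not the code in audit-branch-sizes.sh: the function name `branch_disk_size_kb` is hypothetical, and the sketch assumes a clean working tree so that the checkouts succeed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (audit-branch-sizes.sh may do this differently).
# Checks out a branch, prints the working tree size in KB (excluding .git),
# then returns to whatever was checked out before.
branch_disk_size_kb() {
  local branch="$1" original size
  # Remember the current branch, or the commit hash when HEAD is detached.
  original="$(git symbolic-ref --short -q HEAD || git rev-parse HEAD)" || return 1
  git checkout -q "$branch" || return 1
  # Sum the disk usage of every top-level entry except .git.
  size="$(find . -mindepth 1 -maxdepth 1 ! -name .git -exec du -sk {} + |
    awk '{ s += $1 } END { print s + 0 }')"
  git checkout -q "$original"
  echo "$size"
}
```

Run it from the repository root, for example `branch_disk_size_kb preprod`. Untracked files in the working tree are counted too, so results are most meaningful in a fresh clone.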
find-large-files.sh
Purpose: Identify large files within git branches without checking them out.
Features:
- Lists largest files in each branch using git objects
- Compares file lists between branches
- Shows files unique to each branch
- Provides repository size breakdown
- Human-readable file size formatting
Usage:
./find-large-files.sh
Requirements:
- Must be run from within a git repository
- Requires the bc command for size calculations
- Both pr-1 and preprod branches must exist
Output Sections:
- Top 20 largest files in each branch
- Files only in pr-1 (not in preprod)
- Files only in preprod (not in pr-1)
- Files that differ between branches
- Total object counts and repository size summary
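Both the "largest files" and the "files only in" sections can be produced from git objects alone, without a checkout. The sketch below shows one way to do it; the function names `largest_files_in_branch` and `files_only_in` are hypothetical and are not taken from find-large-files.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (find-large-files.sh may be implemented differently).

# Top N largest files on a branch as "<size-in-bytes><TAB><path>".
largest_files_in_branch() {
  local branch="$1" count="${2:-20}"
  # ls-tree -l prints: <mode> <type> <sha> <size><TAB><path>
  git ls-tree -r -l "$branch" |
    awk -F'\t' '{ split($1, meta, " "); if (meta[2] == "blob") print meta[4] "\t" $2 }' |
    sort -rn | head -n "$count"
}

# Files present on branch $1 but absent from branch $2.
files_only_in() {
  # Going from $1 to $2, such files appear as deletions.
  git diff --no-renames --name-only --diff-filter=D "$1" "$2"
}
```

For example, `largest_files_in_branch pr-1 20` lists the top 20 files, and `files_only_in preprod pr-1` lists files that exist only on preprod.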
- Run git-branch-analysis.sh to get an overview of repository size and branch differences
- Use find-large-files.sh to identify specific large files that might be slowing down builds
- Run audit-branch-sizes.sh to see actual file system impact
- Use find-large-files.sh to identify candidates for removal or .gitignore
- Check git-branch-analysis.sh output for file types that shouldn't be in the repository
- Use the branch comparison features to understand what's been added recently
All scripts provide different perspectives on branch differences:
- git-branch-analysis.sh: Focuses on git object differences and statistics
- audit-branch-sizes.sh: Shows actual file system size impact
- find-large-files.sh: Identifies specific large files and their distribution
- Bash shell
- Git repository
- bc command (for mathematical calculations)
- Standard Unix utilities: du, find, sort, uniq, wc
- Must be run from within a git repository
- Branches pr-1 and preprod must exist
- Repository should have commit history
To analyze different branches, modify the branch names in the scripts:
- In git-branch-analysis.sh: Lines 55-56, 88, 90
- In audit-branch-sizes.sh: Lines 45-46
- In find-large-files.sh: Lines 51-58, 62-63
- Change head -20 to head -N for different numbers of results
- Modify size thresholds in the scripts (e.g., 100000 bytes in git-branch-analysis.sh)
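An alternative to editing hard-coded values is to read them from environment variables with defaults. The fragment below is only a sketch: the variable names `BRANCH_A`, `BRANCH_B`, `TOP_N`, and `SIZE_THRESHOLD` are hypothetical and do not exist in the scripts as shipped.

```shell
#!/usr/bin/env bash
# Hypothetical configuration block that could sit near the top of a script.
# Each value falls back to the documented default when the variable is unset.
BRANCH_A="${BRANCH_A:-pr-1}"
BRANCH_B="${BRANCH_B:-preprod}"
TOP_N="${TOP_N:-20}"                        # replaces the literal in "head -20"
SIZE_THRESHOLD="${SIZE_THRESHOLD:-100000}"  # bytes
```

With a block like this in place, a run against other branches would look like `BRANCH_A=main BRANCH_B=develop ./find-large-files.sh`, where the branch names are again just examples.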
Each script is modular with separate functions that can be extended or modified independently.
- "Not in a git repository": Ensure you're running the script from within a git repository
- "Branch does not exist": Create the required branches or modify the script to use existing branch names
- "bc: command not found": Install the bc calculator package
- Permission denied: Make scripts executable with chmod +x *.sh
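Most of these problems can be detected before a script does any work. The following preflight check is a minimal sketch; the function name `preflight` is hypothetical and the scripts may perform their checks differently.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check covering the first three failure cases.
# Usage: preflight <branch>...
preflight() {
  local branch
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 ||
    { echo "Not in a git repository" >&2; return 1; }
  for branch in "$@"; do
    git show-ref --verify --quiet "refs/heads/$branch" ||
      { echo "Branch does not exist: $branch" >&2; return 1; }
  done
  command -v bc >/dev/null 2>&1 ||
    { echo "bc: command not found" >&2; return 1; }
}
```

Calling `preflight pr-1 preprod || exit 1` at the start of a script stops it early with one of the messages above.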
- audit-branch-sizes.sh is the slowest, as it checks out branches
- Large repositories may take significant time to analyze
- Consider running scripts on smaller test repositories first