@4sskick
Created October 18, 2025 18:20
Git Repository Cleanup & Large Object Reduction Report

🧠 Context

While maintaining a long-lived production repository, I noticed an unusually large pack file under .git/objects/pack consuming hundreds of megabytes.
This was slowing down clones and CI runs, and hurting overall repository health.


🧪 Investigation

  1. Checked .git folder size:

    du -sh .git
  2. Verified large pack files inside .git/objects/pack:

    ls -lh .git/objects/pack
  3. Analyzed top largest Git objects:

    git verify-pack -v .git/objects/pack/pack-xxxx.idx | sort -k3 -n | tail -20
  4. Mapped object hashes to actual file paths:

    git rev-list --objects --all | grep <hash>
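
Steps 3 and 4 can be combined in a few lines of Python. The following is a minimal sketch, not the author's analyze-pack.py: the function names are hypothetical, and the 10 MiB threshold is an assumption about what ">10MB" means. It relies only on the documented column layout of `git verify-pack -v` (hash, type, size, ...) and on `git rev-list --objects --all` printing `<hash> <path>` per line.

```python
import subprocess

# Assumption: ">10MB" is read as 10 MiB; adjust if you mean 10 * 10**6.
THRESHOLD_BYTES = 10 * 1024 * 1024


def parse_verify_pack(output, min_size=THRESHOLD_BYTES):
    """Return (sha, size) pairs for blobs of at least min_size bytes, largest first."""
    large = []
    for line in output.splitlines():
        fields = line.split()
        # Object lines look like: <sha> <type> <size> <size-in-pack> <offset> [...]
        # Summary lines ("non delta: ...", "...: ok") do not match and are skipped.
        if len(fields) >= 5 and fields[1] == "blob" and fields[2].isdigit():
            size = int(fields[2])
            if size >= min_size:
                large.append((fields[0], size))
    return sorted(large, key=lambda item: item[1], reverse=True)


def map_paths(rev_list_output):
    """Build {sha: path} from `git rev-list --objects --all` output."""
    paths = {}
    for line in rev_list_output.splitlines():
        sha, _, path = line.partition(" ")
        if path:  # commits and root trees carry no path
            paths[sha] = path
    return paths


def find_large_files(idx_path):
    """Run both git commands in the current repository and join the results."""
    packed = subprocess.run(["git", "verify-pack", "-v", idx_path],
                            capture_output=True, text=True, check=True).stdout
    listed = subprocess.run(["git", "rev-list", "--objects", "--all"],
                            capture_output=True, text=True, check=True).stdout
    paths = map_paths(listed)
    return [(paths.get(sha, "<unreachable>"), size)
            for sha, size in parse_verify_pack(packed)]
```

Keeping the parsing separate from the subprocess calls means the parsers can be checked against captured output without touching a real repository.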

⚙️ Cleanup Process

A Python automation script was created to:

  • Identify large Git objects (>10MB)
  • Map them to their actual file paths
  • Confirm user intent
  • Execute git filter-repo cleanup
  • Repack repository using git gc --aggressive
  • Measure before/after repo size

    python analyze-pack.py --cleanup

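git filter-repo refuses to rewrite a repository that does not look like a fresh clone unless --force is passed. When the repo had unstaged changes or wasn’t a fresh clone, the script asked for confirmation before re-running the cleanup with --force.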
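
The confirmation flow described above might look roughly like this. It is a sketch under stated assumptions rather than the actual script: `build_filter_repo_cmd`, `confirm`, and `plan_cleanup` are hypothetical names, and how `fresh_clone` is detected is left to the caller. Only the `--invert-paths`, `--path`, and `--force` flags of git filter-repo are used.

```python
def build_filter_repo_cmd(paths, force=False):
    """Build the argv list that removes the given paths from all history."""
    if not paths:
        raise ValueError("no paths selected for removal")
    cmd = ["git", "filter-repo", "--invert-paths"]
    for path in paths:
        cmd += ["--path", path]
    if force:
        cmd.append("--force")
    return cmd


def confirm(prompt, ask=input):
    """Return True only on an explicit yes; anything else counts as no."""
    return ask(f"{prompt} [y/N] ").strip().lower() in ("y", "yes")


def plan_cleanup(paths, fresh_clone, ask=input):
    """Return the command to run, or None if the user declines at any prompt."""
    if not confirm(f"Remove {len(paths)} path(s) from all history?", ask):
        return None
    force = False
    if not fresh_clone:
        # git filter-repo refuses to run here unless --force is given,
        # so ask a second time before overriding that safety check.
        if not confirm("Not a fresh clone. Re-run with --force?", ask):
            return None
        force = True
    return build_filter_repo_cmd(paths, force)
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Passing `ask` as a parameter keeps the prompts testable without a terminal.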


🧩 Key Commands Used

git filter-repo --invert-paths --path <large-file-path>
git gc --prune=now --aggressive
git remote add origin <your-remote-url>
git push origin main --force

📦 Result

  • Initial size: 769.58 MB
  • Final size: ~220 MB
  • Reduction: ~71% 🎯
  • Large assets such as .zip, .sql, .psd, and .rar files were removed from history.
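
The before/after measurement and the reduction figure can be reproduced with a short sketch like the one below. Both function names are hypothetical, and the size is computed by summing file sizes, which is an assumption: `du -sh` reports disk usage, so its numbers can differ slightly.

```python
import os


def dir_size_mb(path):
    """Sum the sizes of all regular files under path, in MiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks to avoid double counting
                total += os.path.getsize(full)
    return total / (1024 * 1024)


def reduction_percent(before_mb, after_mb):
    """Percentage by which the size shrank relative to the starting size."""
    return (before_mb - after_mb) / before_mb * 100


# With the figures reported above: (769.58 - 220) / 769.58 is roughly 71.4%.
print(f"{reduction_percent(769.58, 220):.1f}%")
```

Calling `dir_size_mb(".git")` once before the rewrite and once after `git gc` gives the two inputs for `reduction_percent`.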

💡 Lessons Learned

  • Always inspect .git/objects/pack when repo grows unexpectedly.
  • Use git filter-repo (not BFG) for safe & modern cleanup.
  • Never forget: git filter-repo removes the origin remote as a safety measure, so re-add it manually before pushing!
  • Automating the cleanup makes it repeatable and consistent across DevOps pipelines.

🧭 Takeaway for Teams

Regular repository hygiene saves:

  • Developer time (faster clone & fetch)
  • CI/CD cost (faster checkouts, less data transferred)
  • Fewer headaches in the future 🚀

🧰 Recommended Tools

  • git filter-repo
  • du, grep, awk
  • pyenv + Python script automation

#GitOptimization #SoftwareEngineering #DevOps #CleanCode #PythonAutomation
