Skip to content

Instantly share code, notes, and snippets.

@tsabat
Last active December 20, 2015 12:49
Show Gist options
  • Save tsabat/6133601 to your computer and use it in GitHub Desktop.
Save tsabat/6133601 to your computer and use it in GitHub Desktop.
Find huge files in git repo

All the credit for this goes to this so solution

#!/usr/bin/env ruby -w
head, treshold = ARGV
head ||= 'HEAD'
Megabyte = 1000 ** 2
treshold = (treshold || 0.1).to_f * Megabyte

big_files = {}

IO.popen("git rev-list #{head}", 'r') do |rev_list|
  rev_list.each_line do |commit|
    commit.chomp!
    for object in `git ls-tree -zrl #{commit}`.split("\0")
      bits, type, sha, size, path = object.split(/\s+/, 5)
      size = size.to_i
      big_files[sha] = [path, size, commit] if size >= treshold
    end
  end
end

big_files.each do |sha, (path, size, commit)|
  where = `git show -s #{commit} --format='%h: %cr'`.chomp
  puts "%4.1fM\t%s\t(%s)" % [size.to_f / Megabyte, path, where]
end

usage:

ruby big_file.rb [rev] [size in MB]
$ ruby big_file.rb master 0.3
3.8M  example/blah.psd  (aad2981: 4 months ago)
1.1M  another/big.file  (6e73ca2: 2 weeks ago)

Once you've located the file, you can remove it with these instructions.

If you have not pushed the file to the remove, you can remove it from your git history like so:

$ git reflog expire --expire=1.minute refs/heads/master
     # all deletions up to 1 minute  ago available to be garbage-collected
$ git fsck --unreachable 
     # lists all the blobs(file contents) that will be garbage-collected 
$ git prune 
$ git gc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment