LukasKnuth/1839424, created February 15, 2012 22:18
This Python script can be used as a "pre-commit" hook to check whether a huge binary file was accidentally added to the staging area (and is about to be committed), because deleting those afterwards is a huge pain in the ass...
-- DESCRIPTION --
If you accidentally commit a huge file, you have a problem. Sure, you can remove it from the working tree and commit,
but the file is still reachable from your history and therefore makes every clone at least as big as the committed
binary file.
Fixing this can be very ugly and time consuming, and it might not even work the way you want. Luckily, this script
can protect you from committing such monsters in the first place.
It looks through the staged files (the ones that were added with the "git add"-command) and checks their file-size.
If one of them is larger than the given size, the commit is aborted and you get a message telling you which file
takes up so much space, so you can remove it from the staging area.
The script only checks the staged files, so having large files in your project-tree is not a problem, as long as
they don't get added to the staging-area.
-- USAGE --
--- LINUX ---
If you're on Linux (or any other *nix system, like FreeBSD), you'll want to change the SHEBANG [1] to point to
your python interpreter. To get the correct path, you can use the "which"-command:
    which python
Also, you'll want to change the "git_binary_path"-variable to point to your git-executable. To find that path,
you can use "which" as well.
Last but not least, you'll want to specify the maximum size for a file in the "max_file_size"-variable. All
files which are added to the staging area and are larger than the specified value will cause the commit to be
aborted. The size is specified in KB, so if you want to allow files which are smaller than 1.2 MB, you'll write:
    max_file_size = 1228  # 1024 x 1.2
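Since the limit is given in KB and the script compares it against file sizes in bytes (max_file_size * 1024), the conversion can be written out as a tiny helper. A minimal sketch; "mb_to_kb_limit" is a hypothetical name, not part of the script:

```python
def mb_to_kb_limit(megabytes):
    """Convert a limit in MB into the KB value expected by max_file_size.

    int() truncates toward zero, so for positive sizes the limit is rounded
    down and never ends up larger than requested.
    """
    return int(megabytes * 1024)


max_file_size = mb_to_kb_limit(1.2)  # 1.2 x 1024 = 1228.8, truncated to 1228
limit_in_bytes = max_file_size * 1024  # what the script compares file sizes against
```

Rounding down keeps the hook on the strict side: a file of exactly 1.2 MB is already rejected.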
After personalizing the script, you'll want to tell git to use it. To do so, copy the script file to the
".git/hooks"-directory in your repo-root and rename it to "pre-commit" (no file ending). Then, you'll have to
mark it executable by using "chmod":
    chmod +x pre-commit  # in your "hooks"-directory
Now, if you run "git commit", the script will run and check for huge files in your staging-area.
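The copy-and-chmod steps can also be scripted. A minimal sketch in Python, assuming the repo-root is passed in and that ".git" is a real directory (not a "gitdir:" file, as used by worktrees and submodules); "install_hook" is a hypothetical helper, not part of the gist:

```python
import os
import shutil
import stat


def install_hook(script_path, repo_root):
    """Copy script_path to <repo_root>/.git/hooks/pre-commit and make it executable."""
    hooks_dir = os.path.join(repo_root, ".git", "hooks")
    os.makedirs(hooks_dir, exist_ok=True)
    target = os.path.join(hooks_dir, "pre-commit")  # no file ending
    shutil.copyfile(script_path, target)
    # Roughly what "chmod +x" does: add the execute bit for user, group and others.
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

Run it once per repository, e.g. install_hook("pre-commit.py", "/path/to/repo"); git does not copy hooks when cloning, so every clone needs its own copy.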
--- WINDOWS ---
If you're working with Windows (and "msysgit"), it's a little more complicated. Since "msysgit" seems to have
a problem handling the SHEBANG [1], you'll have to use a little trick to make the script executable
(further information on this problem can be found here [2]).
In order to make the script work, you'll want to remove the SHEBANG from the Python script ("pre-commit.py")
and use a wrapper shell-script to call the interpreter. This script should look something like this:
    #!/bin/sh
    python .git/hooks/pre-commit.py
Store this script as a file called "pre-commit" (no file-ending). This assumes that you have Python in your
PATH [3]. If you don't, you can also specify the full path to your interpreter-executable.
This script will be called by "git commit" and will in turn call the Python script to check for huge files. The
path after the SHEBANG should not be changed, as "msysgit" will remap it automatically. You must specify the path
to the Python script relative to the repo-root, because that's the directory from which the hook is called.
Afterwards, you'll want to copy both the wrapper-file ("pre-commit") and the Python script ("pre-commit.py") to
your repo's ".git/hooks"-directory, personalize the Python script ("max_file_size" and "git_binary_path") and
mark the "pre-commit"-file executable (see the Linux instructions).
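If you set up the hook on several Windows machines, writing the wrapper can be automated as well. A minimal sketch, assuming the hooks-directory already exists; "write_wrapper" and its "python" parameter are hypothetical names. The file is written with Unix line endings, since a shell script saved with Windows line endings ("\r\n") can fail to run:

```python
import os

# Hypothetical template reproducing the wrapper from the instructions above.
WRAPPER_TEMPLATE = "#!/bin/sh\n{python} .git/hooks/pre-commit.py\n"


def write_wrapper(hooks_dir, python="python"):
    """Write the "pre-commit" wrapper that runs pre-commit.py with Python.

    'python' is the interpreter command; pass a full path if Python is not
    in your PATH (quote it yourself if it contains spaces).
    """
    path = os.path.join(hooks_dir, "pre-commit")  # no file-ending
    # newline="\n" disables line-ending translation, so the file keeps "\n".
    with open(path, "w", newline="\n") as f:
        f.write(WRAPPER_TEMPLATE.format(python=python))
    return path
```

The wrapper still has to be marked executable afterwards, as described in the Linux instructions.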
--- MAC OS X ---
The instructions should be the same as for Linux.
-- LINKS --
[1] http://en.wikipedia.org/wiki/Shebang_Unix
[2] http://stackoverflow.com/questions/1547005
[3] http://en.wikipedia.org/wiki/Environment_variable#Examples_of_Unix_environment_variables
#!/usr/bin/python
"""
This is a git commit-hook which can be used to check if huge files
were accidentally added to the staging area and are about to be
committed.
If there is a file which is bigger than the given "max_file_size"-
variable, the script will exit non-zero and abort the commit.
This script is meant to be added as a "pre-commit"-hook. See this
page for further information:
http://progit.org/book/ch7-3.html#installing_a_hook
In order to make the script work properly, you'll need to set the
above path to the python interpreter (first line of the file)
according to your system (under *NIX do "which python" to find out).
Also, the "git_binary_path"-variable should contain the absolute
path to your "git"-executable (you can use "which" here, too).
See the included README-file for further information.
The script was developed and has been confirmed to work under
python 3.2.2 and git 1.7.7.1 (might also work with earlier versions!)
"""
# The maximum file-size for a file to be committed:
max_file_size = 512  # in KB (= 1024 bytes)
# The path to the git-binary:
git_binary_path = "/usr/bin/git"
# ---- DON'T CHANGE THE REST UNLESS YOU KNOW WHAT YOU'RE DOING! ----
import os
import subprocess
import sys


def sizeof_fmt(num):
    """
    Return a human-readable filesize-string like "3.5 MB" for its
    given 'num'-parameter (a size in bytes).
    From http://stackoverflow.com/questions/1094841
    """
    for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, x)
        num /= 1024.0
    # Anything left over is at least 1024 TB:
    return "%3.1f %s" % (num, 'PB')


# Now, do the checking:
try:
    print("Checking for files bigger than " + sizeof_fmt(max_file_size * 1024))
    # List the staged files only. "--diff-filter=ACMR" keeps added, copied,
    # modified and renamed files and skips deleted ones (which would make
    # os.stat() fail); "-z" separates the paths with NUL-bytes, so unusual
    # file names are not quoted or escaped by git.
    text = subprocess.check_output(
        [git_binary_path, "diff", "--cached", "--name-only",
         "--diff-filter=ACMR", "-z"]).decode("utf-8")
    file_list = [f for f in text.split("\0") if f]
    # Check all files (the paths are relative to the repo-root, which is
    # the working directory git runs the hook in):
    for file_s in file_list:
        if not os.path.isfile(file_s):
            # Not a regular file in the working tree (e.g. a submodule,
            # or a file deleted after "git add"): nothing to stat.
            continue
        stat = os.stat(file_s)
        if stat.st_size > (max_file_size * 1024):
            # File is too big, abort the commit:
            print("'" + file_s + "' is too huge to be committed!",
                  "(" + sizeof_fmt(stat.st_size) + ")")
            sys.exit(1)
    # Everything seems to be okay:
    print("No huge files found.")
    sys.exit(0)
except subprocess.CalledProcessError:
    # There was a problem calling "git diff" (git prints its own error):
    print("Oops... calling git failed.")
    sys.exit(12)
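The script stops at the first file that is too big, so a commit with several huge files has to be retried once per file. A hypothetical variation, not part of the original script, that collects every offender instead; it is written as a pure function (sizes in, offenders out) so it can be tested without a repository:

```python
def find_oversized(sizes, max_file_size_kb):
    """Return the (path, size) pairs that exceed the limit, largest first.

    'sizes' maps each staged path to its size in bytes; the limit is given in
    KB (1 KB = 1024 bytes), like the script's "max_file_size"-variable.
    """
    limit = max_file_size_kb * 1024
    too_big = [(path, size) for path, size in sizes.items() if size > limit]
    return sorted(too_big, key=lambda item: item[1], reverse=True)


# Example: with the default limit of 512 KB, "logo.png" (exactly 512 KB) passes.
example = {"notes.txt": 2048, "logo.png": 512 * 1024,
           "video.mp4": 3 * 1024 * 1024, "dump.bin": 600 * 1024}
for path, size in find_oversized(example, 512):
    print("'%s' is too huge to be committed! (%d bytes)" % (path, size))
```

In the hook, the loop over the staged files would fill such a dictionary, and the script would exit with 1 if the returned list is not empty.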
You'll also want to adjust this for the case where you remove a file from the repo: even if it's a tiny file, you get an error.
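The error described in the comment happens when the paths come from "git status --porcelain" and are passed straight to "os.stat": a file removed with "git rm" is listed as "D  path", no longer exists on disk, and "os.stat" raises an OSError. One way to filter such entries out, sketched under the assumption of porcelain v1 output; the function name is hypothetical:

```python
def staged_paths_from_porcelain(text):
    """Return the staged, non-deleted paths from "git status --porcelain" output.

    Assumes porcelain v1: each line is "XY PATH", where X is the index (staged)
    status and Y the work-tree status; renames and copies appear as
    "XY ORIG -> PATH". Paths that git had to quote are returned still quoted.
    """
    paths = []
    for line in text.splitlines():
        if len(line) < 4:
            continue  # not a valid "XY PATH" line
        index_status, path = line[0], line[3:]
        if index_status in " ?!D":
            # Nothing staged (" "), untracked/ignored ("?", "!") or removed
            # from the index ("D"): skip it instead of calling os.stat() on it.
            continue
        if index_status in "RC" and " -> " in path:
            # Keep only the new name of a renamed or copied file.
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths
```

A file that was staged and then deleted from disk (status "AD") still passes this filter, so an os.path.isfile() check before os.stat() remains a good idea.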