@brianspiering
Last active September 18, 2024 00:19
Remove big files from git history (Data Scientists are silly)
# Remove data from git history
git clone --mirror https://github.com/brianspiering/intro_course_data_science_for_good-1.git
cd intro_course_data_science_for_good-1.git/
# What are we dealing with?
git count-objects -vH
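# A possible way to see which files are worth deleting (a sketch, not part of
# the original gist or of BFG): list the ten largest blobs anywhere in history.
# The helper name largest_blobs is made up for illustration.
# Output columns: size in bytes, then the path the blob was committed under.
largest_blobs() {
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
    | awk '/^blob / { sub(/^blob /, ""); print }' \
    | sort -nr \
    | head -n 10
}
largest_blobs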
# Use a silver bullet
# https://github.com/rtyley/bfg-repo-cleaner
# Quote the globs so the shell passes them to bfg unexpanded
bfg --delete-files '*.csv'
bfg --delete-files '*.zip'
bfg --strip-blobs-bigger-than 40M
git reflog expire --expire=now --all && git gc --prune=now --aggressive
# Better, same, or worse?
git count-objects -vH
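# One way to check whether the deletions took effect (a sketch, assuming *.csv
# and *.zip were the targets): list paths with those extensions that are still
# reachable from any ref. The helper name remaining_data_files is made up.
# BFG protects the latest commit by default, so files still present in HEAD
# are expected to show up here.
remaining_data_files() {
  git rev-list --objects --all | grep -E '\.(csv|zip)$'
}
remaining_data_files || echo "No .csv or .zip paths left in history"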
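# Push the rewritten history back to GitHub
# In a mirror clone, a plain `git push` updates every ref on the remote;
# pushing to an explicit URL, as originally tried here, may push only the current branch
# Note: the repo size GitHub reports may not shrink until GitHub garbage-collects on its side
git push --force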
# If that does not work, another option is to delete all git history
# Inspired by https://gist.github.com/stephenhardy/5470814
# Remove the history from the local working copy (a regular clone, not the mirror above)
rm -rf .git
# Recreate the repo from the current content only (--initial-branch needs Git 2.28+)
git init --initial-branch=main
git add .
git commit -m "Initial commit"
# Push to the GitHub remote repo, making sure you overwrite its history
git remote add origin git@github.com:<YOUR ACCOUNT>/<YOUR REPO>.git
git push -u --force origin main