@ageron
Created February 23, 2022 01:59
A bash script that removes all Jupyter notebook outputs from a git repository's history
#!/bin/bash -e
echo 'Usage:
1. install git-filter-repo from https://github.com/newren/git-filter-repo
2. make a clean clone of your repository (keep a backup!)
3. cd to the repo and run this script
'
python -c 'input("Are you sure? Press Ctrl-C to cancel or Enter to continue.")'
git filter-repo --blob-callback '
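import json
try:
    notebook = json.loads(blob.data)
    # Only rewrite blobs that parse as a JSON object with a "cells" key (a notebook).
    # The isinstance check avoids a TypeError on JSON scalars such as null or 1.
    if isinstance(notebook, dict) and "cells" in notebook:
        for cell in notebook["cells"]:
            if "outputs" in cell:
                cell["outputs"] = []
        # Re-serialize with indent=1 and a trailing newline.
        blob.data = (json.dumps(notebook, ensure_ascii=False, indent=1,
                                sort_keys=True) + "\n").encode("utf-8")
except (json.JSONDecodeError, UnicodeDecodeError):
    # Not JSON, or not decodable text: leave the blob untouched.
    pass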
'
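Before rewriting history, it may help to try the callback's logic on an in-memory notebook. The following is a minimal sketch, not part of the script itself: `strip_outputs` and `sample` are hypothetical names introduced here for illustration, and the function takes raw bytes in place of git-filter-repo's `blob.data`.

```python
import json


def strip_outputs(data: bytes) -> bytes:
    """Return data with every cell's outputs cleared.

    Hypothetical stand-in for the blob callback: anything that is not a
    JSON object with a "cells" key is returned unchanged.
    """
    try:
        notebook = json.loads(data)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return data  # not JSON, or not decodable text
    if not isinstance(notebook, dict) or "cells" not in notebook:
        return data  # valid JSON, but not a notebook
    for cell in notebook["cells"]:
        if "outputs" in cell:
            cell["outputs"] = []
    # Same serialization settings as the script's callback.
    text = json.dumps(notebook, ensure_ascii=False, indent=1, sort_keys=True)
    return (text + "\n").encode("utf-8")


# A made-up one-cell notebook, used only for this demonstration.
sample = json.dumps({
    "cells": [{
        "cell_type": "code",
        "source": ["1 + 1"],
        "outputs": [{"output_type": "execute_result"}],
    }],
    "nbformat": 4,
}).encode("utf-8")

cleaned = json.loads(strip_outputs(sample))
print(cleaned["cells"][0]["outputs"])              # prints []
print(strip_outputs(b"not json") == b"not json")   # prints True
```

Non-notebook blobs pass through byte-for-byte, while notebooks are re-serialized with sorted keys, so a notebook's bytes can change even if it had no outputs to begin with.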