Last active June 6, 2023 06:23
git pre-commit hook for stripping output from IPython notebooks
#!/usr/bin/env python
"""strip outputs from an IPython Notebook

Opens a notebook, strips its output, and writes the outputless version to the original file.

Useful mainly as a git filter or pre-commit hook for users who don't want to track output in VCS.

This does mostly the same thing as the `Clear All Output` command in the notebook UI.

LICENSE: Public Domain
"""

import io
import sys

try:
    # Jupyter >= 4
    from nbformat import read, write, NO_CONVERT
except ImportError:
    # IPython 3
    try:
        from IPython.nbformat import read, write, NO_CONVERT
    except ImportError:
        # IPython < 3
        from IPython.nbformat import current

        # NO_CONVERT is not importable here; define a placeholder so the
        # read() call in __main__ does not raise NameError. The fallback
        # read() below ignores as_version.
        NO_CONVERT = None

        def read(f, as_version):
            return current.read(f, 'json')

        def write(nb, f):
            return current.write(nb, f, 'json')


def _cells(nb):
    """Yield all cells in an nbformat-insensitive manner"""
    if nb.nbformat < 4:
        for ws in nb.worksheets:
            for cell in ws.cells:
                yield cell
    else:
        for cell in nb.cells:
            yield cell


def strip_output(nb):
    """strip the outputs from a notebook object"""
    nb.metadata.pop('signature', None)
    for cell in _cells(nb):
        if 'outputs' in cell:
            cell['outputs'] = []
        # nbformat < 4 stores the prompt counter as 'prompt_number'
        if 'prompt_number' in cell:
            cell['prompt_number'] = None
        # nbformat 4 renamed it to 'execution_count'
        if 'execution_count' in cell:
            cell['execution_count'] = None
    return nb


if __name__ == '__main__':
    filename = sys.argv[1]
    with io.open(filename, 'r', encoding='utf8') as f:
        nb = read(f, as_version=NO_CONVERT)
    nb = strip_output(nb)
    with io.open(filename, 'w', encoding='utf8') as f:
        write(nb, f)
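
As a rough illustration of what stripping does to a notebook, the sketch below applies the same clearing steps to a hand-written nbformat 4 document using only the standard library. The helper name `strip_output_dict` and the sample notebook are made up for this example, and plain dicts stand in for the objects that `nbformat.read` returns. It also resets `execution_count`, which is the nbformat 4 name for the prompt counter.

```python
import json

# Hypothetical sample: a minimal nbformat 4 notebook as plain JSON.
SAMPLE = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {"signature": "sha256:abc"},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {
            "cell_type": "code",
            "metadata": {},
            "source": "1 + 1",
            "execution_count": 3,
            "outputs": [
                {"output_type": "execute_result", "execution_count": 3,
                 "data": {"text/plain": "2"}, "metadata": {}}
            ],
        },
    ],
})


def strip_output_dict(nb):
    """Clear outputs and prompt counters from a notebook held as a dict."""
    nb.get("metadata", {}).pop("signature", None)
    if nb.get("nbformat", 4) < 4:
        # Older formats nest cells inside worksheets.
        cells = [c for ws in nb.get("worksheets", []) for c in ws.get("cells", [])]
    else:
        cells = nb.get("cells", [])
    for cell in cells:
        if "outputs" in cell:
            cell["outputs"] = []
        for key in ("prompt_number", "execution_count"):
            if key in cell:
                cell[key] = None
    return nb


stripped = strip_output_dict(json.loads(SAMPLE))
print(json.dumps(stripped["cells"][1], sort_keys=True))
```

The markdown cell is left untouched because it has neither `outputs` nor a prompt counter; only the code cell changes.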
#!/bin/sh
#
# strip output of IPython Notebooks
# add this as `.git/hooks/pre-commit`
# to run every time you commit a notebook
#
# requires `nbstripout` to be available on your PATH
#
# LICENSE: Public Domain

if git rev-parse --verify HEAD >/dev/null 2>&1; then
    against=HEAD
else
    # Initial commit: diff against an empty tree object
    against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
fi
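
# Find notebooks to be committed
(
    IFS='
'
    # -z emits NUL-terminated names; turn the NULs into newlines so that
    # grep sees one name per line and the newline IFS splits them.
    # The dot is escaped so that only a literal ".ipynb" suffix matches.
    NBS=$(git diff-index -z --cached "$against" --name-only | tr '\000' '\n' | grep '\.ipynb$' | uniq)
    for NB in $NBS ; do
        echo "Removing outputs from $NB"
        nbstripout "$NB"
        git add "$NB"
    done
)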

exec git diff-index --check --cached $against --
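
The hard-coded ID used for the initial-commit case is not arbitrary. As a small sketch, assuming a repository that uses SHA-1 object IDs, the empty tree's ID can be reproduced by hashing the object header alone, since an empty tree has no content:

```python
import hashlib

# A git object ID is the SHA-1 of "<type> <size>\0" followed by the content.
# An empty tree has zero bytes of content, so only the header is hashed.
header = b"tree 0\x00"
empty_tree = hashlib.sha1(header).hexdigest()
print(empty_tree)
```

This is why the same constant works in any SHA-1 repository, even one that has no commits yet.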
Great snippet, thanks a lot for sharing!
Two suggestions:
- Small fix: I guess it should be `grep '\.ipynb$'` with the `.` escaped, else it will match anything
- Also add `| tr -d '\000'` before grep:
  NBS=`git diff-index -z --cached $against --name-only | tr -d '\000' | grep '\.ipynb$' | uniq`
The second point is because there will be cases where grep considers the input binary (https://unix.stackexchange.com/questions/19907/what-makes-grep-consider-a-file-to-be-binary). This happens to me when using zsh (i.e. getting `Binary file (standard input) matches` from grep instead of the matching parts).
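
Both points in this comment can be checked away from the shell. The sketch below is only an illustration: the function name `staged_notebooks` and the sample byte string are hypothetical, standing in for what `git diff-index -z --name-only` would print for three staged files. It shows that an unescaped dot also matches a name like `notes_ipynb`, and that the NUL bytes are the only separators between names, so deleting them outright joins the names together, while splitting on them keeps each path intact.

```python
import re

# Escaped dot: only names that end in a literal ".ipynb" match.
NOTEBOOK_RE = re.compile(r"\.ipynb$")


def staged_notebooks(raw):
    """Return unique notebook paths from NUL-separated name-only output."""
    names = [n for n in raw.decode("utf-8").split("\0") if n]
    seen = []
    for name in names:
        if NOTEBOOK_RE.search(name) and name not in seen:
            seen.append(name)
    return seen


# Hypothetical output for three staged files (NUL-terminated, no newlines).
raw = b"a.ipynb\0docs/my notebook.ipynb\0notes_ipynb\0"

notebooks = staged_notebooks(raw)
print(notebooks)

# The unescaped pattern accepts any character before "ipynb".
loose_match = re.search(r".ipynb$", "notes_ipynb") is not None
print(loose_match)

# Deleting the NULs instead of splitting on them merges all names.
joined = raw.replace(b"\0", b"")
print(joined)
```

In a shell pipeline, the equivalent of splitting would be translating each NUL to a newline rather than deleting it.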
@minrk OK, will do, thanks!