#!/usr/bin/env python
"""strip outputs from an IPython Notebook

Opens a notebook, strips its output, and writes the outputless version to the original file.

Useful mainly as a git filter or pre-commit hook for users who don't want to track output in VCS.

This does mostly the same thing as the `Clear All Output` command in the notebook UI.

LICENSE: Public Domain
"""

import io
import sys

try:
    # Jupyter >= 4
    from nbformat import read, write, NO_CONVERT
except ImportError:
    # IPython 3
    try:
        from IPython.nbformat import read, write, NO_CONVERT
    except ImportError:
        # IPython < 3
        from IPython.nbformat import current

        # NO_CONVERT does not exist here; define a placeholder so the
        # call in __main__ does not raise NameError (read() ignores it).
        NO_CONVERT = None

        def read(f, as_version):
            return current.read(f, 'json')

        def write(nb, f):
            return current.write(nb, f, 'json')


def _cells(nb):
    """Yield all cells in an nbformat-insensitive manner"""
    if nb.nbformat < 4:
        for ws in nb.worksheets:
            for cell in ws.cells:
                yield cell
    else:
        for cell in nb.cells:
            yield cell


def strip_output(nb):
    """strip the outputs from a notebook object"""
    nb.metadata.pop('signature', None)
    for cell in _cells(nb):
        if 'outputs' in cell:
            cell['outputs'] = []
        if 'prompt_number' in cell:
            cell['prompt_number'] = None
    return nb


if __name__ == '__main__':
    filename = sys.argv[1]
    with io.open(filename, 'r', encoding='utf8') as f:
        nb = read(f, as_version=NO_CONVERT)
    nb = strip_output(nb)
    with io.open(filename, 'w', encoding='utf8') as f:
        write(nb, f)
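A note on v4 notebooks: the script above only resets prompt_number, which is the v3 name for the execution counter. In v4 files the field is called execution_count, so it would be left in place. Below is a minimal sketch of the same stripping logic that covers both names. It works on the raw JSON with only the standard library instead of nbformat, and the names iter_cells and strip_output_dict are made up for illustration, not part of the gist.

```python
import json


def iter_cells(nb):
    """Yield cells from a notebook dict, whether v3 (worksheets) or v4 (cells)."""
    if nb.get("nbformat", 4) < 4:
        for ws in nb.get("worksheets", []):
            for cell in ws.get("cells", []):
                yield cell
    else:
        for cell in nb.get("cells", []):
            yield cell


def strip_output_dict(nb):
    """Clear outputs and execution counters in place; return the same dict."""
    nb.get("metadata", {}).pop("signature", None)
    for cell in iter_cells(nb):
        if "outputs" in cell:
            cell["outputs"] = []
        # v3 calls the counter prompt_number, v4 calls it execution_count.
        for key in ("prompt_number", "execution_count"):
            if key in cell:
                cell[key] = None
    return nb


# A tiny hand-written v4 notebook, used only to exercise the function.
sample = json.loads("""
{"nbformat": 4, "nbformat_minor": 0, "metadata": {"signature": "abc"},
 "cells": [
   {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
   {"cell_type": "code", "metadata": {}, "source": "1 + 1",
    "execution_count": 7,
    "outputs": [{"output_type": "execute_result", "execution_count": 7,
                 "data": {"text/plain": "2"}, "metadata": {}}]}
 ]}
""")
stripped = strip_output_dict(sample)
```

Markdown cells have neither key, so they pass through untouched.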
#!/bin/sh
#
# strip output of IPython Notebooks
# add this as `.git/hooks/pre-commit`
# to run every time you commit a notebook
#
# requires `nbstripout` to be available on your PATH
#
# LICENSE: Public Domain

if git rev-parse --verify HEAD >/dev/null 2>&1; then
    against=HEAD
else
    # Initial commit: diff against an empty tree object
    against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
fi

# Find notebooks to be committed
(
    IFS='
'
    NBS=`git diff-index -z --cached $against --name-only | grep '.ipynb$' | uniq`
    for NB in $NBS ; do
        echo "Removing outputs from $NB"
        nbstripout "$NB"
        git add "$NB"
    done
)

exec git diff-index --check --cached $against --
For the git filter, I made the following changes to nbstripout: https://github.com/cfriedline/ipynb_template/blob/master/nbstripout
nb = current.read(sys.stdin, 'json')
nb = strip_output(nb)
current.write(nb, sys.stdout, 'json')
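For anyone wondering why the filter variant reads stdin and writes stdout: a git clean filter gets the file contents on stdin and whatever it prints is what gets staged, so the script must not touch the file on disk. Here is a rough stdlib-only sketch of that shape. It handles only the v4 layout to stay short, clean_stream is a made-up name, and the StringIO objects stand in for sys.stdin and sys.stdout so it can be run without git.

```python
import io
import json


def clean_stream(src, dst):
    """Read notebook JSON from src, drop code-cell outputs, write JSON to dst."""
    nb = json.load(src)
    for cell in nb.get("cells", []):  # v4 layout only, to keep the sketch short
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    json.dump(nb, dst, indent=1, sort_keys=True)
    dst.write("\n")


# Stand-ins for sys.stdin / sys.stdout so the example runs without git.
fake_stdin = io.StringIO(
    '{"nbformat": 4, "nbformat_minor": 0, "metadata": {}, "cells": ['
    '{"cell_type": "code", "metadata": {}, "source": "x", "execution_count": 3,'
    ' "outputs": [{"output_type": "stream", "name": "stdout", "text": "hi"}]}]}'
)
fake_stdout = io.StringIO()
clean_stream(fake_stdin, fake_stdout)
```

In a real filter script the last line would presumably be clean_stream(sys.stdin, sys.stdout) instead.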
To get it to work for me I had to add -a to grep and remove the $ in the git hook. Dunno why, but now it works.
NBS=`git diff-index -z --cached $against --name-only | grep -a '.ipynb' | uniq`
On line 23 of pre-commit, one needs to replace
for NB in NBS
with
for NB in $NBS
and also to make the change of @sotte.
Any chance this will become part of nbconvert?
Is there any way to commit the stripped version but leave output in your working directory?
On OS X 10.10, I couldn't get NBS=`git diff-index -z ... | grep ...` to work with null character separators, so here's one workaround in bash:
#!/bin/bash
...
(
    pat='\.ipynb$'
    while IFS= read -r -d '' file; do
        if [[ "$file" =~ $pat ]]; then
            printf 'Removing outputs from %q\n' "$file";
            nbstripout "$file"
            git add "$file"
        fi
    done < <(git diff-index -z --cached $against --name-only)
)
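The same idea, splitting on the NUL bytes instead of feeding them to a line-based tool, can be sketched in Python. The function name staged_notebooks and the sample bytes are made up for illustration; the bytes are only shaped like what git diff-index -z --cached ... --name-only prints.

```python
import re

NOTEBOOK_RE = re.compile(r"\.ipynb$")


def staged_notebooks(raw):
    """Split NUL-separated file names and keep only the notebooks."""
    names = [part.decode("utf-8") for part in raw.split(b"\0") if part]
    return [name for name in names if NOTEBOOK_RE.search(name)]


# Hypothetical bytes, shaped like NUL-separated --name-only output.
raw = b"analysis.ipynb\0src/tool.py\0notes/my notebook.ipynb\0"
print(staged_notebooks(raw))
# prints ['analysis.ipynb', 'notes/my notebook.ipynb']
```

Because the names are never split on whitespace, a file name with a space in it survives intact. In an actual hook the raw bytes could come from subprocess.check_output running the git command.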
In nbstripout I also made the following changes, though this probably depends on individual taste. Cell toggling isn't reset by clearing output in the notebook GUI, so toggle states may get versioned even if no output is present. Popping prompt_number matches notebook GUI behavior (in IPython 2.4.1).
if 'prompt_number' in cell:
    cell.pop('prompt_number')
if 'collapsed' in cell:
    cell['collapsed'] = False
The pre-commit hook approach didn't work for me (the grep somehow found .py files, but only if there was a .ipynb in the commit), but a filter seems cleaner anyway. Here's what I did to get it working:
I modified cfriedline's nbstripout file slightly to give an informative error when you can't import the latest IPython:
https://github.com/petered/plato/blob/fb2f4e252f50c79768920d0e47b870a8d799e92b/notebooks/config/strip_notebook_output
And added it to my repo, let's say in ./relative/path/to/nbstripout
Also added a .gitattributes file to the root of the repo, containing:
*.ipynb filter=stripoutput
And created a setup_git_filters.sh containing
git config filter.stripoutput.clean "$(git rev-parse --show-toplevel)/relative/path/to/nbstripout"
git config filter.stripoutput.smudge cat
git config filter.stripoutput.required true
And ran source setup_git_filters.sh. The fancy $(git rev-parse...) thing is to find the local path of your repo on any (Unix) machine.
Slightly modified method that works with the new notebook format (v4) used in IPython 3
https://gist.github.com/waylonflinn/010f0a1a66760adf914f
The essential difference is an added check for the presence of the worksheets object on the root.
I've created a version that removes the whole cell. Although I have to admit the way I track the index is not at all optimal and there might be better ways making proper use of the API. Feedback welcome:
https://gist.github.com/dietmarw/dc0cf089d8d6211136d5
I have added documentation, an nbstripout install command to install the filter in the current Git repository and turned it into a module with a setuptools script entry point: https://github.com/kynan/nbstripout
How do you feel about publishing that on PyPI @minrk?
I've adapted cfriedline's repo to make it easy to install to any repo as a filter https://github.com/jond3k/ipynb_stripout
@jond3k Have a look at my repo linked above: it works with v3 and v4 and has an install command to automate the installation in any git repo.
@kynan feel free to put it on PyPI. No need to wait for me.
@minrk OK, will do, thanks!
Great snippet, thanks a lot for sharing!
Two suggestions:
- Small fix: I guess it should be grep '\.ipynb$' with the . escaped, otherwise the . matches any character
- Also add | tr -d '\000' | before grep:
NBS=`git diff-index -z --cached $against --name-only | tr -d '\000' | grep '\.ipynb$' | uniq`
The second point is because there will be cases where grep considers the input binary (https://unix.stackexchange.com/questions/19907/what-makes-grep-consider-a-file-to-be-binary). This happens to me when using zsh (i.e. getting Binary file (standard input) matches from grep instead of the matching parts).
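To see what escaping the dot changes, here is a quick check using Python's re module as a stand-in for grep (for this particular pattern the two behave the same way as far as I know). The file names are invented for the example.

```python
import re

unescaped = re.compile(r".ipynb$")   # . matches any single character
escaped = re.compile(r"\.ipynb$")    # \. matches only a literal dot

names = ["report.ipynb", "scripts/helper.py", "backup_ipynb", "myipynb"]

print([n for n in names if unescaped.search(n)])
# prints ['report.ipynb', 'backup_ipynb', 'myipynb']

print([n for n in names if escaped.search(n)])
# prints ['report.ipynb']
```

So the unescaped pattern does not match every file, but it does let through any name that ends in ipynb preceded by an arbitrary character.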
Or as a git filter instead:
(from @JanShulz)
Add this to your .git/config:
and a .gitattributes file with