What I'm trying to do and how it differs from similar problems
I would like to version-control Jupyter notebooks using Git. Unfortunately, by default, Git and Jupyter notebooks do not play nicely. An .ipynb file is a JSON file containing not only the Python code itself but also plenty of metadata (e.g., cell execution counts) and cell output.
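To make the problem concrete, here is a hypothetical, hand-written minimal notebook in the general nbformat 4 layout (file names are made up for illustration). Re-running the cell changes only execution_count, yet a plain line diff of the JSON still reports a change even though the code is identical:

```shell
#!/bin/bash
set -euo pipefail
cd "$(mktemp -d)"   # scratch directory so nothing real is touched

# Hypothetical minimal notebook: one code cell, its execution count and its output
cat > before.ipynb <<'EOF'
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {"name": "stdout", "output_type": "stream", "text": ["hello\n"]}
   ],
   "source": ["print('hello')"]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}
EOF

# Simulate re-running the cell: same source, new execution count
sed 's/"execution_count": 3/"execution_count": 4/' before.ipynb > after.ipynb

# The file is ordinary JSON (json.tool exits non-zero otherwise)
python3 -m json.tool before.ipynb > /dev/null && echo "before.ipynb is valid JSON"

# The code is unchanged, but a line-based diff still shows a hunk
# (diff exits 1 when files differ, so ignore its status here)
diff -u before.ipynb after.ipynb || true
```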
Most existing solutions (e.g., "Using IPython notebooks under version control") rely on stripping output and metadata from the notebook. This (i) still leaves the diff in the notebook's JSON structure, which is a pain to read, and (ii) means that features such as output display on GitHub cannot be used, because the output is removed before committing.
My idea is the following: whenever I run git diff, Git automatically uses jupyter nbconvert --to python filename.ipynb to convert my *.ipynb source files to plain Python *.py files. The diff should then show only changes that affect the code itself (not execution counts and output, since nbconvert drops those) without actually removing anything from the notebooks, and it should be much more readable than a diff of the unconverted .ipynb files. I do not want the .py version of the file to be stored permanently; it should only be used for git diff. My understanding is that this should be possible by simply specifying nbconvert as a [diff] textconv driver, but I have not been able to get it to work.
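For reference, Git's documented contract for a textconv command is that it receives the name of the file to convert as its single argument and writes the converted text to standard output; when one side of the diff is a committed version, that file is a temporary copy Git writes out (compare the /var/folders/.../T/ path in the traceback). The self-contained sketch below shows the mechanism in a throwaway repository, using a hypothetical toy converter, upper-textconv, that just upper-cases the file. It is not an nbconvert setup, only an illustration of how Git calls the driver:

```shell
#!/bin/bash
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Hypothetical toy driver: reads the file named by $1, prints converted text on stdout
cat > upper-textconv <<'EOF'
#!/bin/bash
tr 'a-z' 'A-Z' < "$1"
EOF
chmod +x upper-textconv

# Register the driver and map *.txt to it (repo-local config only)
git config diff.upper.textconv "$repo/upper-textconv"
echo '*.txt diff=upper' > .gitattributes

echo 'hello' > demo.txt
git add .gitattributes demo.txt
git commit -q -m init

echo 'world' > demo.txt
# git diff applies textconv by default, so the hunk shows -HELLO / +WORLD
git --no-pager diff demo.txt
```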
Steps I have performed so far
I have created a file named ipynb2py in /usr/local/bin containing:
#!/bin/bash
jupyter nbconvert --to python "$1"
I have added the following to my .gitconfig file:
[diff "ipynb"]
textconv = ipynb2py
and the following to my .gitattributes file:
*.ipynb diff=ipynb
to assign the ipynb textconv driver to all .ipynb files.
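The same two settings can also be made and checked from the command line. A minimal sketch, assuming a throwaway repository and using --local instead of the global .gitconfig so nothing outside the demo changes; git check-attr reports which diff driver Git will pick for a path (the file does not need to exist):

```shell
#!/bin/bash
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q

# Equivalent of the [diff "ipynb"] section, but scoped to this demo repository
git config --local diff.ipynb.textconv ipynb2py

# Equivalent of the .gitattributes line
echo '*.ipynb diff=ipynb' > .gitattributes

git config --get diff.ipynb.textconv    # prints: ipynb2py
git check-attr diff -- notebook.ipynb   # prints: notebook.ipynb: diff: ipynb
```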
Now I would expect git diff to perform the conversion automatically every time I run it (I know this will slow it down substantially, but it's worth it to have a viable option for version-controlling notebooks) and then show a nice, readable diff based only on the difference between the notebook states after conversion.
When I run git diff, it first prints [NbConvertApp] Converting notebook, which tells me that Git is triggering the conversion as expected. However, the conversion then fails with a long Python traceback ending in fatal: unable to read files to diff.
Immediately before the fatal error message, I receive the following:
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '\n# coding: utf-8\n\n# In[ ]:\n\nimport...
Of course, I suspected a problem with the way my ipynb2py script invokes nbconvert, but running ipynb2py notebook.ipynb in my repo works perfectly well, so that cannot be the reason.
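Since Git uses only what the driver prints to standard output, a check that mirrors what Git receives may tell more than confirming that the command runs without error. Below is a sketch of a hypothetical helper, inspect_driver (not part of Git or Jupyter), that runs a driver on a scratch copy of a notebook and reports how many bytes arrived on stdout and which files were created next to the copy:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper. Usage: inspect_driver DRIVER NOTEBOOK
# DRIVER must be on PATH or an absolute path, since it runs inside a scratch dir.
inspect_driver() {
  local driver=$1 notebook=$2 scratch out
  scratch=$(mktemp -d)
  out=$(mktemp)
  cp "$notebook" "$scratch/"
  # Run it the way Git runs a textconv command: one file argument,
  # converted text expected on standard output (stderr passes through)
  (cd "$scratch" && "$driver" "$(basename "$notebook")") > "$out"
  echo "bytes written to stdout: $(wc -c < "$out" | tr -d ' ')"
  echo "files created next to the copy:"
  ls -1 "$scratch" | grep -vx "$(basename "$notebook")" || true
}

# Example: inspect_driver ipynb2py notebook.ipynb
```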
What could be causing this error? What are the requirements for a valid textconv driver, other than returning a text file?
Complete traceback
git diff
[NbConvertApp] Converting notebook /var/folders/9t/p55_4b9971j4wwp14_45wy900000gn/T//lR5q08_notebook.ipynb to python
Traceback (most recent call last):
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 14, in parse_json
    nb_dict = json.loads(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/Users/user/anaconda/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/user/anaconda/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/user/anaconda/bin/jupyter-nbconvert", line 11, in <module>
    load_entry_point('nbconvert==5.1.1', 'console_scripts', 'jupyter-nbconvert')()
  File "/Users/user/anaconda/lib/python3.6/site-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 305, in start
    self.convert_notebooks()
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 473, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 444, in convert_single_notebook
    output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 373, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 171, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 189, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/__init__.py", line 141, in read
    return reads(fp.read(), as_version, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/__init__.py", line 74, in reads
    nb = reader.reads(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 58, in reads
    nb_dict = parse_json(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 17, in parse_json
    raise NotJSONError(("Notebook does not appear to be JSON: %r" % s)[:77] + "...")
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '\n# coding: utf-8\n\n# In[ ]:\n\nimport...
fatal: unable to read files to diff
from How to use nbconvert as git textconv driver to enable effective version control of Jupyter Notebooks