Monday 16 July 2018

How to use nbconvert as git textconv driver to enable effective version control of Jupyter Notebooks

What I'm trying to do and how it differs from similar problems

I would like to version control Jupyter Notebooks using Git. Unfortunately, by default, Git and Jupyter Notebooks do not play nicely together. An .ipynb file is a JSON file containing not only the Python code itself but also plenty of metadata (e.g., cell execution counts) and cell output.

Most existing solutions (e.g., "Using IPython notebooks under version control") rely on removing output and metadata from the notebook. This (i) still leaves the JSON file structure in the diff, which is a pain to read, and (ii) means that features such as output display on GitHub cannot be used, because the output gets removed before committing.

My idea is the following: whenever I run git diff, Git automatically runs jupyter nbconvert --to python filename.ipynb to convert my *.ipynb source files to plain Python *.py files. The diff should then only pick up changes that affect the code itself (not execution counts and output, as those are removed by nbconvert), without removing anything from the committed notebook, and it should be much more readable than a diff of the unconverted .ipynb files. I do not want the .py version of the file to be stored permanently; it should only be used for git diff. My understanding is that this should be possible by simply specifying nbconvert as the [diff] textconv driver, but I have not been able to get it to work.
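To make the intended effect concrete, here is a minimal sketch of the idea using only Python's standard library. It is a stand-in for the conversion step, not what nbconvert actually does internally, and the names notebook_code and make_notebook are hypothetical helpers invented for this illustration. It assumes the nbformat 4 layout, in which code lives in a top-level "cells" list.

```python
import json


def notebook_code(nb_text: str) -> str:
    """Return only the source of the code cells in nbformat-4 notebook JSON."""
    nb = json.loads(nb_text)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The source is stored either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


def make_notebook(execution_count, outputs) -> str:
    """Hypothetical helper: build a one-cell notebook as a JSON string."""
    return json.dumps({
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {},
        "cells": [{
            "cell_type": "code",
            "execution_count": execution_count,
            "metadata": {},
            "outputs": outputs,
            "source": ["import math\n", "print(math.pi)"],
        }],
    })


# Same code, but one copy has been executed and carries output.
before = make_notebook(None, [])
after = make_notebook(3, [{"output_type": "stream", "name": "stdout",
                           "text": ["3.141592653589793\n"]}])

print(before == after)                                # raw JSON differs
print(notebook_code(before) == notebook_code(after))  # extracted code does not
```

The two JSON strings differ because of the execution count and the stored output, while the extracted code is identical, which is the property the diff is meant to rely on.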

Steps I have performed so far

I have created a file named ipynb2py in /usr/local/bin containing

#!/bin/bash
jupyter nbconvert --to python $1

I have added the following to my .gitconfig file

[diff "ipynb"]
    textconv = ipynb2py

and the following to my .gitattributes file

*.ipynb diff=ipynb

to assign the ipynb textconv driver to all files of the .ipynb format.

Now, I would expect git diff to automatically perform the conversion every time I run it (I know this will slow git diff down substantially, but it is worth it to have a viable option for version-controlling notebooks) and then show a nice, readable diff based only on the difference between the notebook states after conversion.
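For reference, Git's gitattributes documentation describes a textconv program as one that takes a single argument, the name of the file to convert, and writes the resulting text to standard output. The following is a hedged sketch of a driver written against that description; it is not the ipynb2py script from this post. It assumes that jupyter is on the PATH and relies on nbconvert's --stdout option, and the function names are hypothetical.

```python
import subprocess
import sys
from typing import List


def nbconvert_command(notebook_path: str) -> List[str]:
    """Build the nbconvert call; --stdout asks for the script on standard output."""
    # Passing the path as its own list element keeps paths with spaces intact.
    return ["jupyter", "nbconvert", "--to", "python", "--stdout", notebook_path]


def textconv(notebook_path: str) -> int:
    """Run the conversion; the child process inherits and writes to our stdout."""
    return subprocess.run(nbconvert_command(notebook_path), check=False).returncode


# Git calls the driver with exactly one argument: the path of the file to convert.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(textconv(sys.argv[1]))
```

A script like this would be made executable and named in the textconv setting in the same way as ipynb2py; whether it behaves as intended in a given setup is something to verify rather than assume.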

When I run git diff, it first prints [NbConvertApp] Converting notebook, which tells me that Git is triggering the conversion as expected. However, the conversion then fails with a long Python traceback, followed by fatal: unable to read files to diff.

Immediately before the fatal error message, I receive the following:

nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '\n# coding: utf-8\n\n# In[ ]:\n\nimport...
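The text quoted in this message can be passed to Python's json module to see what the parser tripped over. The sketch below uses only the visible part of the truncated message as its sample string, and json_error is a hypothetical helper. It suggests, without proving it, that the file handed to nbconvert already contained script-style text rather than notebook JSON.

```python
import json

# Visible part of the truncated text from the NotJSONError message above.
sample = "\n# coding: utf-8\n\n# In[ ]:\n\nimport"


def json_error(text: str):
    """Return the JSON decoding error message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


print(json_error(sample))           # Expecting value: line 2 column 1 (char 1)
print(json_error('{"cells": []}'))  # None
```

The first message is the same JSONDecodeError that appears in the complete traceback, so the parser gave up at the first non-whitespace character of the input, the "#" of the coding comment.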

Of course, I suspected that there was a problem with the way in which my ipynb2py script was invoking nbconvert, but running ipynb2py notebook.ipynb in my repo works perfectly well, so that cannot be the reason.

What could be causing this error? What are the requirements for a valid textconv driver other than returning a text file?

Complete traceback

git diff
[NbConvertApp] Converting notebook /var/folders/9t/p55_4b9971j4wwp14_45wy900000gn/T//lR5q08_notebook.ipynb to python
Traceback (most recent call last):
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 14, in parse_json
    nb_dict = json.loads(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/Users/user/anaconda/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/user/anaconda/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/user/anaconda/bin/jupyter-nbconvert", line 11, in <module>
    load_entry_point('nbconvert==5.1.1', 'console_scripts', 'jupyter-nbconvert')()
  File "/Users/user/anaconda/lib/python3.6/site-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 305, in start
    self.convert_notebooks()
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 473, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 444, in convert_single_notebook
    output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 373, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 171, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 189, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/__init__.py", line 141, in read
    return reads(fp.read(), as_version, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/__init__.py", line 74, in reads
    nb = reader.reads(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 58, in reads
    nb_dict = parse_json(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 17, in parse_json
    raise NotJSONError(("Notebook does not appear to be JSON: %r" % s)[:77] + "...")
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '\n# coding: utf-8\n\n# In[ ]:\n\nimport...
fatal: unable to read files to diff



