jake kara, software engineer ‣ modular jupyter notebooks

modular jupyter notebooks

I’ve been working on an effective way to use Jupyter Notebooks as Python modules.

This is the subject of my thesis for my master’s degree in software engineering, which I will receive on May 27. So, yay!

One thing that’s always bothered me about Jupyter Notebooks is that you can’t reuse the code from one notebook inside another notebook. This makes it impossible to organize the code into notebooks that each focus on one specific part of a methodology. The standard practice is to move reusable code out of notebooks and into .py files, but that’s not good enough for me: If I’m using Jupyter Notebooks to author code, presumably I’ve chosen that format for a reason – why should I have to give that up?

Yes, there are tools that allow you to export your notebooks as .py files, which can then be used as modules. There are also some proofs of concept demonstrating that it is technically possible to extend Python’s import mechanism to load a .ipynb file as if it were a .py file. But both approaches fail to recognize that code is written differently in Jupyter notebooks, so running every cell in a notebook from top to bottom doesn’t really make sense. Notebooks may contain scratch code or demo code, which makes them very bad source code modules.
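
To make the import-extension approach concrete, here is a minimal sketch of a naive loader. It is not the code of any particular tool. It assumes the usual .ipynb JSON layout (a top-level "cells" list whose code cells carry a "source" field), and load_notebook_naively is a hypothetical name chosen for this illustration. It runs every code cell from top to bottom, scratch code included, which is exactly the problem described above.

```python
import json
import types


def load_notebook_naively(path, module_name="notebook_module"):
    """Run every code cell of a notebook, top to bottom, into a fresh module."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)

    module = types.ModuleType(module_name)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The source may be stored as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        # Every cell is executed, whether or not it was meant for reuse
        exec(source, module.__dict__)
    return module
```

A loader like this would also choke on cells that use IPython magics, since those are not plain Python.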

So, my solution was to create a syntax that allows you to control which cells of a notebook will be included in the notebook’s module view. The syntax looks like this:

# :: ignore-cell ::
# This cell will be ignored during `import`

print("Hello!")
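
As an illustration of the idea (not margo-loader's actual code), a loader could drop any cell carrying that note before executing the rest. The helper name is_ignored and the regular expression are assumptions made for this sketch; the real Margo parser handles a much richer grammar.

```python
import re

# Matches a line such as "# :: ignore-cell ::" (assumed, simplified form)
IGNORE_CELL = re.compile(r"^\s*#\s*::\s*ignore-cell\s*::\s*$", re.MULTILINE)


def is_ignored(cell_source):
    """Return True if the cell source contains an ignore-cell margin note."""
    return bool(IGNORE_CELL.search(cell_source))


# Two example cell sources: one reusable, one marked as scratch code
cells = [
    "def greet(name):\n    return f'Hello, {name}!'",
    "# :: ignore-cell ::\n# This cell will be ignored during `import`\n\nprint('Hello!')",
]

# Only the cells without the note would end up in the module view
module_cells = [source for source in cells if not is_ignored(source)]
```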

I realized that it’s generally useful to be able to attach directives like this, as well as variable assignments, to notebooks in a standard way that other applications can make use of if they are designed to. I call these special annotations “margin notes,” and I call the syntax I have developed “Margo.”

In the above example, the application that makes use of the ignore-cell margin note is the custom import extension that I built, called margo-loader. It also understands directives like module-stop and module-start to ignore entire blocks of cells, as well as not-a-module to prevent a notebook from being imported using margo-loader.
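
The block directives can be pictured as a small state machine over the cells. The sketch below is a guess at the behavior described above, not margo-loader's implementation: select_module_cells and has_directive are hypothetical names, and it assumes that a cell containing module-stop is itself excluded while a cell containing module-start is itself included.

```python
import re


def has_directive(cell_source, name):
    """Return True if the cell has a bare margin note such as '# :: name ::'."""
    pattern = r"^\s*#\s*::\s*" + re.escape(name) + r"\s*::\s*$"
    return re.search(pattern, cell_source, re.MULTILINE) is not None


def select_module_cells(cells):
    """Return the cell sources that belong to the notebook's module view."""
    if any(has_directive(source, "not-a-module") for source in cells):
        raise ImportError("this notebook is marked not-a-module")

    selected = []
    in_module = True
    for source in cells:
        # Assumption: the marked cell itself follows the new state
        if has_directive(source, "module-stop"):
            in_module = False
        elif has_directive(source, "module-start"):
            in_module = True
        if in_module and not has_directive(source, "ignore-cell"):
            selected.append(source)
    return selected
```

Given cells A, B (module-stop), C, D (module-start), this sketch would keep A and D and skip B and C.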

For a tutorial on modular notebooks and margin notes more generally, check out the tutorial repo listed at the end of this post, which can be run in a browser without installing any software.

Beyond modular notebooks

We’ve seen a directive with no assignment. In Margo, you can assign arbitrary values using a few different syntaxes. The basic one looks like a JSON array without the enclosing brackets. It accepts the same non-collection scalar values that JSON does.

# :: values: "one", 1, true, null ::
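
Since the basic syntax looks like a JSON array without the brackets, one way to picture how such a value list could be decoded is to put the brackets back and hand the text to a JSON parser. This is only a sketch of that idea, not the Margo parser's implementation; parse_basic_values is a hypothetical name.

```python
import json


def parse_basic_values(values_text):
    """Parse a comma-separated list of JSON scalars into a Python list."""
    values = json.loads("[" + values_text + "]")
    # The basic syntax takes scalars only, so reject nested collections
    for value in values:
        if isinstance(value, (list, dict)):
            raise ValueError("collections are not allowed in the basic syntax")
    return values


parsed = parse_basic_values('"one", 1, true, null')
# parsed == ["one", 1, True, None]
```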

You can also define values as plain text, JSON, or YAML, as shown in the following three examples. Multiline strings are welcome with no special escaping:

# :: hello_world [raw] : '
# :: This is a raw text Margo Value
# :: ------------------------------
# :: 
# :: This can be handy for multiline string values.
# :: ' ::

# :: interface [json]: '{
# :: "input": "population.csv",
# :: "output": ["report.pdf", "report.html"]
# :: }' ::

# :: interface [yaml]: '
# :: input: population.csv
# :: output:
# ::   - report.pdf
# ::   - report.html
# :: ' ::

While I originally conceived of these annotations as a way to describe which cells to ignore, I realized they are generally useful as a way to extend the Jupyter Notebook format in a non-breaking way. In particular, they can be used to create new “interfaces” into a Jupyter Notebook. In the notebook tutorial linked above, I demonstrate this by encoding a notebook’s requirements.txt inside margin notes. In my thesis, I also use margin notes to describe a notebook’s Makefile interface (the files that must exist before the notebook runs, and the files that will be generated by running it). By defining a Makefile interface, it is possible to orchestrate the execution of notebooks as a DAG using make. In the final part of my thesis, I built a notebook editor prototype that supports hierarchical cell relationships, which are encoded in the document as margin notes.
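
To illustrate the Makefile idea, here is a minimal sketch (not the thesis implementation) that turns an interface like the one in the YAML example into a make rule. The function name makefile_rule, the dictionary shape, and the choice of "jupyter nbconvert --to notebook --execute" as the recipe are all assumptions made for this illustration.

```python
def makefile_rule(notebook, interface):
    """Build a make rule: outputs depend on the notebook and its inputs."""
    inputs = interface.get("input", [])
    outputs = interface.get("output", [])
    # Accept a single file name as well as a list of names
    if isinstance(inputs, str):
        inputs = [inputs]
    if isinstance(outputs, str):
        outputs = [outputs]

    targets = " ".join(outputs)
    prerequisites = " ".join([notebook] + inputs)
    recipe = f"jupyter nbconvert --to notebook --execute {notebook}"
    # Recipe lines in a Makefile must start with a tab character
    return f"{targets}: {prerequisites}\n\t{recipe}\n"


rule = makefile_rule(
    "report.ipynb",
    {"input": "population.csv", "output": ["report.pdf", "report.html"]},
)
```

Because each notebook's outputs can be another notebook's inputs, rules like this are what let make work out the DAG. Note that make treats several targets on one rule as independent, so a real setup with multiple outputs might need grouped targets or a stamp file.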

  1. Margo organization page on GitHub: https://github.com/margo-notebooks/
  2. Margo loader: https://github.com/margo-notebooks/margo-loader-py
  3. Margo parser: https://github.com/margo-notebooks/margo-parser-py
  4. Tutorial repo: https://github.com/margo-notebooks/modular-notebook-tutorial
  5. Tutorial live demo: https://mybinder.org/v2/gh/margo-notebooks/modular-notebook-tutorial/HEAD