Modular Jupyter Notebooks
I’ve been working on an effective way to use Jupyter Notebooks as Python modules.
This is the subject of my thesis for my master’s degree in software engineering, which I will receive on May 27. So, yay!
- For the impatient, check out the live demo at https://mybinder.org/v2/gh/margo-notebooks/modular-notebook-tutorial/HEAD. I will refer to this link later on in the post as well.
One thing that’s always bothered me about Jupyter Notebooks is that you can’t
reuse the code from one notebook inside another notebook. This makes it
impossible to organize the code into notebooks that each focus on one specific
part of a methodology. The standard practice is to move reusable code out of
notebooks and into .py files, but that’s not good enough for me: if I’m using
Jupyter Notebooks to author code, presumably I’ve chosen that format for a
reason – why should I have to give that up?
Yes, there are tools that allow you to export your notebooks as .py files, which
can then be used as modules. There are also some proofs of concept demonstrating
how it is technically possible to extend Python’s import mechanism to load a
.ipynb file as if it were a .py file. But both approaches fail to recognize
that code is written differently in Jupyter notebooks, so running every cell in
a notebook from top to bottom doesn’t really make sense. Notebooks may contain
scratch code or demo code that makes them very bad source code modules.
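To make the proof-of-concept approach concrete, here is a minimal sketch of an import hook that loads a .ipynb file by running every code cell from top to bottom. The class names `NaiveNotebookFinder` and `NaiveNotebookLoader` are hypothetical, and this is not how margo-loader is implemented; it only illustrates the naive behavior described above, and it will fail on cells that contain IPython magics.

```python
import importlib.abc
import importlib.util
import json
import sys
from pathlib import Path


class NaiveNotebookLoader(importlib.abc.Loader):
    """Hypothetical loader: executes every code cell as module code."""

    def __init__(self, path):
        self.path = Path(path)

    def create_module(self, spec):
        return None  # use Python's default module creation

    def exec_module(self, module):
        notebook = json.loads(self.path.read_text(encoding="utf-8"))
        for cell in notebook.get("cells", []):
            if cell.get("cell_type") != "code":
                continue
            source = cell.get("source", "")
            if isinstance(source, list):  # nbformat allows a list of lines
                source = "".join(source)
            # Scratch and demo cells run too, which is the problem.
            exec(compile(source, str(self.path), "exec"), module.__dict__)


class NaiveNotebookFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: looks for <name>.ipynb on the import path."""

    def find_spec(self, fullname, path=None, target=None):
        name = fullname.rpartition(".")[2]
        for directory in (path or sys.path):
            candidate = Path(directory or ".") / f"{name}.ipynb"
            if candidate.is_file():
                return importlib.util.spec_from_loader(
                    fullname, NaiveNotebookLoader(candidate), origin=str(candidate)
                )
        return None


# Appended last, so regular .py modules still take precedence.
sys.meta_path.append(NaiveNotebookFinder())
```

With the finder installed, `import analysis` would execute every code cell of `analysis.ipynb`, including any plotting, printing, or experimentation left in the notebook.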
So, my solution was to create a syntax that allows you to control which cells of a notebook will be included in a notebook’s module view. The syntax looks like this:
# :: ignore-cell ::
# This cell will be ignored during `import`
print("Hello!")
I realized that it’s generally useful to be able to attach directives like this, as well as variable assignments, to notebooks in a standard way that other applications can make use of if they are designed to. I call these special annotations “margin notes,” and I call the syntax I have developed “Margo.”
In the above example, the application that makes use of the ignore-cell margin
note is the custom import extension that I built, called margo-loader. It also
understands directives like module-stop and module-start to ignore entire
blocks of cells, as well as not-a-module to prevent a notebook from being
imported using margo-loader.
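As a rough illustration of how these directives could select a notebook's module view, here is a minimal sketch. It is not margo-loader's code: the function names, the regular expression, and `NotAModuleError` are hypothetical, and it assumes that the cell carrying module-stop is itself excluded while the cell carrying module-start is included.

```python
import re

# Matches a bare directive line such as "# :: ignore-cell ::".
DIRECTIVE = re.compile(r"^\s*#\s*::\s*([A-Za-z][\w-]*)\s*::\s*$")


class NotAModuleError(ImportError):
    """Hypothetical error for notebooks marked not-a-module."""


def directives(source):
    """Return the set of bare directives found in one cell's source."""
    found = set()
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if match:
            found.add(match.group(1))
    return found


def module_view(cell_sources):
    """Return the cells that would be executed on import."""
    kept = []
    active = True
    for source in cell_sources:
        names = directives(source)
        if "not-a-module" in names:
            raise NotAModuleError("notebook is marked not-a-module")
        if "module-stop" in names:
            active = False  # assumption: this cell is skipped as well
        if "module-start" in names:
            active = True  # assumption: this cell is included again
        if active and "ignore-cell" not in names:
            kept.append(source)
    return kept


cells = [
    "def area(r):\n    return 3.14159 * r * r",
    "# :: ignore-cell ::\nprint(area(2))",
    "# :: module-stop ::\nscratch = 0",
    "more_scratch = 1",
    "# :: module-start ::\nUNITS = 'cm'",
]
view = module_view(cells)
```

Here `view` keeps only the first and last cells: the demo cell, the module-stop cell, and the scratch cell after it are all left out.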
For a tutorial on modular notebooks and margin notes more generally, check out the tutorial repo at https://github.com/margo-notebooks/modular-notebook-tutorial, which can be run in a browser without installing any software.
Beyond modular notebooks
We’ve seen a directive with no assignment. In Margo, you can assign arbitrary values using a few different syntaxes. The basic one looks like a JSON array without the enclosing brackets. It accepts the same non-collection scalar values that JSON does.
# :: values: "one", 1, true, null ::
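Because the basic syntax reads like a JSON array without its brackets, one way to picture how such a line could be interpreted is to put the brackets back and hand the result to a JSON parser. This is only a sketch under that assumption; the `parse_basic_assignment` helper and its regular expression are hypothetical and are not the margo-parser API.

```python
import json
import re

# "# :: name: value, value, ... ::" on a single line.
ASSIGNMENT = re.compile(
    r"^\s*#\s*::\s*([A-Za-z_][\w.-]*)\s*:\s*(.*?)\s*::\s*$"
)


def parse_basic_assignment(line):
    """Return (name, values) for a single-line basic assignment."""
    match = ASSIGNMENT.match(line)
    if match is None:
        raise ValueError(f"not a basic assignment: {line!r}")
    name, body = match.groups()
    # Re-add the enclosing brackets and parse the body as a JSON array.
    return name, json.loads(f"[{body}]")


name, values = parse_basic_assignment('# :: values: "one", 1, true, null ::')
# name is "values"; values is ["one", 1, True, None]
```

The JSON scalars map onto their Python equivalents, so `true` becomes `True` and `null` becomes `None`.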
You can also define values using JSON, YAML, or plain text, as shown in the following three examples. Multiline strings are welcome with no special escaping:
# :: hello_world [raw] : '
# :: This is a raw text Margo Value
# :: ------------------------------
# ::
# :: This can be handy for multiline string values.
# :: ' ::
# :: interface [json]: '{
# :: "input": "population.csv",
# :: "output": ["report.pdf", "report.html"]
# :: }' ::
# :: interface [yaml]: '
# :: input: population.csv
# :: output:
# :: - report.pdf
# :: - report.html
# :: ' ::
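To show how an application might read a multiline margin note like these, here is a minimal sketch that strips the `# ::` prefix from each line and decodes the quoted body according to its format tag. The `parse_block` helper is hypothetical and is not margo-parser; it handles only the raw and json formats with the standard library (a yaml branch would need a third-party parser such as PyYAML), and the JSON payload below writes output as an array, mirroring the YAML version.

```python
import json
import re

# name [format]: '...body...' with an optional closing "::".
BLOCK = re.compile(
    r"^\s*([A-Za-z_][\w.-]*)\s*\[\s*(\w+)\s*\]\s*:\s*'(.*)'\s*(?:::)?\s*$",
    re.DOTALL,
)


def strip_margin_prefix(line):
    """Remove the leading '# ::' marker and one optional space."""
    return re.sub(r"^\s*#\s*:: ?", "", line)


def parse_block(lines):
    """Return (name, value) for a multiline raw or json margin note."""
    text = "\n".join(strip_margin_prefix(line) for line in lines)
    match = BLOCK.match(text)
    if match is None:
        raise ValueError("not a formatted margin note")
    name, fmt, body = match.groups()
    if fmt == "json":
        return name, json.loads(body)
    if fmt == "raw":
        return name, body.strip("\n")
    raise ValueError(f"unsupported format in this sketch: {fmt}")


note = [
    "# :: interface [json]: '{",
    '# :: "input": "population.csv",',
    '# :: "output": ["report.pdf", "report.html"]',
    "# :: }' ::",
]
name, interface = parse_block(note)
```

After this runs, `interface` is an ordinary dictionary with an `input` string and an `output` list, which is the shape the YAML version describes as well.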
While I originally conceived of these annotations as a way to describe which
cells to ignore, I realized they are generally useful as a way to extend the
Jupyter Notebook format in a non-breaking way. In particular, they can be used
to create new “interfaces” into a Jupyter Notebook. In the notebook tutorial
linked above, I demonstrate this by encoding a notebook’s requirements.txt
inside margin notes. In my thesis, I also use margin notes to describe a
notebook’s Makefile interface (the files that must exist prior to the notebook
running, and the files that will be generated by running it). By defining a
Makefile interface, it is possible to orchestrate the execution of notebooks as
a DAG using make. In the final part of my thesis, I built a notebook editor
prototype that supports hierarchical cell relationships, which are encoded in
the document as margin notes.
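To illustrate the Makefile idea, here is a minimal sketch that turns an interface dictionary (inputs and outputs, in the shape used by the examples earlier in this post) into a make rule. The `make_rule` helper is hypothetical and is not taken from my thesis code, and using `jupyter nbconvert --execute` as the recipe is an assumption about how the notebook would be run.

```python
def as_list(value):
    """Normalize a missing value, a single string, or a list to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def make_rule(notebook, interface):
    """Build one Makefile rule: outputs depend on the notebook and inputs."""
    inputs = as_list(interface.get("input"))
    outputs = as_list(interface.get("output"))
    targets = " ".join(outputs)
    prerequisites = " ".join([notebook, *inputs])
    # Assumption: the notebook is executed in place with nbconvert.
    recipe = f"jupyter nbconvert --to notebook --execute --inplace {notebook}"
    # Note: with a plain ":" rule, make treats each target independently;
    # GNU Make 4.3+ offers grouped targets ("&:") for one run, many outputs.
    return f"{targets}: {prerequisites}\n\t{recipe}\n"


rule = make_rule(
    "report.ipynb",
    {"input": "population.csv", "output": ["report.pdf", "report.html"]},
)
print(rule)
```

Writing one such rule per notebook gives make everything it needs to work out the execution order: a notebook whose input is another notebook's output will run after it.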
Links
- Margo organization page on GitHub: https://github.com/margo-notebooks/
- Margo loader: https://github.com/margo-notebooks/margo-loader-py
- Margo parser: https://github.com/margo-notebooks/margo-parser-py
- Tutorial repo: https://github.com/margo-notebooks/modular-notebook-tutorial
- Tutorial live demo: https://mybinder.org/v2/gh/margo-notebooks/modular-notebook-tutorial/HEAD