an essay on the value and implementation of formal unit testing in jupyter notebooks
Forked from ericdatakelly/2021-testing-in-notebooks.ipynb
Last active
February 9, 2023 22:21
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "0ad97229", | |
"metadata": {}, | |
"source": [ | |
"# formal interactive notebook testing\n", | |
"\n", | |
"This post is for notebook authors who need to ship software, and anyone who wants to start learning more about testing in Python. We'll focus on Python's standard library unit testing tools interactively from the notebook. There are many flavors of testing, but our focus remains on unit testing to provide quantitative metrics about the fitness of program. Readers will leave understanding how to prototype and simulate writing unit tests in interactive notebooks.\n", | |
"\n", | |
"\n", | |
"A common motivation for using notebooks is to _test an idea_. Without formal conventions, notebooks can result in scatter-shot code that informally verifies an idea. In this post, we discuss how to mature informal notebooks into formal unit test conventions. With practice, an effective use of the notebook is to compose both code and formal tests that can be moved into your project's module and testing suite; you do have tests don't you?\n", | |
"\n", | |
"## why are formal tests valuable\n", | |
"\n", | |
"Tests are investments, and testing over time measures the return on investment. Testing promotes:\n", | |
"\n", | |
"* longevity of ideas\n", | |
"* protection from upstream changes\n", | |
"* value to you and consumers of your software\n", | |
"* health metrics when used in continuous deployment\n", | |
"\n", | |
"> learn more about the motivation for testing [The Hitchhiker's Guide to Python - Testing your code][hitchhiker]\n", | |
"\n", | |
"[hitchhiker]: https://docs.python-guide.org/writing/tests/ \"a very good resource for learning about testing for python programmers\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "f09057d2", | |
"metadata": {}, | |
"source": [ | |
"## testing is standard\n", | |
"\n", | |
"Most programming languages come with unit testing abilities that allow authors to make formal assertions about the expectations of their code. In Python, [`doctest`] and [`unittest`] are builtin libraries that enable testing; meanwhile, `pytest` is the choice of popular projects in the broader Python community. You will not need extra dependecies besides a notebook interface and Python to apply the ideas from this post.\n", | |
"\n", | |
"> we will not discuss testing notebooks in `pytest` in this document, but if you want to read ahead you can look at [`nbval`], [`importnb`], or [`testbook`] for different flavors of notebook testing.\n", | |
"\n", | |
"[`doctest`]: https://docs.python.org/3/library/doctest.html \"the Python standard library documentation for doctest\"\n", | |
"[`unittest`]: https://docs.python.org/3/library/unittest.html \"the Python standard library documentation for unittest\"\n", | |
"[`pytest`]: https://docs.pytest.org/ \"the Python framework for running tests\"\n", | |
"[`nbval`]: https://nbval.readthedocs.io/en/latest/ \"a pytest extension for testing notebook inputs and outputs by excluding notebooks\"\n", | |
"[`importnb`]: https://github.com/deathbeds/importnb \"a pytest extension for extracting tests for notebooks as modules\"\n", | |
"\n", | |
"[`testbook`]: https://github.com/nteract/testbook \"testbook is a unit testing framework extension for testing code in notebooks\"\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "e6654126", | |
"metadata": {}, | |
"source": [ | |
"## Python `doctest`\n", | |
"\n", | |
"Documentation driven testing was [introduced into Python in 1999][ddt]. It introduced the ability to combine code and narrative into docstrings following a long lineage of literate programming concepts.\n", | |
"\n", | |
"[ddt]: https://groups.google.com/g/comp.lang.python/c/DfzH5Nrt05E/m/Yyd3s7fPVxwJ" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "8e455ed2", | |
"metadata": {}, | |
"source": [ | |
"### Anatomy of a `doctest`\n", | |
"\n", | |
"The goal of a `doctest` is to compare the execution of a `line_of_code` with an `expected_result`. When a `doctest` is executed it executes the `line_of_code` and generates a `test_result`. When the `test_result` and the `expected_result` are the same a test has passed, otherwise a test has failed.\n", | |
"\n", | |
"The list below demonstrates some forms of `doctest`s in pseudocode representation.\n", | |
"\n", | |
"* a `doctest` with a single line of code and an expected result\n", | |
"\n", | |
" ```pycon\n", | |
" >>> {line_of_code}\n", | |
" {expected_result}\n", | |
" ```\n", | |
"\n", | |
"* a `doctest` with multiple lines of code and an expected result\n", | |
"\n", | |
" ```pycon\n", | |
" >>> {line_of_code}\n", | |
" ... {line_of_code}\n", | |
" ... line_of_code\n", | |
" {expected_result}\n", | |
" ```\n", | |
"\n", | |
"* a `doctest` with multiple lines of code that prints no output\n", | |
"\n", | |
" ```pycon\n", | |
" >>> {line_of_code}\n", | |
" ... {line_of_code}\n", | |
" ```" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"id": "62d10d1b", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
" import doctest" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "6e8382eb", | |
"metadata": {}, | |
"source": [ | |
"We like `doctest` because it is the easiest way to run tests in notebooks. Below is a concrete example of a `doctest`:\n", | |
"\n", | |
"* line 4-6 represent the `line_of_code`\n", | |
"* line 7 is the `expected_result`" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"id": "9ae7a0bf", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
" def a_simple_function_with_a_doctest(x):\n", | |
" \"\"\"this function turns every input into its string representation.\n", | |
"\n", | |
" >>> a_simple_function_with_a_doctest(\n", | |
" ... 1\n", | |
" ... )\n", | |
" '1'\n", | |
" \"\"\"\n", | |
" return str(x)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "fefd564a", | |
"metadata": {}, | |
"source": [ | |
"The easy invocation of a `doctest` in the notebook is what makes it the easiest test tool to use." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"id": "75303bf5", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"TestResults(failed=0, attempted=1)" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
" doctest.testmod()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "95d6e8d8", | |
"metadata": {}, | |
"source": [ | |
"When we invoke `doctest.testmod` it will look for all the functions and classes in our module that have docstrings and `doctest` examples. As a result, the previous example finds one test. The test results summarize the test success and failures. If the `test_result`, the execution of the `line_of_code`, matches the `expected_result` our tests succeed, otherwise they fail. When tests fail, we'll want to inspect both our tests and source code to discover the source of the failure.\n", | |
"\n", | |
"\n", | |
"> learn more about [`doctest` discovery]\n", | |
"\n", | |
"[`doctest` discovery]: https://docs.python.org/3/library/doctest.html#which-docstrings-are-examined \"a section about how doctest discovers tests\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "1e4166ae", | |
"metadata": {}, | |
"source": [ | |
"Here we add a new class with another `doctest`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"id": "77132872", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
" class ASimpleClassWithADoctest(str):\n", | |
" \"\"\"this type turns every input into its string type.\n", | |
"\n", | |
" >>> ASimpleClassWithADoctest(1)\n", | |
" '1'\n", | |
" \"\"\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "156f044b", | |
"metadata": {}, | |
"source": [ | |
"When we re-run the `doctest`s we notice that another test was discovered." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"id": "9907af95", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"TestResults(failed=0, attempted=2)" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
" doctest.testmod()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "a51787c6", | |
"metadata": {}, | |
"source": [ | |
"There is one more way that `doctest` discovers tests, which is through the `__test__` variable. We can make a `__test__` dictionary with keys that name the tests, and values that are objects holding `doctest` syntaxes." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"id": "86e99e72", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'a_test_without_a_function': '>>> assert a_simple_function_with_a_doctest(1) == ASimpleClassWithADoctest(1)'}" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
" __test__ = dict(\n", | |
" a_test_without_a_function=\"\"\">>> assert a_simple_function_with_a_doctest(1) == ASimpleClassWithADoctest(1)\"\"\"\n", | |
" ); __test__" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "62953575", | |
"metadata": {}, | |
"source": [ | |
"Now the `doctest` finds three tests." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"id": "4a2e7690", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"TestResults(failed=0, attempted=3)" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
" doctest.testmod()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "c91602fc", | |
"metadata": {}, | |
"source": [ | |
"## `unittest` when `doctest` isn't enough" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"id": "de0630a3", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
" import unittest" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "e2238a50", | |
"metadata": {}, | |
"source": [ | |
"`doctest` are the easiest to invoke, but they can be difficult to write for some tests you may wish to run. Python provides the alternative `unittest` library that allows authors to write tests in pure Python, rather than strings. This approach to writing tests will be more familiar to new Python learners.\n", | |
"\n", | |
"`doctest` relies on the comparison of an `expected_result` and a `test_result` whereas `unittest` provides an extended interface for comparing items using their [list of assertion methods].\n", | |
"\n", | |
"[list of assertion methods]: https://docs.python.org/3/library/unittest.html#assert-methods \"A list of assertion method in unittest test cases.\"\n", | |
"\n", | |
"> learn more about the relationship between `doctest` and `unittest` [The Hitchhiker's Guide to Python - Testing your code][hitchhiker]\n", | |
"\n", | |
"[hitchhiker]: https://docs.python-guide.org/writing/tests/ \"blessed documentation comparing unit test approaches\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "8bb09eb7", | |
"metadata": {}, | |
"source": [ | |
"Python unit tests subclass the `unittest.TestCase` type." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"id": "63f1e5a2", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
" class UnitTests(unittest.TestCase):\n", | |
" def test_simple_methods(self):\n", | |
" pass" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b7f6118d", | |
"metadata": {}, | |
"source": [ | |
"Running `unittest` in the notebook requires some keyword arguments that we'll wrap in a function to facilitate our discussion." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"id": "43b7454c", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
" def run_unittest():\n", | |
" unittest.main(argv=[\"\"], exit=False)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "4c0fc997", | |
"metadata": {}, | |
"source": [ | |
"When we invoke `run_unittest` we notice our test is discovered." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"id": "6d09e635", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
".\n", | |
"----------------------------------------------------------------------\n", | |
"Ran 1 test in 0.001s\n", | |
"\n", | |
"OK\n" | |
] | |
} | |
], | |
"source": [ | |
"run_unittest()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "ce33561f", | |
"metadata": {}, | |
"source": [ | |
"We've already written three other tests though. Wouldn't it be nice to include our `doctest`s in the test suite too?! If you dig deep into the `doctest` documentation you'll find the [`unittest` interface]. It demonstrates that including the `load_tests` function in the namespace means that `unittest` will know to discover our `doctest`s.\n", | |
"\n", | |
"It may appear that we are using multiple testing forms by combining `doctest` and `unittest`. It turns out that a `doctest` is a special test case that compares the output of string values. With experience, authors will find that some tests are easier to write as `doctest` and others are easier as `unittest` assertions. \n", | |
"\n", | |
"[`unittest` interface]: https://docs.python.org/3/library/doctest.html#unittest-api" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"id": "19aa898e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def load_tests(loader, tests, ignore):\n", | |
" tests.addTests(doctest.DocTestSuite(__import__(__name__)))\n", | |
" return tests" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "9b0f5aee", | |
"metadata": {}, | |
"source": [ | |
"Now `run_unittest` discovers 4 tests, including our `doctest` defined earlier." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"id": "9bad53dc", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"....\n", | |
"----------------------------------------------------------------------\n", | |
"Ran 4 tests in 0.004s\n", | |
"\n", | |
"OK\n" | |
] | |
} | |
], | |
"source": [ | |
" run_unittest()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "28c31eea", | |
"metadata": {}, | |
"source": [ | |
"## restart and run all, or it didn't happen\n", | |
"\n", | |
"Now you see how to simulate running formal tests against your interactive notebook code. The notebook is a means, not an end. Now it is time to copy your module code and new tests into your project!\n", | |
"\n", | |
"For posterity though, it is helpful that a notebook can restart and run all. Better yet, your notebook can restart and run all... with testing in the last cell! And, this specific document abides as the prior code cell is the last cell, and a test! We now have a little more confidence that this notebook could work in the future, or at least verify that it still works." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "27a2b156", | |
"metadata": {}, | |
"source": [ | |
"When you get to the last cell with no errors, it is time to celebrate a notebook well written." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "3f6ab3e8", | |
"metadata": {}, | |
"source": [ | |
"## conclusion\n", | |
"\n", | |
"* we can simulate unit testing in the notebook.\n", | |
"* `doctest`s run tests in strings and docstrings.\n", | |
"* `unittest`s run tests on objects and may include `doctest`.\n", | |
"\n", | |
"Testing is a good practice. It helps formalize the scientific method when code is involved. Being able to simulate tests in the notebook, `doctest` and `unittest` help expedite the test writing process by taking advantage of the rich interactive features of notebooks. When we write tests we record ideas so that our future selves can thank us for our past practice." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "4198ac34", | |
"metadata": {}, | |
"source": [ | |
"## post script: _running a formal testing suite_\n", | |
"\n", | |
"It seems helpful to to illustrate how near we are to formal tests. Jupyter notebooks simulate different document forms, and we can export this notebook as a Python script. We can formally run the exported script with `unittest`. These steps, for this specific document, are:\n", | |
"\n", | |
"\n", | |
"\n", | |
"1. Convert the notebook to a Python script using Jupyter's [`nbconvert`] tool.\n", | |
"\n", | |
" !jupyter nbconvert --to script 2021-testing-in-notebooks.ipynb\n", | |
" \n", | |
"2. Run the newly generated script against the [`unittest` command line interface] with any extra parameters you may want to set.\n", | |
"\n", | |
" !python -m unittest 2021-testing-in-notebooks.py -v\n", | |
" \n", | |
" \n", | |
"> Note: `!` is an `IPython` feature that executes system commands.\n", | |
"\n", | |
"[`nbconvert`]: https://nbconvert.readthedocs.io/en/latest/index.html\n", | |
"[`unittest` command line interface]: https://docs.python.org/3/library/unittest.html#command-line-interface" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.7" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |