These are the standards we came up with at Osmo Systems so that technicians could use Jupyter notebooks to run experiments while keeping quality high: avoiding bugs and promoting readability and reusability.
Each experiment we did was saved in a shared Google Drive folder, which contained a README entry point and any supporting notebooks and other files.
- In general, see the Code Style Manifesto.
- Write code that conforms to PEP 8, e.g. for imports and top-level constants.
- Black code formatting is preferred. You can add a Black formatting button to your notebook with this tool.
- Self-documentation: someone else (or you, in 3 months) should be able to open the notebook and know wtf is going on and why. Note: your README is the standard entry point for understanding the experiment at a high level (so that you don’t need Jupyter to understand its purpose and basic results); it should refer to the notebook where necessary.
- Pin first-party dependencies: include a !pip install cell at the top of your notebook that explicitly installs the right version of your Osmo libraries. This lets us easily re-run your notebook in the future, even after we’ve made breaking changes to our libraries.
- For example, that cell may contain:
!pip uninstall --quiet --yes osmo-jupyter
!pip install --quiet git+ssh://[email protected]/osmosystems/osmo-jupyter.git@1e32a619bf091237d7fb37d19c90e1d38b3a6717
- Librarification: If something has been copied between notebooks a few times, or is growing into a significantly complicated piece of logic, please ask the software team to migrate it to a library.
- Standardized Paths / Import procedure
- Paths to external data should be relative. Use “./” to indicate the current directory (in Jupyter, the notebook’s own folder by default) so that paths don’t need to be modified across computers.
- Example:
node_data = pd.read_csv("./osmobot-report.csv")
- Well-Labeled Figures :D
- Axes, Units, Title, Legend
- Don’t repeat the whole README in a cell, but do explain the purpose of the notebook, providing concise info when helpful.
- Delete anything you don’t need in the notebook, to improve clarity.
- Use “What did we learn from this graph?” markdown cells to describe how a graph is informative.
- If there are blocks of code that are logically part of the notebook flow, but are often skipped, put these blocks inside an if statement that is controlled by an ALL_CAPS_CONSTANT.
- Heuristic for what should be an ALL_CAPS_CONSTANT: if you want to use it inside a function without passing it in, it should be an ALL_CAPS_CONSTANT.
- If parts of your code are slow, use progress bars so users can see how long execution will take.
- Use inline “display-style” or “assertion-style” “tests” to demonstrate what a function does. Don’t delete these tests, but ensure they are cleanly written and are useful at demonstrating a function’s purpose.
- When using files, explain in detail where they came from.
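The relative-path and file-provenance guidelines above can be combined in one cell near the top of a notebook. This is a minimal sketch: OSMOBOT_REPORT_PATH and check_data_file are hypothetical names, and the provenance comment is a template to fill in, not a real record.

```python
from pathlib import Path

# Relative to the current working directory. Jupyter starts the kernel in the
# notebook's own folder by default, so this finds the file next to the notebook.
# Source (template, hypothetical): describe here, in detail, where this file
# came from, e.g. which system exported it, when, and for which experiment run.
OSMOBOT_REPORT_PATH = Path("./osmobot-report.csv")


def check_data_file(path):
    """Return `path` if it is an existing file; otherwise fail early with a helpful message."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found (resolved to {path.resolve()}). "
            "Is the notebook running from the experiment folder that contains it?"
        )
    return path
```

Usage, assuming pandas is imported as pd: node_data = pd.read_csv(check_data_file(OSMOBOT_REPORT_PATH)). Failing early with the resolved absolute path makes a wrong working directory obvious, instead of surfacing later as a confusing pandas error.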
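The ALL_CAPS_CONSTANT and inline-test guidelines above might look like this in practice. This is a sketch only: RUN_SLOW_RECALIBRATION, KELVIN_OFFSET and celsius_to_kelvin are hypothetical names chosen for illustration. KELVIN_OFFSET is used inside a function without being passed in, which is exactly the heuristic for making it an ALL_CAPS_CONSTANT.

```python
# Top-level constants (PEP 8: ALL_CAPS, near the top of the notebook).
# Both names are hypothetical, for illustration only.
RUN_SLOW_RECALIBRATION = False  # set to True to run the optional block below
KELVIN_OFFSET = 273.15


def celsius_to_kelvin(temperature_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    # Uses KELVIN_OFFSET without it being passed in, hence ALL_CAPS.
    return temperature_c + KELVIN_OFFSET


# Assertion-style "tests": kept in the notebook to show what the function does.
assert celsius_to_kelvin(0) == 273.15
assert celsius_to_kelvin(-273.15) == 0

if RUN_SLOW_RECALIBRATION:
    # Logically part of the notebook flow, but usually skipped.
    print("Re-running the slow recalibration step...")
```

Because the assertions sit right after the function, a reader sees the expected behavior without scrolling, and a later edit that breaks the function fails loudly on the next run.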
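For the progress-bar guideline above, one common option is the third-party tqdm package. This is a sketch assuming tqdm may or may not be installed; slow_analysis and the sample IDs are hypothetical stand-ins for real work.

```python
import time

try:
    # tqdm.auto shows a widget bar in Jupyter and a text bar elsewhere.
    from tqdm.auto import tqdm
except ImportError:
    # Fallback so the cell still runs without tqdm installed (no bar is shown).
    def tqdm(iterable, **kwargs):
        return iterable


def slow_analysis(sample_ids):
    """Hypothetical stand-in for a slow per-sample computation."""
    results = {}
    for sample_id in tqdm(sample_ids, desc="Analyzing samples"):
        time.sleep(0.01)  # placeholder for the real, slow work
        results[sample_id] = len(sample_id)
    return results


results = slow_analysis(["A1", "A2", "B1"])
```

Wrapping the iterable is the only change needed in the loop itself, so adding or removing the bar does not touch the analysis logic.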
There’s no great system for code reviews on notebooks. We have a simple system that has so far been fine.
When reviewing someone’s notebook, whether through an assigned review task or otherwise, ALWAYS copy the notebook, rename the copy “[your name] review copy”, and add your review comments (e.g. # [your name] note: why did you do this here?) in the new copy.
Additionally, you can use nbdime and the nbdiff tool it provides to check the changes between notebooks that are expected to be similar.
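Since review comments follow a fixed “# [your name] note:” pattern, the author can list them all from the review copy. This is a hypothetical helper, not part of nbdime or any Osmo library. It relies only on the fact that .ipynb files are JSON and that nbformat v4 stores each cell’s source as either a single string or a list of lines.

```python
import json
from pathlib import Path


def load_notebook(notebook_path):
    """Read an .ipynb file; notebooks are plain JSON on disk."""
    return json.loads(Path(notebook_path).read_text(encoding="utf-8"))


def collect_review_notes(notebook, reviewer):
    """Return (cell_index, comment) pairs for '# <reviewer> note:' comments in code cells."""
    prefix = f"# {reviewer} note:"
    notes = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # review notes are Python comments, so only code cells
        source = cell.get("source", "")
        # nbformat v4 stores a cell's source as either one string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            # Only whole-line comments are matched, not trailing inline comments.
            if line.strip().startswith(prefix):
                notes.append((index, line.strip()))
    return notes
```

Usage, with a hypothetical file name: collect_review_notes(load_notebook("analysis - Alice review copy.ipynb"), "Alice").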