Last active
July 28, 2023 17:21
nbconvert in jupyterlite
{"metadata":{"language_info":{"codemirror_mode":{"name":"python","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.8"},"kernelspec":{"name":"python","display_name":"Python (Pyodide)","language":"python"}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Running `nbconvert` inside JupyterLite\n\nA potential answer to [running `nbconvert` in the browser](https://github.com/jupyter/nbconvert/issues/1766).\n\n> This probably only works if dropped into the [JupyterLite PR](https://github.com/jupyterlite/jupyterlite/pull/756) [Demo Site](https://jupyterlite--756.org.readthedocs.build/en/756/_static/lab/index.html).","metadata":{}},{"cell_type":"markdown","source":"## Why isn't this easy?\n\nA number of challenges are present:\n- `nbconvert` is pretty complicated machinery\n- it meets a _lot_ of use cases, and has a fair number of tricky dependencies that don't even make sense to build for the browser\n- it uses some semi-deprecated tools like `data_files`, poorly supported by pyodide.","metadata":{}},{"cell_type":"markdown","source":"## `import nbconvert`\n\nTo even get here will take a while.","metadata":{}},{"cell_type":"markdown","source":"### The Good Modules\n\nMany of `nbconvert`'s dependencies are shipped directly by pyodide, which is nice.","metadata":{}},{"cell_type":"code","source":"import piplite\nawait piplite.install([\n \"nbformat\", \n \"markupsafe\", \n \"defusedxml\", \n \"jupyterlab_pygments\", \n \"jsonschema\", \n \"jinja2\", \"lxml\", \n \"entrypoints\", \n \"mistune<2\", \n \"pandocfilters\"\n])","metadata":{"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### The Bad Modules\n`nbconvert` itself brings in too many things, but we can skip dependencies.","metadata":{}},{"cell_type":"code","source":"await piplite.install([\"nbconvert\"], 
deps=False)","metadata":{"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### The Ugly Patches\n\nWe'll need to do some icky stuff.","metadata":{}},{"cell_type":"code","source":"import sys, types, os\nfrom pathlib import Path","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### `nbclient`\n\nIt would be _awesome_ to have `nbclient` working, but it's just not feasible at present.","metadata":{}},{"cell_type":"code","source":"noop = lambda *args, **kwargs: dict(args=args, kwargs=kwargs)\nnbclient = types.ModuleType(\"nbclient\")\nnbclient.NotebookClient = nbclient.execute = noop\nsys.modules[\"nbclient\"] = nbclient\nnbclient_exceptions = types.ModuleType(\"nbclient.exceptions\")\nnbclient_exceptions.CellExecutionError = noop\nsys.modules[\"nbclient.exceptions\"] = nbclient_exceptions","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### The Ugliest Patch\n\nAgain, it's not even worth trying to do execution at this time, so we just patch the source directly.","metadata":{}},{"cell_type":"code","source":"preprocessors = Path(\"/lib/python3.10/site-packages/nbconvert/preprocessors/__init__.py\")\nprint(preprocessors.write_text(\n    preprocessors.read_text().replace('\\nfrom .execute', '\\n# from .execute')\n), \"bytes written\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### The Import\n\nWe should have a _mostly_ working copy of `nbconvert` now.","metadata":{}},{"cell_type":"code","source":"import nbconvert\nnbconvert.__version__","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Handling `data_files`","metadata":{}},{"cell_type":"markdown","source":"While it _imports_, `nbconvert` relies on `data_files`. However, it looks like `micropip` (and therefore `piplite`) doesn't support them. 
We'll need to populate one of the expected `jupyter_paths` with the templates from the `nbconvert` wheel.","metadata":{}},{"cell_type":"markdown","source":"### The Jupyter Path","metadata":{}},{"cell_type":"code","source":"from jupyter_core.paths import jupyter_path\njupyter_path()","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Since we \"own\" this whole computer, we'll use the shortest one, `/share/jupyter`.","metadata":{}},{"cell_type":"markdown","source":"### Get the Templates\n\nAs we know these exist in the JupyterLite example site, we can extract them from our well-known location.","metadata":{}},{"cell_type":"code","source":"import pyodide.http, zipfile, tempfile, shutil, io\nwhl = await pyodide.http.pyfetch(\n    f\"\"\"{piplite.piplite._PIPLITE_URLS[0].split(\"all.json\")[0]}/\"\"\"\n    f\"\"\"nbconvert-{nbconvert.__version__}-py3-none-any.whl\"\"\"\n)\nwhl_bytes = await whl.bytes()\nprint(len(whl_bytes), \"bytes loaded\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"A `.whl` is just a `.zip` file, with a generally well-known layout.","metadata":{}},{"cell_type":"markdown","source":"### Copy the Templates","metadata":{}},{"cell_type":"code","source":"# `Path` was imported earlier with `from pathlib import Path`\ndest = Path(\"/share/jupyter/nbconvert/templates\")\nshutil.rmtree(dest, ignore_errors=True)\ndest.parent.mkdir(parents=True, exist_ok=True)\n\nwith tempfile.TemporaryDirectory() as td:\n    with zipfile.ZipFile(io.BytesIO(whl_bytes)) as zf:\n        zf.extractall(td)\n    shutil.copytree(f\"{td}/nbconvert-{nbconvert.__version__}.data/data/share/jupyter/nbconvert/templates\", dest)\n    print(len(sorted(dest.rglob(\"*\"))), \"templates copied\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Actually using `nbconvert`\n\nWe can't use the command line, but the nbconvert docs show how we can use [`nbconvert` as a 
library](https://nbconvert.readthedocs.io/en/latest/nbconvert_library.html#Quick-overview).","metadata":{}},{"cell_type":"code","source":"import nbformat\nfrom traitlets.config import Config\nfrom nbconvert import HTMLExporter","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### Get the notebook\n\nIt could be any notebook, but we have this fine one here.","metadata":{}},{"cell_type":"code","source":"nb_name = \"nbconvert-in-jupyterlite\"","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"this_notebook = nbformat.reads(Path(f\"{nb_name}.ipynb\").read_text(), as_version=4)\nthis_notebook.cells[0]","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### One last hack\n\nA few of `nbconvert`'s preprocessors do things we can't handle... for now, we at least know we can't execute.","metadata":{}},{"cell_type":"code","source":"safe_preprocessors = [\n    dpp for dpp in HTMLExporter.default_preprocessors.default()\n    if \"Execute\" not in dpp\n]","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### The Exporter","metadata":{}},{"cell_type":"code","source":"html_exporter = HTMLExporter(default_preprocessors=safe_preprocessors)","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### Get the HTML","metadata":{}},{"cell_type":"code","source":"(body, resources) = html_exporter.from_notebook_node(this_notebook)\nprint(len(body), \"bytes of HTML 🎉\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Use the HTML\n\nWith the HTML in hand we can do a bunch of things.","metadata":{}},{"cell_type":"markdown","source":"### ...As contents in JupyterLab\n\nFiles work _pretty_ well in JupyterLite, and the bytes never end up in the 
output.","metadata":{}},{"cell_type":"code","source":"from IPython.display import Markdown\n\n# `Path` was imported earlier with `from pathlib import Path`\nprint(Path(f\"{nb_name}.html\").write_text(body), \"bytes written\")\nMarkdown(f\"[Open `{nb_name}.html` in Lab](./{nb_name}.html)\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### ...As a base64-encoded string\n\nWithout using JupyterLite contents, HTML can be hard to work with inside other HTML (much less HTML-in-JSON). It is often safest to work with it in the verbose, yet predictable, base64-encoded format.","metadata":{}},{"cell_type":"code","source":"from IPython.display import HTML, Markdown\nfrom html import escape\nfrom urllib.parse import quote\nfrom base64 import b64encode","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"b64_encoded = b64encode(body.encode(\"utf-8\")).decode(\"utf-8\")\nprint(len(b64_encoded), \"base64-encoded bytes\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### ...In an IFrame\n\nIFrames work pretty well with data URIs.","metadata":{}},{"cell_type":"code","source":"HTML(f\"\"\"\n<iframe src=\"data:text/html;base64,{b64_encoded}\" width=\"100%\" style=\"min-height: 400px; height: 100%\"></iframe>\n\"\"\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### ...In a download link\n\nA download link is very handy, as you can specify a name instead of the full URL.","metadata":{}},{"cell_type":"code","source":"HTML(f\"\"\"\n<a href=\"data:text/html;base64,{b64_encoded}\" download=\"{nb_name}.html\">\nDownload <code>{nb_name}.html</code>\n</a>\n\"\"\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Future Work\n\n- explore packaging this technique for JupyterLite\n  - by creating a [custom JupyterLite addon](https://jupyterlite.readthedocs.io/en/latest/howto/extensions/cli-addons.html), it would be possible to 
fire up a kernel and service the expected JupyterLab UI components\n- do a better shim of `nbclient`\n  - this could use JupyterLite machinery to actually run kernels\n- further simplify the `nbconvert` API\n  - `nbclient` could be an optional, or at least not catastrophically failing, dependency","metadata":{}}]}