bollwyvl · July 28, 2023 17:21
diff --git a/nbconvert-in-jupyterlite.ipynb b/nbconvert-in-jupyterlite.ipynb
 {"metadata":{"language_info":{"codemirror_mode":{"name":"python","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.8"},"kernelspec":{"name":"python","display_name":"Python (Pyodide)","language":"python"}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Running `nbconvert` inside JupyterLite\n\nA potential answer to [running `nbconvert` in the browser](https://github.com/jupyter/nbconvert/issues/1766).\n\n> This probably only works if dropped into the [JupyterLite PR](https://github.com/jupyterlite/jupyterlite/pull/756) [Demo Site](https://jupyterlite--756.org.readthedocs.build/en/756/_static/lab/index.html).","metadata":{}},{"cell_type":"markdown","source":"## Why isn't this easy?\n\nA number of challenges are present:\n- `nbconvert` is pretty complicated machinery\n- it meets a _lot_ of use cases, and has a fair number of tricky dependencies that don't even make sense to build for the browser\n- it uses some semi-deprecated tools like `data_files`, poorly supported by pyodide.","metadata":{}},{"cell_type":"markdown","source":"## `import nbconvert`\n\nTo even get here will take a while.","metadata":{}},{"cell_type":"markdown","source":"### The Good Modules\n\nMany of `nbconvert`'s dependencies are shipped directly by pyodide, which is nice.","metadata":{}},{"cell_type":"code","source":"import piplite\nawait piplite.install([\n    \"nbformat\", \n    \"markupsafe\", \n    \"defusedxml\", \n    \"jupyterlab_pygments\", \n    \"jsonschema\", \n    \"jinja2\", \"lxml\", \n    \"entrypoints\", \n    \"mistune<2\", \n    \"pandocfilters\"\n])","metadata":{"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### The Bad Modules\n`nbconvert` itself brings in too many things, but we can skip dependencies.","metadata":{}},{"cell_type":"code","source":"await piplite.install([\"nbconvert\"], 
deps=False)","metadata":{"tags":[],"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### The Ugly Patches\n\nWe'll need to do some icky stuff.","metadata":{}},{"cell_type":"code","source":"import sys, types, os\nfrom pathlib import Path","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### `nbclient`\n\nIt would be _awesome_ to have `nbclient` working, but it's just not feasible at present. ","metadata":{}},{"cell_type":"code","source":"noop = lambda *args, **kwargs: dict(args=args, kwargs=kwargs)\nnbclient = types.ModuleType(\"nbclient\")\nnbclient.NotebookClient = nbclient.execute = noop\nsys.modules[\"nbclient\"] = nbclient\nnbclient_exceptions = types.ModuleType(\"nbclient.exceptions\")\nnbclient_exceptions.CellExecutionError = noop\nsys.modules[\"nbclient.exceptions\"] = nbclient_exceptions","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### The Ugliest Patch\n\nAgain, it's not even worth trying to do execution at this time, so we just patch the source directly.","metadata":{}},{"cell_type":"code","source":"preprocessors = Path(\"/lib/python3.10/site-packages/nbconvert/preprocessors/__init__.py\")\nprint(preprocessors.write_text(\n    preprocessors.read_text().replace('\\nfrom .execute', '\\n# from .execute')\n), \"bytes written\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### The Import\n\nWe should have a _mostly_ working copy of `nbconvert` now.","metadata":{}},{"cell_type":"code","source":"import nbconvert\nnbconvert.__version__","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Handling `data_files`","metadata":{}},{"cell_type":"markdown","source":"While it _imports_, `nbconvert` relies on `data_files`. However, it looks like `micropip` (and therefore `piplite`) doesn't support them. 
We'll need to populate one of the expected `jupyter_paths` with the templates from the `nbconvert` wheel.","metadata":{}},{"cell_type":"markdown","source":"### The Jupyter Path","metadata":{}},{"cell_type":"code","source":"from jupyter_core.paths import jupyter_path\njupyter_path()","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Since we \"own\" this whole computer, we'll use the shortest one, `/share/jupyter`.","metadata":{}},{"cell_type":"markdown","source":"### Get the Templates\n\nAs we know these exist in the JupyterLite example site, we can extract them from our well-known location.","metadata":{}},{"cell_type":"code","source":"import pyodide.http, zipfile, tempfile, shutil, io\nwhl = await pyodide.http.pyfetch(\n    f\"\"\"{piplite.piplite._PIPLITE_URLS[0].split(\"all.json\")[0]}/\"\"\"\n    f\"\"\"nbconvert-{nbconvert.__version__}-py3-none-any.whl\"\"\"\n)\nwhl_bytes = await whl.bytes()\nprint(len(whl_bytes), \"bytes loaded\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"A `.whl` is just a `.zip` file, with a generally-well-known layout.","metadata":{}},{"cell_type":"markdown","source":"### Copy the Templates","metadata":{}},{"cell_type":"code","source":"dest = Path(\"/share/jupyter/nbconvert/templates\")\nshutil.rmtree(dest, ignore_errors=True)\ndest.parent.mkdir(parents=True, exist_ok=True)\n\nwith tempfile.TemporaryDirectory() as td:\n    with zipfile.ZipFile(io.BytesIO(whl_bytes)) as zf:\n        zf.extractall(td)\n        shutil.copytree(f\"{td}/nbconvert-{nbconvert.__version__}.data/data/share/jupyter/nbconvert/templates\", dest)\n        print(len(sorted(dest.rglob(\"*\"))), \"templates copied\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Actually using `nbconvert`\n\nWe can't use the command line, but the `nbconvert` docs show how we can use [`nbconvert` as a 
library](https://nbconvert.readthedocs.io/en/latest/nbconvert_library.html#Quick-overview).","metadata":{}},{"cell_type":"code","source":"import nbformat\nfrom traitlets.config import Config\nfrom nbconvert import HTMLExporter","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### Get the notebook\n\nIt could be any notebook, but we have this fine one here.","metadata":{}},{"cell_type":"code","source":"nb_name = \"nbconvert-in-jupyterlite\"","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"this_notebook = nbformat.reads(Path(f\"{nb_name}.ipynb\").read_text(), 4)\nthis_notebook.cells[0]","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### One last hack\n\nA few of `nbconvert`'s preprocessors do things we can't handle... for now, we at least know we can't execute.","metadata":{}},{"cell_type":"code","source":"safe_preprocessors = [\n    dpp for dpp in HTMLExporter.default_preprocessors.default()\n    if \"Execute\" not in dpp\n]","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### The Exporter","metadata":{}},{"cell_type":"code","source":"html_exporter = HTMLExporter(default_preprocessors=safe_preprocessors)","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### Get the HTML","metadata":{}},{"cell_type":"code","source":"(body, resources) = html_exporter.from_notebook_node(this_notebook)\nprint(len(body), \"bytes of HTML 🎉\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Use the HTML\n\nWith the HTML in hand we can do a bunch of things.","metadata":{}},{"cell_type":"markdown","source":"### ...As contents in JupyterLab\n\nFiles work _pretty_ well in JupyterLite, and the bytes never end up in the 
output.","metadata":{}},{"cell_type":"code","source":"from IPython.display import Markdown\nprint(Path(f\"{nb_name}.html\").write_text(body), \"bytes written\")\nMarkdown(f\"[Open `{nb_name}.html` in Lab](./{nb_name}.html)\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"### ...As a base64-encoded string\n\nWithout using JupyterLite contents, HTML can be hard to work with inside other HTML (much less HTML-in-JSON). It is often safest to work with it in the verbose, yet predictable, base64-encoded format.","metadata":{}},{"cell_type":"code","source":"from IPython.display import HTML, Markdown\nfrom html import escape\nfrom urllib.parse import quote\nfrom base64 import b64encode","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"b64_encoded = b64encode(body.encode(\"utf-8\")).decode(\"utf-8\")\nprint(len(b64_encoded), \"base64-encoded bytes\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### ...In an IFrame\n\nIFrames work pretty well with data URIs.","metadata":{}},{"cell_type":"code","source":"HTML(f\"\"\"\n<iframe src=\"data:text/html;base64,{b64_encoded}\" width=\"100%\" style=\"min-height: 400px; height: 100%\"></iframe>\n\"\"\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### ...In a download link\n\nA download link is very handy, as you can specify a name instead of the full URL.","metadata":{}},{"cell_type":"code","source":"HTML(f\"\"\"\n<a href=\"data:text/html;base64,{b64_encoded}\" download=\"{nb_name}.html\">\nDownload <code>{nb_name}.html</code>\n</a>\n\"\"\")","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Future Work\n\n- explore packaging this technique for JupyterLite\n  - by creating a [custom JupyterLite addon](https://jupyterlite.readthedocs.io/en/latest/howto/extensions/cli-addons.html), it would be possible to 
fire up a kernel and service the expected JupyterLab UI components\n- do a better shim of `nbclient`\n  - this could use JupyterLite machinery to actually run kernels\n- further simplify the `nbconvert` API\n  - `nbclient` could be an optional, or at least not catastrophically failing, dependency","metadata":{}}]}