Pandoc was essential for publishing my book “JavaScript for impatient programmers”. The book exists in several versions:
- Printable PDF (for a print-on-demand book on Amazon)
- Screen PDF
- Multi-page HTML
- EPUB
- MOBI
The homepage of “JavaScript for impatient programmers” contains previews of all artifacts.
In this document, I describe some of the challenges I’ve encountered while working on the book.
HTML output is missing several important features. All of them are supported when using LaTeX via Pandoc:
- Top-level parts
  - Issue: jgm/pandoc#6411
- Chapter TOCs
  - Related discussion: https://groups.google.com/forum/#!topic/pandoc-discuss/KEfzxqqueBU
- Index generation
  - Issue: jgm/pandoc#6415
  - My Lua filter for cross-format indices: https://gist.github.com/rauschma/bfacbe6f2e8461b4a62c0cc1a288188e
- Frontmatter (unnumbered chapters without a part) and appendices
I wrote Lua filters to work around these limitations (excluding frontmatter), but it wasn’t easy.
For HTML, I needed Pandoc to produce multiple files (kind of like the internals of the EPUBs that it produces).
The easiest workaround was to generate a single long HTML file and split it up, while updating cross-file links so that they also include filenames.
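The link-updating step can be sketched as follows. This is a minimal illustration, not the script used for the book: it assumes the long HTML file has already been split into pages, the filenames and the helper names `build_id_index` and `fix_links` are hypothetical, and a regular expression stands in for a real HTML parser.

```python
import re

def build_id_index(pages):
    """Map every id="..." found in a page's HTML to that page's filename."""
    index = {}
    for filename, html in pages.items():
        for anchor_id in re.findall(r'\bid="([^"]+)"', html):
            index[anchor_id] = filename
    return index

def fix_links(pages):
    """Rewrite href="#id" to href="file.html#id" when the target lives in another file."""
    index = build_id_index(pages)
    fixed = {}
    for filename, html in pages.items():
        def repl(match, current=filename):
            target = match.group(1)
            owner = index.get(target)
            if owner is None or owner == current:
                return match.group(0)  # unknown or same-file target: leave untouched
            return f'href="{owner}#{target}"'
        fixed[filename] = re.sub(r'href="#([^"]+)"', repl, html)
    return fixed

# Hypothetical pages produced by splitting one long HTML file.
pages = {
    "ch_intro.html": '<h1 id="intro">Intro</h1><a href="#hoisting">hoisting</a>',
    "ch_vars.html": '<h1 id="hoisting">Hoisting</h1><a href="#hoisting">self</a>',
}
result = fix_links(pages)
```

Links whose target is in the same file stay as plain fragments, so each page still works when opened on its own.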
- On one hand, we need to be in the same directory as the content, so that `\includepdf{}` works with relative paths.
  - Additional important benefit: That command can’t handle spaces in paths (which are common in absolute paths on macOS). That’s a bug that’s new in the latest version of XeLaTeX: https://github.com/ho-tex/oberdiek/issues/31
  - Alas, Pandoc’s intermediate LaTeX output also has to sit next to the content. That’s a weakness of LaTeX, not of Pandoc.
- On the other hand, the `--extract-media` path assumes we are inside the output directory: `pandoc --standalone -o ../out/chapter.html --extract-media=../out/img chapter.md`
  - Input: `![](img/diagram.svg)`
  - Actual output: `<img src="../out/img/08830082d8bd4c323da4ec4f51fb2a20b2dcaae7.svg" />`
  - Desired output: `<img src="img/08830082d8bd4c323da4ec4f51fb2a20b2dcaae7.svg" />`
Directory structure for this example:

    proj/
      content/
        chapter.md
        img/
          diagram.svg
      out/
        chapter.html
        img/
          08830082d8bd4c323da4ec4f51fb2a20b2dcaae7.svg

Workaround:
- Choose a long, unique name for the extracted media directory.
- Search and replace that name in the produced HTML output, then copy the extracted directory into the output location.
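The two workaround steps could look roughly like this. It is a sketch under stated assumptions: the directory name `extracted-media-3f9c1b7e` and the function `relocate_media` are hypothetical, and the media directory is assumed to have been passed to `--extract-media` as a bare name, so that it appears in the HTML as `NAME/file`.

```python
import shutil
import tempfile
from pathlib import Path

UNIQUE = "extracted-media-3f9c1b7e"  # hypothetical long, unique directory name

def relocate_media(html_path: Path, extracted_dir: Path, final_dir_name: str) -> None:
    """Point the HTML at `final_dir_name` and copy the extracted media next to it."""
    html = html_path.read_text(encoding="utf-8")
    # Because the name is unique, a plain string replacement cannot hit other text.
    html = html.replace(extracted_dir.name + "/", final_dir_name + "/")
    html_path.write_text(html, encoding="utf-8")
    shutil.copytree(extracted_dir, html_path.parent / final_dir_name, dirs_exist_ok=True)

# Demo in a throwaway directory that mimics proj/content and proj/out.
with tempfile.TemporaryDirectory() as tmp:
    media = Path(tmp) / "content" / UNIQUE
    out = Path(tmp) / "out"
    media.mkdir(parents=True)
    out.mkdir()
    (media / "diagram.svg").write_text("<svg/>", encoding="utf-8")
    page = out / "chapter.html"
    page.write_text(f'<img src="{UNIQUE}/diagram.svg" />', encoding="utf-8")

    relocate_media(page, media, "img")
    final_html = page.read_text(encoding="utf-8")
    copied = (out / "img" / "diagram.svg").exists()
```

The unique name is what makes the blind search-and-replace safe; a short name such as `img` could also occur in ordinary text or in other attributes.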
Proposal: Introduce a different mode for `--extract-media` where paths to extracted files are relative to the output file. This different mode could be switched on via:
- An option that otherwise does the same as `--extract-media`, but computes paths differently: `--extract-media-relative-to-output`
- A separate option for specifying how `--extract-media` computes its paths: `--extract-media-mode=relative-to-input` or `--extract-media-mode=relative-to-output`

If other options work similarly to `--extract-media`, it may make sense to introduce an option that works for all of them (instead of just for `--extract-media`).
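The path computation that such a mode would need is small. The following sketch only illustrates the idea and says nothing about how Pandoc would implement it; the function name `src_relative_to_output` is hypothetical, and both arguments are assumed to be relative to the same working directory.

```python
import posixpath

def src_relative_to_output(media_path: str, output_file: str) -> str:
    """Express `media_path` relative to the directory that contains `output_file`."""
    output_dir = posixpath.dirname(output_file) or "."
    return posixpath.relpath(media_path, start=output_dir)

# The paths from the example above, as seen from inside proj/content/.
src = src_relative_to_output(
    "../out/img/08830082d8bd4c323da4ec4f51fb2a20b2dcaae7.svg",
    "../out/chapter.html",
)
print(src)  # img/08830082d8bd4c323da4ec4f51fb2a20b2dcaae7.svg
```

With this computation, the working directory can stay inside the content directory while the generated `src` attributes remain valid from the location of the output file.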
Wishes:
- At the moment, filters visit inlines in a separate pass. (This is a known problem and being worked on.)
- Should Pandoc ever support cross-format numbering of headings, filters would benefit from having access to the numbers of headings.
I’ve found Lua difficult to work with (tables and output are frustratingly limited, etc.). I originally wanted to publish my Lua filters, but they don’t feel robust enough for me to do so. The solution will be to eventually rewrite the filters in Haskell, Rust, or TypeScript. Then I can publish them.
- The filter pandoc-crossref is important for supporting LaTeX’s floating images and tables for all output formats. It allows you to refer to them elegantly. It’d be great if this functionality could be built into Pandoc.
- For images, I’m making a distinction:
  - Bitmap graphics (same across all file formats): `.jpg`, `.png`
  - Vector graphics (format-specific): no filename extension. The filename extension is then specified via `--default-image-extension`:
    - PDF: `.pdf`
    - EPUB, HTML: `.svg`
    - MOBI (via intermediate EPUB): `.jpg`

  Minor inconvenience: When previewing the Markdown in an editor, you don’t see the vector graphics. I’m not sure how to best fix this. Maybe with a mapping of image extensions: `--image-extension-replace=svg/pdf` (i.e., use `.svg` in Markdown input, but `.pdf` in PDF output).
- A filter that converts links to page numbers (use case: print PDF):
  - Input: `This phenomenon is called [_hoisting_](#hoisting).`
  - Output: This phenomenon is called hoisting.
  - Print (no link, page number via LaTeX): This phenomenon is called hoisting (page 392).
- References that mention the section number and section title:
  - Input: `For more information, see [$full](#section-on-unicode).`
  - Output: For more information, see §12.7.1 “JavaScript and Unicode”.
- Inserting breaks into inline code (to fix overflow problems in LaTeX): `` `Desc•.[[Con•fig•urable]]`{.break} ``
- Linking to inline IDs doesn’t work in LaTeX. Workaround supported by filter: `[Hoisting]{#hoisting .texlabel} is an important term in this context.` UPDATE: fixed in master
- Information boxes (“tip”, “warning”, etc.). Examples: https://exploringjs.com/impatient-js/ch_faq-book.html#notations-and-conventions
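The image convention from the list above (bitmaps keep their extension, vector graphics get one per output format) can be mimicked in a few lines. This sketch only illustrates the rule; in practice Pandoc applies it itself via `--default-image-extension`. The names `VECTOR_EXTENSION` and `resolve_image` are hypothetical.

```python
import posixpath

# Extension used for vector graphics in each output format, as listed above.
VECTOR_EXTENSION = {"pdf": ".pdf", "epub": ".svg", "html": ".svg", "mobi": ".jpg"}

def resolve_image(path: str, output_format: str) -> str:
    """Append the format-specific extension to image paths that have none."""
    _, ext = posixpath.splitext(path)
    if ext:
        return path  # bitmap graphics keep their extension in every format
    return path + VECTOR_EXTENSION[output_format]

for fmt in ("pdf", "html", "mobi"):
    print(fmt, resolve_image("img/diagram", fmt), resolve_image("img/photo.jpg", fmt))
```

The downside mentioned above is visible here too: `img/diagram` names no real file, which is why Markdown previews cannot display it.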
In general, I loved working with Pandoc. Its filters in particular make it a flexible and powerful tool. It’s impressive how well they work.
The following features helped with creating the print PDF:
- Black & white syntax highlighting
- The option to convert links into footnotes