Reading further, it seems this could be better resolved with a custom writer rather than the `--lua-filter` I was originally trying to use:
- https://pandoc.org/lua-filters.html#pandoc.scaffolding-fields
- https://pandoc.org/lua-filters.html#pandoc.scaffolding.Writer
  - > An object to be used as a `Writer` function; the construct handles most of the boilerplate, expecting only render functions for all AST elements (table)
- https://pandoc.org/custom-writers.html
- https://pandoc.org/chunkedhtml-demo/3.3-general-writer-options.html
- https://pandoc.org/chunkedhtml-demo/15-custom-readers-and-writers.html
With an example from `djot-writer.lua` (Ref: 1, 2):

    Inlines.Emph = function(el)
      return concat{ "_", inlines(el.content), "_" }
    end

    Blocks.BulletList = function(el)
      local attr = render_attributes(el, true)
      local result = {attr, cr}
      for i=1,#el.content do
        result[#result + 1] = hang(blocks(el.content[i], blankline), 2, concat{"-",space})
      end
      local sep = blankline
      if is_tight_list(el) then
        sep = cr
      end
      return concat(result, sep)
    end

See also, the following issue about improving the docs:
Originally posted by @0xdevalias in jgm/pandoc#10527 (comment)
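
As a rough sketch of what the custom writer route could look like (assuming pandoc 3.0 or later, where a custom writer can be a single `Writer(doc, opts)` function): instead of re-implementing every element with `pandoc.scaffolding.Writer`, the writer below rewrites `Emph` elements and then hands the document to the built-in `gfm` writer via `pandoc.write`. The file name `underscore-emph-writer.lua` and the helper `underscore_emph` are made up for illustration, and this has not been checked across pandoc versions.

```lua
-- underscore-emph-writer.lua (hypothetical file name)

-- Wrap the inlines of an Emph element in raw "_" markers, so that the
-- Markdown writer should emit _text_ instead of *text*.
local function underscore_emph(el)
  local out = { pandoc.RawInline('markdown', '_') }
  for _, inline in ipairs(el.content) do
    out[#out + 1] = inline
  end
  out[#out + 1] = pandoc.RawInline('markdown', '_')
  return out
end

-- Custom writer entry point: rewrite the document first, then delegate
-- the actual rendering to pandoc's built-in gfm writer.
function Writer(doc, opts)
  local rewritten = doc:walk({ Emph = underscore_emph })
  return pandoc.write(rewritten, 'gfm', opts)
end
```

It would be invoked with something like `pandoc -f html -t underscore-emph-writer.lua input.html`. One caveat of injecting raw markers: nothing checks whether `_` is valid at that position, so emphasis in the middle of a word may not round-trip as emphasis.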
See also:
- https://www.reddit.com/r/pandoc/comments/10yvrgg/getting_into_custom_writers/
- https://hackage.haskell.org/package/pandoc-3.6.1/docs/Text-Pandoc-Writers.html
- http://chulsky.com/pandoc/
  - > Pandoc is a piece of software that converts documents from one format to another. It natively handles a huge number of formats, and has good documentation. But what if you have some custom format that Pandoc doesn't natively handle?
    >
    > Pandoc is extensible, and you can write custom readers and writers in Lua. But here, I didn't find the same level of documentation, and getting started was very difficult.
    >
    > I was able to get generous personal help from one of the core developers; I don't know if I could have figured it out on my own. My hope is that this introduction can help you get to the point where the documentation makes sense, and allows you to independently construct custom filters and writers for Pandoc.
  - > Understanding the AST
  - > Using A Writer
    >
    > Once we're ready, supposing our writer is saved as `writer.lua`, we can run the writer with
    >
    > `$ pandoc --from=latex --to=writer.lua file.tex`

This is a PoC Lua filter script which allows you to fine-tune the Markdown output from HTML conversions using Pandoc.
- https://pandoc.org/lua-filters.html
- https://www.lua.org/pil/
  - > Programming in Lua
  - https://www.lua.org/pil/contents.html
    - > Programming in Lua (first edition)
- https://www.lua.org/manual/
  - > Reference manuals
  - https://www.lua.org/manual/5.4/
    - > Lua 5.4 Reference Manual

It offers configurable options to control bullet list markers, emphasis styles, etc.
Key features:
- Customizable bullet markers (`-` or `*`)
- Adjustable emphasis styles (`_` or `*`)
- Supports configuration via:
  - Document metadata (YAML)
  - Command line arguments (`--metadata`)
  - Metadata files (`--metadata-file`)

I mostly started writing this so I could force `pandoc` to use `_foo_` for emphasis rather than `*foo*`.

Note that the `bullet-marker` setting is somewhat complex, and I'm sure there is a simpler/better way to write it, but I already spent way longer on this than I intended. It also has some bugs and edge cases currently, and for my original need it doesn't even matter, since `pandoc` defaults to using my preferred list marker (`-`) anyway.
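
For illustration, here is a minimal sketch of how a filter could pick up the `emphasis-marker` setting from metadata and fall back to a default when the value is missing or unexpected. This is not the actual script; the names `ALLOWED`, `DEFAULT_MARKER` and `resolve_marker` are hypothetical, and it assumes a pandoc version that supports `doc:walk` (2.17 or later, as far as I know).

```lua
-- Markers this sketch accepts; anything else falls back to the default.
local ALLOWED = { ['_'] = true, ['*'] = true }
local DEFAULT_MARKER = '*'

-- Return the requested marker if it is allowed, otherwise the default.
local function resolve_marker(value)
  if value ~= nil and ALLOWED[value] then
    return value
  end
  return DEFAULT_MARKER
end

-- Using a single Pandoc function means the metadata is read before any
-- Emph element is visited.
function Pandoc(doc)
  local raw = doc.meta['emphasis-marker']
  local marker = resolve_marker(raw and pandoc.utils.stringify(raw))
  return doc:walk({
    Emph = function(el)
      -- Surround the emphasised inlines with raw Markdown markers.
      local out = { pandoc.RawInline('markdown', marker) }
      for _, inline in ipairs(el.content) do
        out[#out + 1] = inline
      end
      out[#out + 1] = pandoc.RawInline('markdown', marker)
      return out
    end,
  })
end
```

Because the value goes through `pandoc.utils.stringify`, the same code path should cover `--metadata emphasis-marker='_'`, a `--metadata-file`, and YAML metadata in the document.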
The following command will read HTML content (if present) from the clipboard on macOS, decode it, convert non-breaking spaces to normal spaces, and then use `pandoc` to convert it to markdown:

    osascript -e 'the clipboard as «class HTML»' \
    | sed -e 's/^«data HTML//' -e 's/»$//' \
    | xxd -r -p \
    | sed 's/\xc2\xa0/ /g' \
    | pandoc -f html -t gfm --wrap=none --lua-filter=pandoc-markdown-devalias.lua --metadata debug=true --metadata emphasis-marker='_' --metadata bullet-marker='-'

You can render the following example markdown via this website (or similar), then copy it to the clipboard:

    - This is a basic list item
      - This is basic but nested
    - `This is a bit more complex`
    - > So is this blockquote..\
      > Which splits over multiple lines..\
      > Just to make things harder...
      - > This is a deeply nested blockquote\
        > Which splits over multiple lines..\
        > Just to make things harder again...
    - ```javascript
      Now we are getting esoteric
      With multiline content..
      In a code block!
      With weird indents
      ```

    _This is a test of emphasised text._

    And another test.

    > Oh wow, a blockquote!

    - > A list item blockquote

    AAA > Technically not a blockquote..

    How.
    Do.
    These.
    Lines.
    Look?

Which will give you HTML something like this:
<meta charset='utf-8'><ul style="margin-top: 0px !important; margin-bottom: 16px; padding-left: 2em; color: rgb(31, 35, 40); font-family: -apple-system, "system-ui", "Segoe UI", "Noto Sans", Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji"; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><li>This is a basic list item<ul style="margin-top: 0px; margin-bottom: 0px; padding-left: 2em;"><li>This is basic but nested</li></ul></li><li style="margin-top: 0.25em;"><code style="font-family: ui-monospace, SFMono-Regular, "SF Mono", Menlo, Consolas, "Liberation Mono", monospace; font-size: 13.6px; padding: 0.2em 0.4em; margin: 0px; white-space: break-spaces; background-color: rgba(175, 184, 193, 0.2); border-radius: 6px;">This is a bit more complex</code></li><li style="margin-top: 0.25em;"><blockquote style="margin: 0px 0px 16px; padding: 0px 1em; color: rgb(99, 108, 118); border-left: 0.25em solid rgb(208, 215, 222);"><p style="font-size: 1em; margin: 0px 1em; padding: 0px; font-weight: normal;">So is this blockquote..<br>Which splits over multiple lines..<br>Just to make things harder...</p></blockquote><ul style="margin-top: 0px; margin-bottom: 0px; padding-left: 2em;"><li><blockquote style="margin: 0px 0px 16px; padding: 0px 1em; color: rgb(99, 108, 118); border-left: 0.25em solid rgb(208, 215, 222);"><p style="font-size: 1em; margin: 0px 1em; padding: 0px; font-weight: normal;">This is a deeply nested blockquote<br>Which splits over multiple lines..<br>Just to make things harder again...</p></blockquote></li></ul></li><li style="margin-top: 0.25em;"><pre style="font-family: ui-monospace, SFMono-Regular, "SF 
Mono", Menlo, Consolas, "Liberation Mono", monospace; font-size: 13.6px; margin-top: 0px; margin-bottom: 16px; overflow-wrap: normal; padding: 16px; overflow: auto; line-height: 1.45; color: rgb(31, 35, 40); background-color: rgb(246, 248, 250); border-radius: 6px;"><code class="language-javascript" style="font-family: ui-monospace, SFMono-Regular, "SF Mono", Menlo, Consolas, "Liberation Mono", monospace; font-size: 13.6px; padding: 0px; margin: 0px; white-space: pre; background: transparent; border-radius: 6px; word-break: normal; border: 0px; display: inline; overflow: visible; line-height: inherit; overflow-wrap: normal;">Now we are getting esoteric
With multiline content..
In a code block!
With weird indents
</code></pre></li></ul><p style="font-size: 16px; margin: 0px 1em 16px; padding: 0px; font-weight: 400; color: rgb(31, 35, 40); font-family: -apple-system, "system-ui", "Segoe UI", "Noto Sans", Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji"; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><em>This is a test of emphasised text.</em></p><p style="font-size: 16px; margin: 0px 1em 16px; padding: 0px; font-weight: 400; color: rgb(31, 35, 40); font-family: -apple-system, "system-ui", "Segoe UI", "Noto Sans", Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji"; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">And another test.</p><blockquote style="margin: 0px 0px 16px; padding: 0px 1em; color: rgb(99, 108, 118); border-left: 0.25em solid rgb(208, 215, 222); font-family: -apple-system, "system-ui", "Segoe UI", "Noto Sans", Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji"; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><p style="font-size: 1em; margin: 0px 1em; padding: 
0px; font-weight: normal;">Oh wow, a blockquote!</p></blockquote><ul style="margin-top: 0px; margin-bottom: 16px; padding-left: 2em; color: rgb(31, 35, 40); font-family: -apple-system, "system-ui", "Segoe UI", "Noto Sans", Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji"; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><li><blockquote style="margin: 0px 0px 16px; padding: 0px 1em; color: rgb(99, 108, 118); border-left: 0.25em solid rgb(208, 215, 222);"><p style="font-size: 1em; margin: 0px 1em; padding: 0px; font-weight: normal;">A list item blockquote</p></blockquote></li></ul><p style="font-size: 16px; margin: 0px 1em 16px; padding: 0px; font-weight: 400; color: rgb(31, 35, 40); font-family: -apple-system, "system-ui", "Segoe UI", "Noto Sans", Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji"; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">AAA > Technically not a blockquote..</p><p style="font-size: 16px; margin-top: 0px; margin-right: 1em; margin-bottom: 0px !important; margin-left: 1em; padding: 0px; font-weight: 400; color: rgb(31, 35, 40); font-family: -apple-system, "system-ui", "Segoe UI", "Noto Sans", Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji"; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; 
orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">How. Do. These. Lines. Look?</p>
And then after running it through `pandoc` with this `--lua-filter` script, you end up with something like this:
- This is a basic list item
- This is basic but nested
- `This is a bit more complex`
- > So is this blockquote..
> Which splits over multiple lines..
> Just to make things harder...
- > This is a deeply nested blockquote
> Which splits over multiple lines..
> Just to make things harder again...
- ```
Now we are getting esoteric
With multiline content..
In a code block!
With weird indents
```
_This is a test of emphasised text._
And another test.
> Oh wow, a blockquote!
- > A list item blockquote
AAA \> Technically not a blockquote..
How. Do. These. Lines. Look?
Fix `Link` output:

AST:

    [ Plain
        [ Link
            ( ""
            , []
            , [ ( "rel" , "nofollow" )
              , ( "style"
                , "box-sizing: border-box; background-color: transparent; color: var(--fgColor-accent, var(--color-accent-fg)); text-decoration: underline; text-underline-offset: 0.2rem;"
                )
              ]
            )
            [ Str "https://binary.ninja/free/" ]
            ( "https://binary.ninja/free/" , "" )
        ]

Rendered markdown with `gfm`:
- <a href="https://binary.ninja/free/" rel="nofollow" style="box-sizing: border-box; background-color: transparent; color: var(--fgColor-accent, var(--color-accent-fg)); text-decoration: underline; text-underline-offset: 0.2rem;">https://binary.ninja/free/</a>
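
One possible fix, as an untested sketch: my assumption is that the `gfm` writer falls back to a raw `<a>` tag here because the `Link` carries `rel` and `style` attributes that plain Markdown link syntax cannot express. If that is right, clearing the attributes in a filter should let the link render as a normal Markdown link or autolink.

```lua
-- Drop the identifier, classes and key-value attributes from every link,
-- keeping only the link text and target.
function Link(el)
  el.attr = pandoc.Attr()
  return el
end
```

This throws away all link attributes, including any that might be wanted (for example an `id`), so a real version might only remove specific keys.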
Block elements:
- Plain: Plain text, not a paragraph.
- Para: Paragraph.
- LineBlock: Multiple non-breaking lines.
- CodeBlock: Code block (literal) with attributes.
- RawBlock: Raw block.
- BlockQuote: Block quote (list of blocks).
- OrderedList: Ordered list (attributes and a list of items, each a list of blocks).
- BulletList: Bullet list (list of items, each a list of blocks).
- DefinitionList: Definition list. Each item is a pair consisting of a term (list of inlines) and one or more definitions (list of blocks).
- Header: Header - level (integer) and text (inlines).
- HorizontalRule: Horizontal rule.
- Table: Table, with attributes, caption, column alignments, table head, bodies, and foot.
- Figure: Figure, with attributes, caption, and content (list of blocks).
- Div: Generic block container with attributes.
Inline elements:
- Str: Text (string).
- Emph: Emphasized text (list of inlines).
- Underline: Underlined text (list of inlines).
- Strong: Strongly emphasized text (list of inlines).
- Strikeout: Strikeout text (list of inlines).
- Superscript: Superscripted text (list of inlines).
- Subscript: Subscripted text (list of inlines).
- SmallCaps: Small caps text (list of inlines).
- Quoted: Quoted text (list of inlines) with a quote type.
- Cite: Citation (list of inlines) with a list of citations.
- Code: Inline code (literal) with attributes and text.
- Space: Inter-word space.
- SoftBreak: Soft line break.
- LineBreak: Hard line break.
- Math: TeX math (literal) with a math type and text.
- RawInline: Raw inline with a format and text.
- Link: Hyperlink with attributes, alternative text (list of inlines), and a target.
- Image: Image with attributes, alternative text (list of inlines), and a target.
- Note: Footnote or endnote (list of blocks).
- Span: Generic inline container with attributes and a list of inlines.
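
To see which of the element types above actually show up in a given document, a small filter along these lines could tally them. This is only a sketch: the `tally` helper is hypothetical, and it relies on the catch-all `Block` and `Inline` filter functions plus the `tag` field that pandoc exposes on elements.

```lua
-- Running count of element constructors seen so far, keyed by tag name.
local counts = {}

-- Record one occurrence; returning nothing leaves the element unchanged.
local function tally(el)
  counts[el.tag] = (counts[el.tag] or 0) + 1
end

function Pandoc(doc)
  -- Block and Inline are used for any element without a specific function.
  doc:walk({ Block = tally, Inline = tally })

  -- Print the totals to stderr in alphabetical order.
  local names = {}
  for name in pairs(counts) do
    names[#names + 1] = name
  end
  table.sort(names)
  for _, name in ipairs(names) do
    io.stderr:write(string.format('%-16s %d\n', name, counts[name]))
  end

  return doc
end
```

Writing to stderr keeps the converted document on stdout untouched, so the filter can be left in a pipeline while debugging.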
- Crossposted
- Newer version/evolution of this in my dotfiles:
- jgm/pandoc#10529
  - > Improve cross-linking of types in 'Pandoc Lua Filters' docs
- jgm/pandoc#10531
  - > Improve `pandoc.scaffolding` section in 'Pandoc Lua Filters' docs (RE: Custom Writers)
- jgm/pandoc#10527
  - > Allow markdown `Emph` / `BulletList` characters to be customised (eg. `*` vs `_`, and `-` vs `*`)
  - jgm/pandoc#10479
    - > Markdown writer: fully customizable list markers
    - jgm/pandoc#10479 (comment)
      - > For some other examples of similar requests...
      - jgm/pandoc#1826
        - > Markdown writer: Add `--bullet-list-marker` argument option
      - jgm/pandoc#1786
        - > Add `--prefer-fenced-code-blocks` option
      - jgm/pandoc#5584
        - > Config file for markdown output options?
- jgm/pandoc#8750
  - > Restructure Lua and JSON filter docs
  - jgm/pandoc#9106
    - > Simplify and reorganize Lua filter introduction
- https://pandoc.org/lua-filters.html
  - https://pandoc.org/lua-filters.html#pandoc.scaffolding.Writer
    - > An object to be used as a `Writer` function; the construct handles most of the boilerplate, expecting only render functions for all AST elements (table)
  - https://pandoc.org/lua-filters.html#module-pandoc.template
    - > Handle pandoc templates.
    - https://pandoc.org/chunkedhtml-demo/6-templates.html
    - https://github.com/jgm/pandoc-templates
  - https://pandoc.org/lua-filters.html#pandoc.structure.table_of_contents
    - > `table_of_contents`
      >
      > Generates a table of contents for the given object.
  - https://pandoc.org/lua-filters.html#pandoc.scaffolding.Writer
- https://pandoc.org/MANUAL.html#option--data-dir
  - > `--data-dir=DIRECTORY`
    >
    > Specify the user data directory to search for pandoc data files. If this option is not specified, the default user data directory will be used. On *nix and macOS systems this will be the pandoc subdirectory of the XDG data directory (by default, `$HOME/.local/share`, overridable by setting the `XDG_DATA_HOME` environment variable). If that directory does not exist and `$HOME/.pandoc` exists, it will be used (for backwards compatibility). On Windows the default user data directory is `%APPDATA%\pandoc`. You can find the default user data directory on your system by looking at the output of `pandoc --version`. Data files placed in this directory (for example, `reference.odt`, `reference.docx`, `epub.css`, `templates`) will override pandoc’s normal defaults. (Note that the user data directory is not created by pandoc, so you will need to create it yourself if you want to make use of it.)
- https://pandoc.org/CONTRIBUTING.html
- https://pandoc.org/CONTRIBUTING.html#adding-a-new-command-line-option
- https://pandoc.org/CONTRIBUTING.html#lua-filters
- https://github.com/pandoc/lua-filters
  - > A collection of Lua filters for pandoc.
  - > Warning: This repository is in the process of being retired. Please see the next section for details.
  - https://github.com/pandoc/lua-filters#new-filters
    - > New filters
      >
      > We want the ecosystem to be distributed, but also try to make it easy to discover new software. That's why we ask filter authors to add the pandoc and pandoc-filter to the GitHub repositories, as enables others to explore filters through GitHub's interface.
      >
      > Additionally, please add a link to your filter to the pandoc-ext/info repository.
- https://github.com/orgs/pandoc-ext/repositories?type=all
  - https://github.com/pandoc-ext/info
    - > Pandoc Extensions
      >
      > General info on pandoc extensions.
  - https://github.com/pandoc-ext/logging
    - > Pandoc lua logging
      >
      > This library provides pandoc-aware functions for dumping and logging lua objects. It can be used standalone but is primarily intended for using within pandoc lua filters.
- https://github.com/pandoc-ext/info
- https://github.com/topics/pandoc-filter
- https://github.com/averms/pandoc-filters
  - > A small, useful collection of pandoc filters
- https://github.com/tarleb/lua-filter-template
  - > All the tools to publish a pandoc Lua filter quickly and easily; work in progress.
- https://github.com/pandoc/lua-filters