@0xdevalias
Last active January 13, 2025 23:32
PoC pandoc --lua-filter for customizing markdown output

Custom Writer

Reading further, it seems this could be better resolved with a custom writer rather than the --lua-filter I was originally trying to use.

With an example from djot-writer.lua (Ref: 1, 2):

Inlines.Emph = function(el)
  return concat{ "_", inlines(el.content), "_" }
end
Blocks.BulletList = function(el)
  local attr = render_attributes(el, true)
  local result = {attr, cr}
  for i=1,#el.content do
    result[#result + 1] = hang(blocks(el.content[i], blankline), 2, concat{"-",space})
  end
  local sep = blankline
  if is_tight_list(el) then
    sep = cr
  end
  return concat(result, sep)
end

See also, the following issue about improving the docs:

Originally posted by @0xdevalias in jgm/pandoc#10527 (comment)

See also:

  • https://www.reddit.com/r/pandoc/comments/10yvrgg/getting_into_custom_writers/
  • https://hackage.haskell.org/package/pandoc-3.6.1/docs/Text-Pandoc-Writers.html
  • http://chulsky.com/pandoc/
    • Pandoc is a piece of software that converts documents from one format to another. It natively handles a huge number of formats, and has good documentation. But what if you have some custom format that Pandoc doesn't natively handle?

      Pandoc is extensible, and you can write custom readers and writers in Lua. But here, I didn't find the same level of documentation, and getting started was very difficult.

      I was able to get generous personal help from one of the core developers; I don't know if I could have figured it out on my own. My hope is that this introduction can help you get to the point where the documentation makes sense, and allows you to independently construct custom filters and writers for Pandoc.

    • Understanding the AST

    • Using A Writer

      Once we're ready, supposing our writer is saved as writer.lua, we can run the writer with

      $ pandoc --from=latex --to=writer.lua file.tex
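
As a rough illustration of the custom writer route, here is a minimal sketch that reuses pandoc's built-in gfm writer and only changes how emphasis is rendered. It assumes a pandoc version that supports new-style custom writers (a global Writer function) along with pandoc.write. The file name emph-writer.lua, the emphasis_filter table and the hard-coded marker are all made up for this example, and I haven't checked how it behaves with nested formatting inside the emphasis.

```lua
-- emph-writer.lua (hypothetical file name)
-- Sketch of a custom writer that delegates to the built-in gfm writer,
-- after rewriting emphasis to use a fixed marker.

-- Assumption: hard-coded here; the filter further down reads it from metadata
local emphasis_marker = "_"

-- Filter applied to the document before it is handed to the gfm writer
local emphasis_filter = {
  Emph = function(el)
    -- stringify flattens the emphasised content to plain text
    local text = pandoc.utils.stringify(el)
    return pandoc.RawInline("markdown", emphasis_marker .. text .. emphasis_marker)
  end,
}

-- Entry point pandoc calls for a new-style custom writer
function Writer(doc, opts)
  return pandoc.write(doc:walk(emphasis_filter), "gfm", opts)
end
```

It would be run the same way as the writer above, e.g. pandoc --from=html --to=emph-writer.lua file.html.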

Original PoC --lua-filter script

This is a PoC Lua filter script which allows you to fine-tune the Markdown output from HTML conversions using Pandoc.

It offers configurable options to control bullet list markers, emphasis styles, etc.

Key features:

  • Customizable bullet markers (- or *)
  • Adjustable emphasis styles (_ or *)
  • Supports configuration via:
    • Document metadata (YAML)
    • Command line arguments (--metadata)
    • Metadata files (--metadata-file)
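
All three of these routes end up in the same metadata table that the filter's Meta function receives, but the value types can differ: values passed with --metadata arrive as plain strings or booleans, while strings from YAML are typically parsed as markdown and arrive as inline lists. A minimal sketch of reading a marker in a way that copes with both, and falls back to a default for anything unexpected, might look like the following. The pick_marker helper and the config table are hypothetical names, not part of the script in this gist.

```lua
-- Resolved settings, starting from the defaults
local config = { bullet_marker = "-", emphasis_marker = "_" }

-- Returns the metadata value as a string if it is one of `allowed`,
-- otherwise returns `default`.
function pick_marker(value, allowed, default)
  if value == nil then
    return default
  end
  -- Plain strings come from --metadata; anything else is flattened to text
  local text = type(value) == "string" and value or pandoc.utils.stringify(value)
  for _, candidate in ipairs(allowed) do
    if text == candidate then
      return text
    end
  end
  return default
end

-- Called by pandoc with the document metadata
function Meta(meta)
  config.bullet_marker = pick_marker(meta["bullet-marker"], { "-", "*" }, config.bullet_marker)
  config.emphasis_marker = pick_marker(meta["emphasis-marker"], { "_", "*" }, config.emphasis_marker)
  return meta
end
```

Rejecting unknown markers up front keeps a typo in the metadata from silently ending up in the generated markdown.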

I mostly started writing this so I could force pandoc to use _foo_ for emphasis rather than *foo*.

Note that the bullet-marker handling is somewhat complex, and I'm sure there is a simpler/better way to write it, but I already spent far longer on this than I intended. It also currently has some bugs and edge cases, and for my original need it doesn't even matter, since pandoc already defaults to my preferred list marker (-) anyway.

Usage

The following command will read HTML content (if present) from the clipboard on macOS, decode it, convert non-breaking spaces to normal spaces, and then use pandoc to convert it to markdown:

osascript -e 'the clipboard as «class HTML»' \
| sed -e 's/^«data HTML//' -e 's/»$//' \
| xxd -r -p \
| sed 's/\xc2\xa0/ /g' \
| pandoc -f html -t gfm --wrap=none --lua-filter=pandoc-markdown-devalias.lua --metadata debug=true --metadata emphasis-marker='_' --metadata bullet-marker='-'
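
The non-breaking space replacement could probably also be done inside pandoc rather than with sed. Below is a minimal sketch of a separate filter that does this for Str elements. The file name nbsp.lua and the replace_nbsp helper are hypothetical, and it deliberately leaves Code and CodeBlock contents alone.

```lua
-- nbsp.lua (hypothetical file name)
-- UTF-8 encoding of U+00A0 (non-breaking space), written as decimal byte escapes
local NBSP = "\194\160"

-- Replaces every non-breaking space in `text` with a normal space
function replace_nbsp(text)
  -- The extra parentheses drop gsub's second return value (the match count)
  return (text:gsub(NBSP, " "))
end

-- Called by pandoc for every Str element in the document
function Str(el)
  el.text = replace_nbsp(el.text)
  return el
end
```

It would be passed as an additional --lua-filter=nbsp.lua option alongside the main script.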

You can render the following example markdown via this website (or similar), then copy it to the clipboard:

- This is a basic list item
  - This is basic but nested
- `This is a bit more complex` 
- > So is this blockquote..\
  > Which splits over multiple lines..\
  > Just to make things harder...
  - > This is a deeply nested blockquote\
    > Which splits over multiple lines..\
    > Just to make things harder again...
- ```javascript
  Now we are getting esoteric
  With multiline content..
  In a code block!
    With weird indents
  ```

_This is a test of emphasised text._

And another test.

> Oh wow, a blockquote!

- > A list item blockquote

AAA > Technically not a blockquote..

How.
Do.
These.
Lines.
Look?

Which will give you HTML that looks something like this:

<meta charset='utf-8'><ul style="margin-top: 0px !important; margin-bottom: 16px; padding-left: 2em; color: rgb(31, 35, 40); font-family: -apple-system, &quot;system-ui&quot;, &quot;Segoe UI&quot;, &quot;Noto Sans&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><li>This is a basic list item<ul style="margin-top: 0px; margin-bottom: 0px; padding-left: 2em;"><li>This is basic but nested</li></ul></li><li style="margin-top: 0.25em;"><code style="font-family: ui-monospace, SFMono-Regular, &quot;SF Mono&quot;, Menlo, Consolas, &quot;Liberation Mono&quot;, monospace; font-size: 13.6px; padding: 0.2em 0.4em; margin: 0px; white-space: break-spaces; background-color: rgba(175, 184, 193, 0.2); border-radius: 6px;">This is a bit more complex</code></li><li style="margin-top: 0.25em;"><blockquote style="margin: 0px 0px 16px; padding: 0px 1em; color: rgb(99, 108, 118); border-left: 0.25em solid rgb(208, 215, 222);"><p style="font-size: 1em; margin: 0px 1em; padding: 0px; font-weight: normal;">So is this blockquote..<br>Which splits over multiple lines..<br>Just to make things harder...</p></blockquote><ul style="margin-top: 0px; margin-bottom: 0px; padding-left: 2em;"><li><blockquote style="margin: 0px 0px 16px; padding: 0px 1em; color: rgb(99, 108, 118); border-left: 0.25em solid rgb(208, 215, 222);"><p style="font-size: 1em; margin: 0px 1em; padding: 0px; font-weight: normal;">This is a deeply nested blockquote<br>Which splits over multiple lines..<br>Just to make things harder again...</p></blockquote></li></ul></li><li style="margin-top: 
0.25em;"><pre style="font-family: ui-monospace, SFMono-Regular, &quot;SF Mono&quot;, Menlo, Consolas, &quot;Liberation Mono&quot;, monospace; font-size: 13.6px; margin-top: 0px; margin-bottom: 16px; overflow-wrap: normal; padding: 16px; overflow: auto; line-height: 1.45; color: rgb(31, 35, 40); background-color: rgb(246, 248, 250); border-radius: 6px;"><code class="language-javascript" style="font-family: ui-monospace, SFMono-Regular, &quot;SF Mono&quot;, Menlo, Consolas, &quot;Liberation Mono&quot;, monospace; font-size: 13.6px; padding: 0px; margin: 0px; white-space: pre; background: transparent; border-radius: 6px; word-break: normal; border: 0px; display: inline; overflow: visible; line-height: inherit; overflow-wrap: normal;">Now we are getting esoteric
With multiline content..
In a code block!
  With weird indents
</code></pre></li></ul><p style="font-size: 16px; margin: 0px 1em 16px; padding: 0px; font-weight: 400; color: rgb(31, 35, 40); font-family: -apple-system, &quot;system-ui&quot;, &quot;Segoe UI&quot;, &quot;Noto Sans&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><em>This is a test of emphasised text.</em></p><p style="font-size: 16px; margin: 0px 1em 16px; padding: 0px; font-weight: 400; color: rgb(31, 35, 40); font-family: -apple-system, &quot;system-ui&quot;, &quot;Segoe UI&quot;, &quot;Noto Sans&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">And another test.</p><blockquote style="margin: 0px 0px 16px; padding: 0px 1em; color: rgb(99, 108, 118); border-left: 0.25em solid rgb(208, 215, 222); font-family: -apple-system, &quot;system-ui&quot;, &quot;Segoe UI&quot;, &quot;Noto Sans&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; 
text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><p style="font-size: 1em; margin: 0px 1em; padding: 0px; font-weight: normal;">Oh wow, a blockquote!</p></blockquote><ul style="margin-top: 0px; margin-bottom: 16px; padding-left: 2em; color: rgb(31, 35, 40); font-family: -apple-system, &quot;system-ui&quot;, &quot;Segoe UI&quot;, &quot;Noto Sans&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><li><blockquote style="margin: 0px 0px 16px; padding: 0px 1em; color: rgb(99, 108, 118); border-left: 0.25em solid rgb(208, 215, 222);"><p style="font-size: 1em; margin: 0px 1em; padding: 0px; font-weight: normal;">A list item blockquote</p></blockquote></li></ul><p style="font-size: 16px; margin: 0px 1em 16px; padding: 0px; font-weight: 400; color: rgb(31, 35, 40); font-family: -apple-system, &quot;system-ui&quot;, &quot;Segoe UI&quot;, &quot;Noto Sans&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">AAA &gt; Technically not a blockquote..</p><p style="font-size: 16px; margin-top: 0px; margin-right: 1em; margin-bottom: 0px !important; margin-left: 1em; padding: 0px; font-weight: 400; color: 
rgb(31, 35, 40); font-family: -apple-system, &quot;system-ui&quot;, &quot;Segoe UI&quot;, &quot;Noto Sans&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">How. Do. These. Lines. Look?</p>

And then after running it through pandoc with this --lua-filter script, you end up with something like this:

- This is a basic list item  
  - This is basic but nested  
- `This is a bit more complex`  
- > So is this blockquote..  
  > Which splits over multiple lines..  
  > Just to make things harder...  
  - > This is a deeply nested blockquote  
  > Which splits over multiple lines..  
  > Just to make things harder again...  
- ```  
  Now we are getting esoteric
  With multiline content..
  In a code block!
    With weird indents  
  ```

_This is a test of emphasised text._

And another test.

> Oh wow, a blockquote!

- > A list item blockquote

AAA \> Technically not a blockquote..

How. Do. These. Lines. Look?
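
Note that the nested blockquote in the output above loses its indentation on the continuation lines. Below is a plain-Lua sketch of the kind of helper that could address that: it prefixes the first line of a multi-line string with one string and every following line with another. The name prefix_lines is hypothetical, it isn't wired into the filter, and pandoc.layout may well offer a better built-in way to do this.

```lua
-- Prefixes the first line of `text` with `first_prefix` and all
-- following lines with `rest_prefix`.
function prefix_lines(text, first_prefix, rest_prefix)
  local lines = {}
  -- Appending "\n" lets the pattern capture the final line as well
  for line in (text .. "\n"):gmatch("(.-)\n") do
    lines[#lines + 1] = line
  end
  for i, line in ipairs(lines) do
    lines[i] = (i == 1 and first_prefix or rest_prefix) .. line
  end
  return table.concat(lines, "\n")
end

-- A blockquote inside a list item: marker on the first line only
local quote = prefix_lines("one\ntwo", "- > ", "  > ")
-- Nesting that item one level deeper indents every line by two spaces
local nested = prefix_lines(quote, "  ", "  ")
-- nested is now "  - > one\n    > two"
```

Applying the prefixes per line, rather than only after the first LineBreak, is what keeps the continuation lines aligned when the item is nested again.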

Future Enhancements

Fix Link output:

AST:

[ Plain
                [ Link
                    ( ""
                    , []
                    , [ ( "rel" , "nofollow" )
                      , ( "style"
                        , "box-sizing: border-box; background-color: transparent; color: var(--fgColor-accent, var(--color-accent-fg)); text-decoration: underline; text-underline-offset: 0.2rem;"
                        )
                      ]
                    )
                    [ Str "https://binary.ninja/free/" ]
                    ( "https://binary.ninja/free/" , "" )
                ]

Rendered markdown with gfm:

  - <a href="https://binary.ninja/free/" rel="nofollow" style="box-sizing: border-box; background-color: transparent; color: var(--fgColor-accent, var(--color-accent-fg)); text-decoration: underline; text-underline-offset: 0.2rem;">https://binary.ninja/free/</a>
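
One possible direction for this: the raw HTML output appears to be caused by the rel and style attributes on the Link, which markdown link syntax can't express. A minimal sketch that drops all attributes from links is below. This is an untested assumption on my part rather than a confirmed fix, and it would also discard any identifier or classes on the link.

```lua
-- Called by pandoc for every Link element in the document
function Link(el)
  -- Replace the identifier, classes and key/value attributes with an empty Attr
  el.attr = pandoc.Attr()
  return el
end
```

To be used with the main script it would need to be added to that script's returned filter list, e.g. as an extra {Link = Link} entry.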

Pandoc AST Overview

Block

Block elements:

  • Plain: Plain text, not a paragraph.
  • Para: Paragraph.
  • LineBlock: Multiple non-breaking lines.
  • CodeBlock: Code block (literal) with attributes.
  • RawBlock: Raw block.
  • BlockQuote: Block quote (list of blocks).
  • OrderedList: Ordered list (attributes and a list of items, each a list of blocks).
  • BulletList: Bullet list (list of items, each a list of blocks).
  • DefinitionList: Definition list. Each item is a pair consisting of a term (list of inlines) and one or more definitions (list of blocks).
  • Header: Header - level (integer) and text (inlines).
  • HorizontalRule: Horizontal rule.
  • Table: Table, with attributes, caption, column alignments, table head, bodies, and foot.
  • Figure: Figure, with attributes, caption, and content (list of blocks).
  • Div: Generic block container with attributes.

Inline

Inline elements:

  • Str: Text (string).
  • Emph: Emphasized text (list of inlines).
  • Underline: Underlined text (list of inlines).
  • Strong: Strongly emphasized text (list of inlines).
  • Strikeout: Strikeout text (list of inlines).
  • Superscript: Superscripted text (list of inlines).
  • Subscript: Subscripted text (list of inlines).
  • SmallCaps: Small caps text (list of inlines).
  • Quoted: Quoted text (list of inlines) with a quote type.
  • Cite: Citation (list of inlines) with a list of citations.
  • Code: Inline code (literal) with attributes and text.
  • Space: Inter-word space.
  • SoftBreak: Soft line break.
  • LineBreak: Hard line break.
  • Math: TeX math (literal) with a math type and text.
  • RawInline: Raw inline with a format and text.
  • Link: Hyperlink with attributes, alternative text (list of inlines), and a target.
  • Image: Image with attributes, alternative text (list of inlines), and a target.
  • Note: Footnote or endnote (list of blocks).
  • Span: Generic inline container with attributes and a list of inlines.
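
The element names above are also the function names a Lua filter uses to hook into the AST, with Inline and Block acting as catch-alls. As a way to get a feel for which elements a given input produces, here is a minimal sketch of a filter that counts elements by type and prints a summary to stderr. The tally and format_counts helpers are hypothetical names, and it relies on pandoc's default traversal order calling the Pandoc function after the Inline and Block functions.

```lua
-- Running totals, keyed by element type name (e.g. "Str", "Para")
local counts = {}

-- Adds one to the count for the element's type
function tally(totals, el)
  totals[el.t] = (totals[el.t] or 0) + 1
end

-- Formats the totals as "Name count" lines, sorted by name
function format_counts(totals)
  local names = {}
  for name in pairs(totals) do
    names[#names + 1] = name
  end
  table.sort(names)
  local lines = {}
  for _, name in ipairs(names) do
    lines[#lines + 1] = name .. " " .. totals[name]
  end
  return table.concat(lines, "\n")
end

-- Catch-all handlers; returning nothing leaves the element unchanged
function Inline(el)
  tally(counts, el)
end

function Block(el)
  tally(counts, el)
end

-- Called once for the whole document, after the element handlers
function Pandoc(doc)
  io.stderr:write(format_counts(counts) .. "\n")
end
```

For the full tree rather than just counts, pandoc -t native prints the AST in the same form as the Link example above.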

See Also

-- File: pandoc-markdown-devalias.lua
--
-- Filter to customize markdown output
--
-- Configurable via metadata options:
--   bullet-marker: character to use for bullet lists ('-' or '*')
--   emphasis-marker: character to use for emphasis ('_' or '*')
--   debug: set to true to write debug logging to stderr
--
-- Configuration can be provided via:
--   YAML metadata in the document
--   Command line: --metadata bullet-marker='-' --metadata emphasis-marker='_'
--   Metadata file: --metadata-file config.yaml
--
-- Example:
--   pbpaste | pandoc -f html -t gfm --lua-filter pandoc-markdown-devalias.lua --metadata bullet-marker='-' --metadata emphasis-marker='_'
--
-- Ref:
--   https://gist.github.com/0xdevalias/794d1aa03c357425c4c9583d9edc0303#poc-pandoc---lua-filter-for-customizing-markdown-output
--   https://pandoc.org/lua-filters.html
--   https://pandoc.org/lua-filters.html#type-writeroptions

-- Default values if not specified in metadata
local debug_mode = false
local bullet_marker = '-'
local bullet_marker_indent = string.rep(" ", #bullet_marker)
local emphasis_marker = '_'

-- Helper function for debug logging
-- TODO: Potentially improve with pandoc.log: (https://pandoc.org/lua-filters.html#module-pandoc.log) or https://github.com/pandoc-ext/logging
local function debug(msg, ...)
  if debug_mode then
    io.stderr:write(string.format("[DEBUG] " .. msg .. "\n", ...))
  end
end

-- Helper function to safely extract values from Pandoc's metadata
-- Handles boolean and string values directly and converts other types to strings
--
-- @param meta: The metadata table from Pandoc
-- @param key: The key to look up in the metadata
-- @param default: Value to return if key is not found
-- @return: Value from metadata (preserving boolean type) or default
function get_metadata_value(meta, key, default)
  local value = meta[key]
  if value == nil then
    return default
  end
  if type(value) == "boolean" or type(value) == "string" then
    return value
  end
  -- Strings from YAML metadata (document or --metadata-file) typically arrive
  -- as lists of inlines rather than plain strings, so flatten them to text
  return pandoc.utils.stringify(value)
end

-- Processes document metadata to configure the filter
-- Called by Pandoc at the start of document processing
--
-- @param meta: The document's metadata table
-- @return: The (possibly modified) metadata table
function Meta(meta)
  -- Read configuration from metadata, falling back to defaults if not specified
  debug_mode = get_metadata_value(meta, 'debug', debug_mode) == true
  bullet_marker = get_metadata_value(meta, 'bullet-marker', bullet_marker)
  bullet_marker_indent = string.rep(" ", #bullet_marker)
  emphasis_marker = get_metadata_value(meta, 'emphasis-marker', emphasis_marker)

  debug("Initialized with FORMAT: %s", FORMAT)
  debug("bullet-marker: %s", bullet_marker)
  debug("bullet-marker-indent: '%s' (without quotes)", bullet_marker_indent)
  debug("emphasis-marker: %s", emphasis_marker)
  debug("debug-mode: %s", debug_mode)

  return meta
end

-- Processes emphasis elements (italic text) by converting them to raw format
-- Ensures consistent emphasis marker usage throughout the document
--
-- @param el: The emphasis element (pandoc AST element)
-- @return: A RawInline element with explicitly formatted emphasis markers
--
-- Note: Uses the global FORMAT and emphasis_marker variables
function Emph(el)
  -- Convert the inline content to a plain string
  local text = pandoc.utils.stringify(el)

  -- Build the new string, wrapping content with the custom marker
  local wrapped = emphasis_marker .. text .. emphasis_marker

  -- Return as raw Markdown (or another format if you prefer)
  return pandoc.RawInline("markdown", wrapped)
end

-- TODO: This feels overly complex still.. surely we can refactor it to be better/simpler somehow..?
--   https://pandoc.org/lua-filters.html#pandoc.walk_block
--   https://pandoc.org/lua-filters.html#pandoc.walk_inline
--   https://pandoc.org/lua-filters.html#pandoc.write
--   https://pandoc.org/lua-filters.html#type-writeroptions
function BulletList(bulletList)
  -- https://pandoc.org/lua-filters.html#type-blocks:walk
  -- https://pandoc.org/lua-filters.html#traversal-order
  -- https://pandoc.org/lua-filters.html#pandoc.utils.blocks_to_inlines

  debug("Start of BulletList")

  -- local bullet = pandoc.List:new{
  --   pandoc.Str(bullet_marker),
  --   pandoc.Space()
  -- }

  -- local nestedIndent = pandoc.List:new{
  --   pandoc.Space(),
  --   pandoc.Space()
  -- }

  -- return bulletList:walk {
  --   Block = function (block)
  --     debug("block: %s %s", block.t, block)
  --     if block.t == "BulletList" then
  --       return block
  --     elseif block.content then
  --       block.content = pandoc.List:new{ pandoc.Str(bullet_marker), pandoc.Space() } .. block.content
  --       return block
  --     else
  --       return block
  --     -- else
  --     --   block.content = pandoc.List:new{ pandoc.Str(bullet_marker), pandoc.Space() } .. block.content
  --     --   return block
  --     end
  --   end,
  -- }

  local transformedBlocks = pandoc.List()

  -- Check the last item and append content if it's Plain or Para
  -- TODO: This may be useful here: https://pandoc.org/lua-filters.html#pandoc.utils.blocks_to_inlines
  local function updateLastBlock(block)
    local lastItem = transformedBlocks[#transformedBlocks]
    if lastItem and (lastItem.t == "Plain" or lastItem.t == "Para") then
      lastItem.content = lastItem.content .. {pandoc.LineBreak()} .. block.content
    else
      -- Either the last item is not Plain/Para, or the list is empty
      transformedBlocks:insert(block)
    end
  end

  -- local transformedBlocks = bulletList.content:map(function(item, i)
  for i, item in ipairs(bulletList.content) do
    -- TODO: pandoc.type might be better to use here: https://pandoc.org/lua-filters.html#pandoc.utils.type
    debug(" Item %d: %s %s", i, type(item), item)

    for j, block in ipairs(item) do
      -- return item:map(function(block, j)
      -- TODO: pandoc.type might be better to use here: https://pandoc.org/lua-filters.html#pandoc.utils.type
      debug(" Item %d, Block %d of %d: %s %s: %s", i, j, #item, type(block), block.t, block)

      if (block.t == "Plain" or block.t == "Para") then
        debug(" Handling '%s' block: %s, [1]: %s %s", block.t, block.content, block.content[1].t, block.content[1])

        -- TODO: instead of just processing the first item, I think we need to loop through all of them..
        local firstInline = block.content[1]

        -- TODO: Should this if run if firstInline.t is Str or Para, or just Str? Maybe needs to also handle RawInline now since we refactored..
        if (firstInline.t == "RawInline" or firstInline.t == "Str" or firstInline.t == "Para") and firstInline.text == bullet_marker then
          -- TODO: This works for indenting the first item, but when we have a nested BlockQuote or similar, it's following lines don't get indented properly
          block.content = pandoc.List{
            pandoc.Str(" "),
          } .. block.content
        else
          block.content = pandoc.List{
            -- pandoc.Str(bullet_marker .. " "),
            pandoc.RawInline("markdown", bullet_marker),
            pandoc.Space(),
          } .. block.content
        end

        updateLastBlock(block)
      elseif (block.t == "BlockQuote") then
        -- TODO: instead of just processing the first item, I think we need to loop through all of them..
        local firstInline = block.content[1]

        if not firstInline or (firstInline.t ~= "Para" and firstInline.t ~= "Plain") then
          return block
        end

        local inlines = pandoc.List{
          pandoc.RawInline("markdown", bullet_marker),
          pandoc.Space(),
          -- TODO: Is this useful for prefixing the blockquote lines? https://pandoc.org/lua-filters.html#pandoc.layout.prefixed
          pandoc.RawInline("markdown", ">"),
          pandoc.Space(),
        }

        for _, inline in ipairs(firstInline.content) do
          if inline.t == "LineBreak" then
            inlines = inlines .. {
              inline, -- LineBreak
              -- pandoc.Str(string.rep(" ", #bullet_marker)), -- TODO: move this to a global variable (bullet_marker_indent) if it works well?
              -- TODO: Is this useful for indents? https://pandoc.org/lua-filters.html#pandoc.layout.after_break
              -- TODO: Is this useful for indentation? https://pandoc.org/lua-filters.html#pandoc.layout.nest
              pandoc.Str(bullet_marker_indent),
              pandoc.Space(),
              -- TODO: Is this useful for prefixing the blockquote lines? https://pandoc.org/lua-filters.html#pandoc.layout.prefixed
              pandoc.RawInline("markdown", ">"),
              pandoc.Space(),
            }
          else
            inlines:insert(inline)
          end
        end

        -- transformedBlocks:insert(pandoc.Para(inlines))
        updateLastBlock(pandoc.Para(inlines))
      elseif (block.t == "CodeBlock") then
        local inlines = pandoc.List{
          pandoc.RawInline("markdown", bullet_marker),
          pandoc.Space(),
          -- TODO: is this useful for wrapping the codeblock contents/etc? https://pandoc.org/lua-filters.html#pandoc.layout.inside
          pandoc.RawInline("markdown", "```"),
          -- block.identifier
          -- ..
          pandoc.LineBreak(),
          -- pandoc.Str(string.rep(" ", #bullet_marker)),
          -- TODO: Is this useful for indentation? https://pandoc.org/lua-filters.html#pandoc.layout.nest
          pandoc.Str(bullet_marker_indent),
          pandoc.Space(),
          -- pandoc.RawInline("markdown", block.text:gsub("\n", "\n" .. string.rep(" ", #bullet_marker + 1))),
          -- TODO: Is this useful for indentation? https://pandoc.org/lua-filters.html#pandoc.layout.nest
          pandoc.RawInline("markdown", block.text:gsub("\n", "\n" .. bullet_marker_indent .. ' ')),
          pandoc.LineBreak(),
          -- pandoc.Str(string.rep(" ", #bullet_marker)),
          -- TODO: Is this useful for indentation? https://pandoc.org/lua-filters.html#pandoc.layout.nest
          pandoc.Str(bullet_marker_indent),
          pandoc.Space(),
          pandoc.RawInline("markdown", "```"),
        }

        updateLastBlock(pandoc.Para(inlines))
      -- elseif (block.t == "BulletList") then
      --   -- TODO: I'm not sure if we will ever actually get this since we transform bulletlist content, and I think we walk the AST depth first
      --   debug("HIT A BULLETLIST")
      --   -- return indent .. block
      --   -- return pandoc.List:new{
      --   -- transformedBlocks = transformedBlocks .. pandoc.List{
      --   --   pandoc.Space(),
      --   --   pandoc.Space(),
      --   --   block,
      --   -- }
      --   -- TODO: Do we need to do anything here? Or just leave it alone as it will be processed by another pass through this function?
      --   transformedBlocks:insert(block)
      else
        debug(" Handling else ('%s') block: %s, [1]: %s", block.t, block.content, block.content and block.content[1])
        -- TODO?
        -- return bullet .. block
        -- return pandoc.List:new{
        -- transformedBlocks = transformedBlocks .. pandoc.List{
        --   pandoc.Str(bullet_marker),
        --   pandoc.Space(),
        --   block,
        -- }
        -- transformedBlocks:insert(pandoc.Str(bullet_marker))
        -- transformedBlocks:insert(pandoc.Space())
        -- TODO: This still isn't quite right as it doesn't return the item prefixed with the list marker on the same line..
        transformedBlocks:insert(pandoc.List{
          pandoc.RawInline("markdown", bullet_marker),
          pandoc.Space(),
          -- block,
        })
        transformedBlocks:insert(block)
        -- updateLastBlock(pandoc.Para(inlines))
        -- transformedBlocks:insert(pandoc.Plain{
        --   pandoc.Str(bullet_marker),
        --   block,
        -- })
        -- transformedBlocks:insert(pandoc.Inlines{
        --   pandoc.Str(bullet_marker),
        --   pandoc.Space(),
        --   block,
        -- })
      end
      -- return block
      -- end)
      -- end)
    end
  end

  -- TODO: pandoc.type might be better to use here: https://pandoc.org/lua-filters.html#pandoc.utils.type
  debug(" type(transformedBlocks): %s", type(transformedBlocks))
  debug(" transformedBlocks: %s", transformedBlocks)

  for k, block in ipairs(transformedBlocks) do
    debug(" transformedBlocks Block %d of %d: %s %s: %s", k, #transformedBlocks, type(block), block.t, block)
  end

  -- local flattenedBlocks = pandoc.List(transformedBlocks):flatten()
  -- debug(" type(flattenedBlocks): %s", type(flattenedBlocks))
  -- debug(" flattenedBlocks: %s", flattenedBlocks)

  debug("End of BulletList")

  -- return bulletList
  -- return pandoc.Blocks{
  --   pandoc.Inlines{
  --     pandoc.Str(bullet_marker),
  --     pandoc.Space(),
  --   }
  -- }
  return transformedBlocks
  -- return pandoc.List(transformedBlocks):flatten()
end

return {
  {Meta = Meta},
  {BulletList = BulletList},
  {Emph = Emph},
}

-- File: pandoc-markdown-devalias-emph-only.lua
--
-- Filter to customize markdown output (emphasis marker only)
--
-- Configurable via metadata options:
--   emphasis-marker: character to use for emphasis ('_' or '*')
--   debug: set to true to write debug logging to stderr
--
-- Configuration can be provided via:
--   YAML metadata in the document
--   Command line: --metadata emphasis-marker='_'
--   Metadata file: --metadata-file config.yaml
--
-- Example:
--   pbpaste | pandoc -f html -t gfm --lua-filter pandoc-markdown-devalias-emph-only.lua --metadata emphasis-marker='_'
--
-- Ref:
--   https://gist.github.com/0xdevalias/794d1aa03c357425c4c9583d9edc0303#poc-pandoc---lua-filter-for-customizing-markdown-output
--   https://pandoc.org/lua-filters.html
--   https://pandoc.org/lua-filters.html#type-writeroptions

-- Default values if not specified in metadata
local debug_mode = false
local emphasis_marker = '_'

-- Helper function for debug logging
local function debug(msg, ...)
  if debug_mode then
    io.stderr:write(string.format("[DEBUG] " .. msg .. "\n", ...))
  end
end

-- Helper function to safely extract values from Pandoc's metadata
-- Handles boolean and string values directly and converts other types to strings
--
-- @param meta: The metadata table from Pandoc
-- @param key: The key to look up in the metadata
-- @param default: Value to return if key is not found
-- @return: Value from metadata (preserving boolean type) or default
function get_metadata_value(meta, key, default)
  local value = meta[key]
  if value == nil then
    return default
  end
  if type(value) == "boolean" or type(value) == "string" then
    return value
  end
  -- Strings from YAML metadata (document or --metadata-file) typically arrive
  -- as lists of inlines rather than plain strings, so flatten them to text
  return pandoc.utils.stringify(value)
end

-- Processes document metadata to configure the filter
-- Called by Pandoc at the start of document processing
--
-- @param meta: The document's metadata table
-- @return: The (possibly modified) metadata table
function Meta(meta)
  -- Read configuration from metadata, falling back to defaults if not specified
  debug_mode = get_metadata_value(meta, 'debug', debug_mode) == true
  emphasis_marker = get_metadata_value(meta, 'emphasis-marker', emphasis_marker)

  debug("Initialized with FORMAT: %s", FORMAT)
  debug("emphasis-marker: %s", emphasis_marker)
  debug("debug-mode: %s", debug_mode)

  return meta
end

-- Processes emphasis elements (italic text) by converting them to raw format
--
-- @param el: The emphasis element (pandoc AST element)
-- @return: A RawInline element with explicitly formatted emphasis markers
function Emph(el)
  -- Convert the inline content to a plain string
  local text = pandoc.utils.stringify(el)

  -- Build the new string, wrapping content with the custom marker
  local wrapped = emphasis_marker .. text .. emphasis_marker

  -- Return as raw Markdown (or another format if you prefer)
  return pandoc.RawInline("markdown", wrapped)
end

return {
  {Meta = Meta},
  {Emph = Emph},
}

-- File: pandoc-markdown-devalias-emph-static.lua
--
-- Filter to customize markdown output (fixed emphasis marker, no configuration)
--
-- Example:
--   pbpaste | pandoc -f html -t gfm --lua-filter pandoc-markdown-devalias-emph-static.lua
--
-- Ref:
--   https://gist.github.com/0xdevalias/794d1aa03c357425c4c9583d9edc0303#poc-pandoc---lua-filter-for-customizing-markdown-output
--   https://pandoc.org/lua-filters.html
--   https://pandoc.org/lua-filters.html#type-writeroptions

-- The marker is fixed in this variant; without this definition the
-- concatenation in Emph would fail on a nil value
local emphasis_marker = '_'

-- Processes emphasis elements (italic text) by converting them to raw format
--
-- @param el: The emphasis element (pandoc AST element)
-- @return: A RawInline element with explicitly formatted emphasis markers
function Emph(el)
  -- Convert the inline content to a plain string
  local text = pandoc.utils.stringify(el)

  -- Build the new string, wrapping content with the custom marker
  local wrapped = emphasis_marker .. text .. emphasis_marker

  -- Return as raw Markdown (or another format if you prefer)
  return pandoc.RawInline("markdown", wrapped)
end

return {
  {Emph = Emph},
}
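
One limitation of the Emph functions above is that pandoc.utils.stringify flattens the emphasised content to plain text, so any nested formatting (such as strong text or a link inside the emphasis) is lost. A minimal sketch of an alternative that keeps the original inline content and only wraps it in raw markers is below. It relies on filter functions being allowed to return a list of inlines in place of a single element, and I haven't tested how the gfm writer handles every combination of raw markers next to other inlines.

```lua
-- Assumption: fixed marker, as in the static variant above
local emphasis_marker = "_"

-- Called by pandoc for every Emph element in the document
function Emph(el)
  -- Opening marker, passed through to the markdown output untouched
  local result = { pandoc.RawInline("markdown", emphasis_marker) }

  -- Keep the original inlines so nested formatting survives
  for _, inline in ipairs(el.content) do
    result[#result + 1] = inline
  end

  -- Closing marker
  result[#result + 1] = pandoc.RawInline("markdown", emphasis_marker)

  -- Returning a list splices these inlines in place of the Emph element
  return result
end
```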