
@maybemkl
Created September 6, 2021 23:00
Remove markdown wiki-link brackets during pandoc exports
@chrisgrieser

I think it's this package, although I am not entirely certain how to install it properly 🤔
https://pypi.org/project/pandocfilters/

@racng

racng commented Sep 19, 2022

You can install it using pip, but I tried installing it with conda instead in an isolated environment just in case.

conda create -n pandoc
conda activate pandoc
conda install -c conda-forge pandocfilters

Before any filtering is done, pandoc parses the markdown file into an abstract syntax tree (AST). I took a look at what the tree looks like for a simple markdown file with a single line: [[@citekey]]. The text is actually split into three inline elements: a Str containing [, a Cite whose text is [@citekey], and a Str containing ]. So no single string contains [[ or ]], which is why this pandocfilters script didn't work. Similarly, replacing [[ or ]] in the Lua filters doesn't work.

Both the pandocfilters script and the Lua filter work if we instead replace the single [ and ] characters with '' (an empty string).
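A minimal sketch of the difference, using the three text pieces described above as a hard-coded list (not read from pandoc): replacing the two-character sequences finds nothing, while replacing each bracket character works.

```python
# The three text pieces pandoc produces for the line [[@citekey]]
# (hard-coded here for illustration, per the split described above).
pieces = ["[", "[@citekey]", "]"]

# No single piece contains "[[" or "]]", so nothing changes.
double = [p.replace("[[", "").replace("]]", "") for p in pieces]

# Removing each bracket character individually does strip them.
single = [p.replace("[", "").replace("]", "") for p in pieces]

print(double)  # ['[', '[@citekey]', ']']
print(single)  # ['', '@citekey', '']
```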

Here is what the AST looks like:

List of 3
 |-pandoc-api-version:List of 4
 |  |-: int 1
 |  |-: int 22
 |  |-: int 2
 |  |-: int 1
 |-meta              : Named list()
 |-blocks            :List of 1
    |-:List of 2
       |-t: chr "Para"
       |-c:List of 3
          |-:List of 2
          |  |-t: chr "Str"
          |  |-c: chr "["
          |-:List of 2
          |  |-t: chr "Cite"
          |  |-c:List of 2
          |     |-:List of 1
          |     |  |-:List of 6
          |     |     |-citationId     : chr "citekey"
          |     |     |-citationPrefix : list()
          |     |     |-citationSuffix : list()
          |     |     |-citationMode   :List of 1
          |     |     |  |-t: chr "NormalCitation"
          |     |     |-citationNoteNum: int 1
          |     |     |-citationHash   : int 0
          |     |-:List of 1
          |        |-:List of 2
          |           |-t: chr "Str"
          |           |-c: chr "[@citekey]"
          |-:List of 2
             |-t: chr "Str"
             |-c: chr "]"
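The same structure can be inspected in Python. This sketch uses a hand-written copy of the JSON that `pandoc -t json` should produce for this one-line file (assumed to match the tree above, pandoc-api-version 1.22.2.1) rather than calling pandoc, and walks it with the standard library only; `str_values` is a hypothetical helper name.

```python
import json

# Hand-written JSON mirroring the tree above for a file containing [[@citekey]].
AST_JSON = """
{"pandoc-api-version": [1, 22, 2, 1],
 "meta": {},
 "blocks": [
  {"t": "Para", "c": [
    {"t": "Str", "c": "["},
    {"t": "Cite", "c": [
      [{"citationId": "citekey", "citationPrefix": [], "citationSuffix": [],
        "citationMode": {"t": "NormalCitation"},
        "citationNoteNum": 1, "citationHash": 0}],
      [{"t": "Str", "c": "[@citekey]"}]]},
    {"t": "Str", "c": "]"}]}]}
"""

doc = json.loads(AST_JSON)
para = doc["blocks"][0]

# The paragraph holds three inline elements, not one string.
inline_types = [el["t"] for el in para["c"]]
print(inline_types)  # ['Str', 'Cite', 'Str']


def str_values(node):
    """Recursively yield the text of every Str element in a JSON node."""
    if isinstance(node, dict):
        if node.get("t") == "Str":
            yield node["c"]
        else:
            for child in node.values():
                yield from str_values(child)
    elif isinstance(node, list):
        for child in node:
            yield from str_values(child)


print(list(str_values(doc)))  # ['[', '[@citekey]', ']']
```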

@aravindk100

Thanks for this great insight, @racng. I was able to make this change and get it to work, except that the closing brackets of the link did not get filtered correctly: [[name]] became name]. I found it odd that it was able to remove [[ but only one of the two ].

This is the modified code I am using:
#!/usr/bin/env python3

from pandocfilters import toJSONFilter, Str

def replace(key, value, format, meta):
    if key == 'Str':
        if '[' in value:
            new_value = value.replace('[', '')
            # Returning here means a ']' in the same string is never removed.
            return Str(new_value)
        if ']' in value:
            new_value = value.replace(']', '')
            return Str(new_value)

if __name__ == '__main__':
    toJSONFilter(replace)
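The early return explains the leftover bracket: a Str that contains both brackets, such as "[@citekey]" in the AST above, matches the '[' check first and comes back as "@citekey]" before the ']' check ever runs. Here is a minimal sketch of corrected logic as a plain function (strip_brackets is a hypothetical name), so it can be tried without pandoc; inside replace() you would return Str(strip_brackets(value)) for Str elements.

```python
def strip_brackets(value):
    """Remove every '[' and ']' from a Str value in one pass.

    Nothing returns early after handling '[', so a string that
    contains both brackets loses both.
    """
    return value.replace("[", "").replace("]", "")


# The three Str values from the [[@citekey]] AST above.
for piece in ["[", "[@citekey]", "]"]:
    print(repr(strip_brackets(piece)))
# ''
# '@citekey'
# ''
```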

@balaji-dutt

balaji-dutt commented Dec 21, 2022

Thanks for the original filter code @maybemkl! I was hitting the same problem as @aravindk100 in that the filter would not find the closing ]] characters, so I modified the script to use the assignment expression (walrus operator, :=) added in Python 3.8, which also greatly simplifies the code. Here's my version:

#!/usr/bin/env python3

from pandocfilters import toJSONFilter, Str
import re

def replace(key, value, format, meta):
    if key == 'Str':
        if match := re.search('\[\[(.+)\]\]', value, re.IGNORECASE):
            new_value = match.group(1)
            return Str(new_value)

if __name__ == '__main__':
    toJSONFilter(replace)
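Two things to keep in mind with this version. `match.group(1)` returns only the text between the brackets, so anything else in the same Str is dropped (for example a trailing period, if it ends up in that Str), and the greedy `(.+)` spans from the first [[ to the last ]] when one string holds two links. It also still needs [[ and ]] inside a single Str, so it will not match the three-way split @racng showed for citations. A hedged alternative, not from the thread, is `re.sub` with a non-greedy group, which rewrites each link in place and keeps the surrounding text (unwrap_wikilinks is a hypothetical name):

```python
import re

# Non-greedy group: each [[...]] pair is matched separately.
WIKILINK = re.compile(r"\[\[(.+?)\]\]")


def unwrap_wikilinks(value):
    """Replace every [[target]] in a string with target, keeping other text."""
    return WIKILINK.sub(r"\1", value)


print(unwrap_wikilinks("[[name]]."))    # name.
print(unwrap_wikilinks("[[a]]/[[b]]"))  # a/b
print(unwrap_wikilinks("plain"))        # plain
```

Inside replace() this would become return Str(unwrap_wikilinks(value)) for Str elements, optionally only when the result differs from value.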

@archifont

Thank you all, this is really helpful, especially when exporting linked notes from Obsidian through pandoc!

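Apparently there was an invalid escape sequence. The regex pattern '\[\[(.+)\]\]' contains backslashes (\). In a normal Python string, \[ and \] are not recognized escape sequences: Python currently keeps the backslash, so the pattern still works, but it emits a DeprecationWarning (a SyntaxWarning since Python 3.12), and a future version is expected to make this an error. A raw string (r"") tells Python not to process escape sequences at all, so the backslashes reach the regex engine unchanged and \[ and \] match literal brackets.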
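A small sketch of that behaviour: it compiles source text that writes the pattern without the r prefix, records the warning, and checks that the resulting string equals the raw-string version. The exact warning class depends on the Python version, so only the category family is shown.

```python
import warnings

raw = r'\[\[(.+)\]\]'

# Source code that writes the same pattern WITHOUT the r prefix.
# (The doubled backslashes here only build that source text.)
source = "pattern = '\\[\\[(.+)\\]\\]'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    code = compile(source, "<example>", "exec")

namespace = {}
exec(code, namespace)

# The backslash is kept for an unrecognized escape, so the value matches...
print(namespace["pattern"] == raw)  # True
# ...but compiling the non-raw form emitted a warning:
# DeprecationWarning on Python 3.6-3.11, SyntaxWarning from 3.12.
print(sorted({w.category.__name__ for w in caught}))
```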

This is the improved version (with help from ChatGPT):

#!/usr/bin/env python3

from pandocfilters import toJSONFilter, Str
import re

def replace(key, value, format, meta):
    if key == 'Str':
        if match := re.search(r'\[\[(.+)\]\]', value, re.IGNORECASE):
            new_value = match.group(1)
            return Str(new_value)

if __name__ == '__main__':
    toJSONFilter(replace)
