Last active
August 30, 2023 13:42
markdown from word docx (not) using pandoc
# I tried the Markdown add-in by Britva (https://appsource.microsoft.com/en-us/product/office/WA200002866), but it doesn't handle tables
# By default pandoc renders tables using Pandoc Markdown conventions, which split cell content across multiple lines (even breaking in the middle of a hyperlink, which produces invalid Markdown)
pandoc -s foo.docx -t markdown -o foo.md
# There are a number of table extensions that let you fine-tune this behaviour; disabling these three leaves pipe tables as the remaining Markdown table syntax
pandoc -s foo.docx -t markdown-simple_tables-multiline_tables-grid_tables -o foo.md
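# A minimal sketch of how the FORMAT-EXTENSION syntax used above can be assembled.
# format_without is a hypothetical helper name, not part of pandoc; the conversion
# only runs if pandoc and foo.docx are actually present.

```shell
# Hypothetical helper: append "-extension" for every extension to disable.
format_without() {
  fmt=$1
  shift
  for ext in "$@"; do
    fmt="$fmt-$ext"
  done
  printf '%s\n' "$fmt"
}

TARGET=$(format_without markdown simple_tables multiline_tables grid_tables)

# Guarded so the sketch is harmless when pandoc or the input file is missing.
if command -v pandoc >/dev/null 2>&1 && [ -f foo.docx ]; then
  pandoc -s foo.docx -t "$TARGET" -o foo.md
fi
```

# pandoc --list-extensions=markdown should show which extensions are on (+) or off (-) by default.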
# In many scenarios (it seems related to the column width exceeding a threshold) this approach breaks and pandoc
# falls back to rendering an HTML table instead of Markdown
# This fallback can be avoided by setting a very large column width plus --wrap=none, but that gives unexpected artifacts
pandoc -s foo.docx -t markdown-simple_tables-multiline_tables-grid_tables --wrap=none --columns=999 -o foo.md
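# A rough way to check whether a conversion fell back to HTML tables: search the output for a <table tag.
# has_html_table is a hypothetical helper name, and the check is a plain text match,
# so it would also fire on a <table tag that merely appears inside a code sample.

```shell
# Hypothetical helper: succeeds (exit 0) if the given file contains a raw HTML table tag.
has_html_table() {
  grep -qi '<table' "$1"
}

# Only inspect foo.md if one of the pandoc commands above has produced it.
if [ -f foo.md ] && has_html_table foo.md; then
  echo "foo.md contains HTML tables"
fi
```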
# So is it probably better to use gfm (GitHub Flavored Markdown)? Same problems as above...
pandoc -s foo.docx -t gfm -o foo.md
# Giving up on pandoc for now. Moving to https://github.com/benbalter/word-to-markdown, which is based on Ruby and LibreOffice
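# A sketch of what a batch conversion with that tool might look like. It assumes the gem is installed
# (gem install word-to-markdown), that it provides a w2m command which writes Markdown to stdout,
# and that LibreOffice is available; md_name is a hypothetical helper, and none of this was tested here.

```shell
# Hypothetical helper: derive the output file name, e.g. foo.docx -> foo.md.
md_name() {
  printf '%s\n' "${1%.docx}.md"
}

# Assumption: w2m prints the converted Markdown to stdout.
if command -v w2m >/dev/null 2>&1; then
  for doc in *.docx; do
    [ -f "$doc" ] || continue   # skip the literal pattern when nothing matches
    w2m "$doc" > "$(md_name "$doc")"
  done
fi
```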