@pvgenuchten
Last active August 30, 2023 13:42
markdown from word docx (not) using pandoc
# I tried the Markdown add-in by Britva (https://appsource.microsoft.com/en-us/product/office/WA200002866), but it doesn't handle tables
# By default pandoc renders tables using Pandoc Markdown table conventions, which split cell content across multiple lines (sometimes in the middle of a hyperlink, producing invalid markdown)
pandoc -s foo.docx -t markdown -o foo.md
# Pandoc has a number of table extensions that let you fine-tune the behaviour; disabling the three below leaves pipe_tables enabled
pandoc -s foo.docx -t markdown-simple_tables-multiline_tables-grid_tables -o foo.md
# In many scenarios (it seems related to the column width being more than a threshold) this approach breaks and pandoc
# falls back to rendering an HTML table instead of markdown
# This fallback can be avoided by setting a large column width plus --wrap=none, but that gives unexpected artifacts
pandoc -s foo.docx -t markdown-simple_tables-multiline_tables-grid_tables --wrap=none --columns=999 -o foo.md
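# To check whether a given run hit the HTML fallback described above, one option is to grep the output for a raw <table tag.
# This is only a sketch: has_html_table is a hypothetical helper name, and foo.md is assumed to be the output of one of the pandoc commands above.

```shell
# Hypothetical helper: succeeds (exit status 0) if the file contains a raw HTML <table tag.
has_html_table() {
  grep -qi '<table' "$1"
}

# Only report when the output file exists and contains an HTML table.
if [ -f foo.md ] && has_html_table foo.md; then
  echo "foo.md contains an HTML table, so pandoc did not emit a markdown table here"
fi
```

# The check is deliberately crude: it would also match a <table tag that was literal text in the Word document.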
# So is it probably better to use gfm (GitHub Flavored Markdown)? No, same problems as above...
pandoc -s foo.docx -t gfm -o foo.md
# Giving up on using pandoc for now. Moving to https://github.com/benbalter/word-to-markdown, which is based on Ruby and LibreOffice