@da-moon
Forked from plembo/pandocdocx2md.md
Created October 18, 2024 11:16
Convert docx to markdown with pandoc

Convert Word documents to markdown with pandoc

I use pandoc to convert masses of Word documents to markdown. Still working on a generic script, but for now here's the "gist" of what I type into the terminal:

$ myfilename="example"
$ pandoc \
-t markdown_strict \
--extract-media="./attachments/$myfilename" \
"$myfilename.docx" \
-o "$myfilename.md"

Pandoc markdown is nice, but with Word documents it often adds odd things in translation, such as attribute blocks in curly braces attached to images and spans. Stick to markdown_strict to avoid that.

I try to organize media (images, etc.) embedded in documents under an attachments subdirectory, with one folder named for each file. Word tends to reuse generic names such as image1.png in every document, so this avoids "collisions" between media file names and makes conversion out of markdown into other formats (HTML, PDF) less messy.
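For a whole folder of documents, the same command can be wrapped in a loop. What follows is only a rough sketch of what such a generic script might look like, not a finished tool: the function names docx2md and docx2md_all are made up for illustration, and it assumes bash, pandoc on the PATH, and .docx files sitting in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of pandoc).

# Convert one .docx file to strict markdown, extracting its media
# into ./attachments/<file name without extension>.
docx2md() {
  local src="$1"
  local name="${src%.docx}"          # strip the .docx suffix
  pandoc \
    -t markdown_strict \
    --extract-media="./attachments/$(basename "$name")" \
    "$src" \
    -o "$name.md"
}

# Convert every .docx file in the current directory.
docx2md_all() {
  local f
  for f in ./*.docx; do
    [ -e "$f" ] || continue          # no matches: the glob stays literal, skip it
    docx2md "$f"
  done
}
```

After sourcing the file, running docx2md_all in a directory of Word documents should leave one .md file next to each .docx and one media folder per document under ./attachments. The variables are double-quoted throughout so that file names containing spaces survive intact.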
