Many important government documents are created and saved as Microsoft Word files (*.docx). But the Word format is tied to Microsoft's tooling, and it's not well suited to presenting documents on the web. So, I wanted to find a way to convert a .docx file into markdown.
As it turns out, there are several open-source tools that convert between file types. Pandoc is one of them, and it's powerful. In fact, pandoc's website says, "If you need to convert files from one markup format into another, pandoc is your swiss-army knife." Pandoc can convert from markdown into .docx, and it also works in the other direction.
On a Mac, you can install pandoc with Homebrew by running the command `brew install pandoc`.
The bash script below takes an existing .docx file, converts it to markdown, exports all media in the Word document to a subfolder, and updates the markdown links to point at these relative paths. It also uses strict GitHub-flavored markdown styling, for use with GitHub or Cisco DevNet PubHub publishing tools.
To use:
- Install pandoc with `brew install pandoc` on a Mac. Windows users will have to work out installation on their own.
- Download the bash script below.
- Run it as `./docx2md.sh filename`.
- Do not pass the file name extension. The .docx file must be in the same folder as the script.
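To make the steps above concrete, here is a minimal sketch of what the core of such a wrapper could look like. It is not the actual docx2md.sh. The function name `docx2md`, the `_media` subfolder suffix, and the choice of `gfm` as the output format are assumptions made for illustration. The flags themselves (`-f`, `-t`, `-o`, `--extract-media`, `--wrap`) are standard pandoc options.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names and flag choices are assumptions,
# not the contents of the real docx2md.sh.

# docx2md NAME: convert NAME.docx in the current directory to NAME.md.
# NAME is given without the .docx extension.
docx2md() {
  local name="$1"

  if [ -z "$name" ]; then
    echo "usage: docx2md filename (without the .docx extension)" >&2
    return 1
  fi
  if [ ! -f "${name}.docx" ]; then
    echo "error: ${name}.docx not found in the current folder" >&2
    return 1
  fi

  # -f docx               read Word input
  # -t gfm                write GitHub-flavored markdown (assumed format)
  # --extract-media=DIR   save embedded media under DIR and link to it
  # --wrap=none           do not hard-wrap paragraphs in the output
  pandoc -f docx -t gfm \
    --extract-media="${name}_media" \
    --wrap=none \
    -o "${name}.md" "${name}.docx"
}
```

With this function loaded, calling `docx2md report` in a folder that contains report.docx would ask pandoc to write report.md and place any extracted images under report_media/. Because the media folder is given as a relative path, the image links in the markdown come out relative as well.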