Created
July 4, 2018 20:41
Extract plain text from MS Word docx files
unzip -p some.docx word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'
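Word usually writes word/document.xml as one very long line, so the command above tends to run every paragraph together. The variant below is a minimal sketch, not part of the original gist. It assumes each paragraph ends with a `</w:p>` tag and turns that tag into a line break before the remaining tags are stripped. It also decodes the common XML entities. The function names `xml_to_text` and `docx_to_text` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read WordprocessingML on stdin and print plain text,
# one paragraph per line.
xml_to_text() {
    # Drop existing line breaks so that every tag sits on a single line.
    tr -d '\r\n' |
    # 1. Replace each paragraph end tag with a newline (portable sed syntax:
    #    a backslash followed by a literal newline in the replacement).
    # 2. Strip all remaining tags.
    # 3. Decode common XML entities; &amp; goes last to avoid double decoding.
    sed -e 's/<\/w:p>/\
/g' \
        -e 's/<[^>]\{1,\}>//g' \
        -e 's/&lt;/</g; s/&gt;/>/g; s/&quot;/"/g; s/&amp;/\&/g'
}

# Hypothetical wrapper: extract the main document part from a .docx file
# and convert it to text.
docx_to_text() {
    unzip -p "$1" word/document.xml | xml_to_text
}
```

Usage would be `docx_to_text some.docx > some.txt`. Unlike the original command, this sketch does not remove non-printable characters, so accented letters are kept whatever the locale.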
I needed this to diff two Word files. I first saved them as text using the above command and then:
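One way this workflow might look, as a sketch rather than the commenter's actual commands: extract both documents to temporary text files with the gist's command, then compare them with `diff -u`. The function names `docx_to_text` and `docx_diff` and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the gist's one-liner: print the text of a .docx.
docx_to_text() {
    unzip -p "$1" word/document.xml |
        sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'
}

# Hypothetical helper: show a unified diff of the text of two .docx files.
# Returns diff's exit status: 0 if identical, 1 if different, 2 on error.
docx_diff() {
    old_txt=$(mktemp) && new_txt=$(mktemp) || return 2
    docx_to_text "$1" > "$old_txt"
    docx_to_text "$2" > "$new_txt"
    diff -u "$old_txt" "$new_txt"
    status=$?
    # Clean up the temporary files before reporting the result.
    rm -f "$old_txt" "$new_txt"
    return "$status"
}
```

Usage would be `docx_diff old.docx new.docx`. Temporary files are used instead of process substitution so the sketch also runs in a plain POSIX `sh`.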