Last active
April 7, 2021 17:04
Extract comments from MS Word .docx files using command line tools
# For background: DOCX files are ZIP archives containing XML files.
# I had a recent project that needed comments extracted from several MS Word documents.
# This would have been painful to do manually - command line to the rescue!
# Requires unzip and xmllint (from libxml2). "$1" is quoted so paths with spaces work.
# unzip -p streams word/comments.xml to stdout; xmllint prints every text node in it.
find . -name "*.docx" -exec sh -c 'unzip -p "$1" word/comments.xml | xmllint --xpath "//text()" -' sh {} \;
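A document that has no comments usually has no word/comments.xml member at all, so the one-liner prints errors for it. The sketch below is one way to skip such files and label the output per document. The function name `docx_comments` and the `== file ==` header format are my own illustrative choices, not part of the original snippet, and it assumes `unzip` and `xmllint` are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not from the original gist): print the comment text
# of a single .docx file, preceded by a header line naming the file.
docx_comments() {
    # unzip -p exits non-zero when word/comments.xml is not in the archive,
    # which is the normal case for a document without comments.
    xml=$(unzip -p "$1" word/comments.xml 2>/dev/null) || return 0
    [ -n "$xml" ] || return 0

    printf '== %s ==\n' "$1"
    # Same XPath as the one-liner: every text node in comments.xml.
    # stderr is silenced because xmllint complains when the node set is empty.
    printf '%s\n' "$xml" | xmllint --xpath '//text()' - 2>/dev/null
    echo
}
```

To run it over a directory tree, something like `find . -name "*.docx" | while IFS= read -r f; do docx_comments "$f"; done` should work for file names that do not contain newlines.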