@tdpearson
Last active April 7, 2021 17:04
Extract comments from MS Word .docx files using command-line tools
# Background: DOCX files are ZIP archives of XML files; comments are stored in word/comments.xml.
# I had a recent project that needed comments extracted from several MS Word documents.
# This would have been painful to do manually - command line to the rescue!
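# For each .docx under the current directory, stream word/comments.xml to stdout and print its text nodes.
# "$1" is quoted so that file names containing spaces survive.
find . -name "*.docx" -exec sh -c 'unzip -p "$1" word/comments.xml | xmllint --xpath "//text()" -' sh {} \;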