Created March 31, 2013 15:56
Clean up HTML generated by Word with HTML Tidy on OS X
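# Install tidy with Homebrew
brew install tidy
# Export the Word doc as HTML (dirty.htm in this example)
# Create a config file named tidy-config.txt with the options listed below
# More options are documented at http://tidy.sourceforge.net/
# Clean dirty.htm and write the indented result to cleaned.html
tidy -config tidy-config.txt -o cleaned.html -i dirty.htm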
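Tidy reports its result through the exit status: 0 when the input was clean, 1 when there were warnings, and 2 when there were errors. Word exports nearly always produce warnings, so a script that treats any non-zero status as failure will stop on files that were in fact cleaned. The sketch below wraps the same command in a helper that accepts warnings and fails only on errors. The function name `clean_word_html` is hypothetical and not part of the original gist, and it assumes `tidy-config.txt` is in the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not from the original gist): clean one Word-exported
# HTML file and interpret tidy's exit status.
clean_word_html() {
    src=$1    # Word-exported input, e.g. dirty.htm
    dest=$2   # cleaned output, e.g. cleaned.html
    status=0
    # Same invocation as above; capture the status instead of aborting on it.
    tidy -config tidy-config.txt -o "$dest" -i "$src" || status=$?
    case $status in
        0) echo "clean: $src -> $dest" ;;
        1) echo "cleaned with warnings: $src -> $dest" ;;
        *) echo "tidy reported errors for $src (exit $status)" >&2
           return "$status" ;;
    esac
    return 0
}
```

Calling `clean_word_html dirty.htm cleaned.html` then behaves like the one-line command, but returns success when Tidy only warned.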
# tidy-config.txt
word-2000: yes
bare: yes
clean: yes
drop-empty-paras: yes
drop-font-tags: yes
join-styles: yes
output-xhtml: yes
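To avoid creating the config file by hand, the seven options above can be written from the shell with a here-document. This is only a convenience sketch: it assumes you want the file in the current directory and are happy to overwrite any existing `tidy-config.txt`.

```shell
#!/bin/sh
# Write the options from the gist into tidy-config.txt in the current directory.
# The quoted 'EOF' delimiter stops the shell from expanding anything in the body.
cat > tidy-config.txt <<'EOF'
word-2000: yes
bare: yes
clean: yes
drop-empty-paras: yes
drop-font-tags: yes
join-styles: yes
output-xhtml: yes
EOF
```

After this runs, the `tidy -config tidy-config.txt ...` command picks the file up as before.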