@pablox-cl
Created November 9, 2010 00:59
Bash script to convert all the HTML files in a directory to text files
#!/usr/bin/env bash
# (C) 2010 - Pablo Olmos de Aguilera Corradini <pablo{at)glatelier.org>
# Under GPL v3. See http://www.gnu.org/licenses/gpl-3.0.html
# requires html2text.py
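# Download html2text.py only if it is not already present.
[ -f html2text.py ] || wget -q http://www.aaronsw.com/2002/html2text/html2text.py
for file in *.html; do
    # Skip the literal pattern when no .html files exist.
    [ -e "$file" ] || continue
    # Strip the extension: page.html -> page
    name=${file%.*}
    python2 html2text.py "$file" > "$name.txt"
done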
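
The output name in the loop comes from the `${file%.*}` parameter expansion, which removes only the last extension and leaves spaces and earlier dots alone. A minimal sketch of that step in isolation follows; `txt_name` is a hypothetical helper introduced here for illustration and is not part of the original script.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): derive the .txt
# output name for a given input file name.
txt_name() {
    local file=$1
    # ${file%.*} removes the shortest trailing match of ".*",
    # i.e. only the final extension.
    printf '%s.txt\n' "${file%.*}"
}

txt_name "index.html"         # prints: index.txt
txt_name "my notes.v2.html"   # prints: my notes.v2.txt
```

Quoting the expansion is what keeps a name such as `my notes.v2.html` as a single word; unquoted, the shell would split it at the space.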