@stoft
Last active January 3, 2016 16:39
Short Python script that recursively processes HTML files with html2text.py (https://github.com/aaronsw/html2text), converting them to Markdown documents (e.g. a collection of notes exported from Tomboy). Written for OS X, YMMV.
import os
import re
import subprocess
import sys

# Walk the directory tree given as the first command-line argument.
for dirpath, subdirs, files in os.walk(sys.argv[1]):
    for filename in files:
        # Ignore the OS X Finder metadata file.
        if filename != ".DS_Store":
            # Keep only the part of the name before the first dot, strip all
            # non-word characters from it, and add the .md extension.
            md_name = re.sub(r'\W', '', filename.split('.')[0]) + '.md'
            # Open the output file and send html2text's stdout to it.
            with open(os.path.join(dirpath, md_name), 'w') as fout:
                cmd = ["python", "html2text.py", os.path.join(dirpath, filename)]
                subprocess.call(cmd, stdout=fout)
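
The renaming rule is the least obvious part of the script, so here is a minimal sketch of it in isolation. The helper name `markdown_name` is hypothetical and not part of the gist; it only mirrors the expression the script uses to build the output filename.

```python
import re

def markdown_name(filename):
    """Hypothetical helper mirroring the script's renaming rule."""
    # Keep only the text before the first dot.
    stem = filename.split('.')[0]
    # Remove every character that is not a letter, digit, or underscore.
    return re.sub(r'\W', '', stem) + '.md'

print(markdown_name("My Note (draft).html"))  # MyNotedraft.md
print(markdown_name("archive.tar.html"))      # archive.md
```

Because spaces and punctuation are dropped, distinct inputs such as `a b.html` and `ab.html` map to the same output name, and the file processed later overwrites the earlier one.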