Last active
January 3, 2016 16:39
Short Python script to recursively process HTML files with html2text.py (https://github.com/aaronsw/html2text), converting them to Markdown docs (e.g. a collection of notes exported from Tomboy). Written for OS X, YMMV.
import os
import re
import subprocess
import sys

# walk the directory tree given as the first command-line argument
for dirpath, subdirs, files in os.walk(sys.argv[1]):
    for filename in files:
        # ignore the OS X Finder metadata file
        if filename != ".DS_Store":
            # remove all special chars from the filename, if there are any,
            # and clip the file extension (everything after the first dot)
            f = re.sub(r'\W', '', filename.split('.')[0]) + '.md'
            # open the output file and redirect html2text's stdout into it;
            # html2text.py is expected in the current working directory
            with open(os.path.join(dirpath, f), 'w') as fout:
                cmd = ["python", "html2text.py", os.path.join(dirpath, filename)]
                subprocess.call(cmd, stdout=fout)
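
The filename clean-up step is easy to misread, so here is a small sketch of it in isolation. The helper name `markdown_name` is hypothetical and not part of the script; it only wraps the same split-and-substitute expression so the mapping from input name to output name can be checked without running html2text.

```python
import re


def markdown_name(filename):
    """Hypothetical helper: derive the .md output name the way the script does."""
    # Keep only the part before the FIRST dot, so "a.b.html" becomes "a".
    stem = filename.split('.')[0]
    # Drop every character that is not a letter, digit, or underscore.
    return re.sub(r'\W', '', stem) + '.md'


if __name__ == "__main__":
    for name in ["My Note (2).html", "to_do list.html", "notes.v2.html"]:
        print(name, "->", markdown_name(name))
```

Note that two different inputs can collapse to the same output name (for example "My Note.html" and "MyNote.html"), in which case the later conversion overwrites the earlier one.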