@tslmy
Created July 14, 2016 15:03
Save multiple webpages in individual Markdown files
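import re
import requests

# Conversion service that returns a Markdown rendering of a page.
API = 'http://heckyesmarkdown.com/go/'

# url.txt holds one URL per line.
with open('url.txt', 'r') as url_list:
    for url in url_list:
        url = url.strip()  # drop the trailing newline read from the file
        if not url:
            continue  # skip blank lines
        # Pass the query as params so requests URL-encodes the target URL.
        r = requests.get(API, params={'read': 1, 'preview': 0, 'showframe': 0, 'u': url})
        lines = r.text.split('\n')
        # Use the first level-1 Markdown heading as the page title.
        title = ''
        for line in lines:
            if line.startswith('# '):
                title = line[2:].strip()
                break
        if title == '':
            print('[ERROR] Failed to access', url)
            #print(' '+'\n '.join(lines[:20])+'\n')
        else:
            print('Processing', title, '...')
            #print(' '+'\n '.join(lines[:5])+'\n')
            # Replace characters that are not allowed in file names.
            filename = re.sub(r'[\\/:*?"<>|]', '_', title) + '.md'
            with open(filename, 'w', encoding='utf-8') as f:
                f.write(r.text)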
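
The title lookup and the file naming in the script above can be tried without any network access if they are factored into small pure functions. The following is only a sketch: the helper names `extract_title` and `safe_filename` are hypothetical, and the set of characters replaced with `_` is an assumption about what a file system might reject, not something the conversion service specifies.

```python
import re


def extract_title(markdown_text):
    """Return the text of the first '# ' heading, or '' if there is none."""
    for line in markdown_text.split('\n'):
        if line.startswith('# '):
            return line[2:].strip()
    return ''


def safe_filename(title):
    """Build a .md file name, replacing characters many file systems reject."""
    # Assumed character set: \ / : * ? " < > |
    return re.sub(r'[\\/:*?"<>|]', '_', title) + '.md'


if __name__ == '__main__':
    sample = 'Some preamble\n# Notes: part 1/2\nBody text'
    print(safe_filename(extract_title(sample)))
```

An empty return value from `extract_title` plays the same role as the empty `title` in the script: it signals that no heading was found and the page should be reported as failed.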