@komly
Created September 30, 2015 20:05
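#!/usr/bin/env python3
"""Save the names of professors and associate professors listed on hist.hse.ru."""
import re
from urllib.request import urlopen

url = 'http://hist.hse.ru/persons/'
# Assumes the page is served as UTF-8, which is also what decode() defaults to.
data = urlopen(url).read().decode('utf-8')
# Group 1 is the person's name (link text); group 2 is the job title in the following <p>.
regexp = r'b-greetings__person_data small">[^>]+>([^<]+)</a><p>([^<]+)<'

with open('persons.txt', 'w', encoding='utf-8') as f:
    for name, job in re.findall(regexp, data):
        job = job.lower()
        # 'профессор' = professor, 'доцент' = associate professor (docent)
        if 'профессор' in job or 'доцент' in job:
            f.write(name + '\n')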