@cjwinchester
Created October 9, 2017 22:07
Quick and dirty "rank Texas death row media witnesses by frequency" script.
from collections import Counter

import requests
from bs4 import BeautifulSoup


def j_formatter(reporter):
    """Turn a 'Name, Affiliation' string into 'Name - Affiliation'."""
    name, affil = [x.strip() for x in reporter.split(',', 1)]

    # if this were a real thing, would obv want more rigorous cleaning
    if name == 'Michael Graczyk':
        name = 'Mike Graczyk'
    if affil == 'Huntsville Item':
        affil = 'The Huntsville Item'

    return name + ' - ' + affil

URL = 'https://www.tdcj.state.tx.us/death_row/dr_media_witness_list.html'

# fetch the page and grab the first table on it
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')
table = soup.find('table')
rows = table.find_all('tr')

journos = Counter()

# skip the header row; the seventh cell is split on ';' into individual reporters
for row in rows[1:]:
    reporters = row.find_all('td')[6].text.split(';')
    for human in [x for x in reporters if x.strip()]:
        journos[j_formatter(human)] += 1

# print the 20 most frequent witnesses
for i, journo in enumerate(journos.most_common(20)):
    print(str(i+1) + '.', journo[0] + ': ', journo[1])
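
The comment in `j_formatter` notes that a real version would want more rigorous cleaning. One possible direction, offered only as a rough sketch: collapse stray whitespace, look names and affiliations up case-insensitively in alias tables, and tolerate entries that have no comma (which `j_formatter` would reject with a `ValueError`). The names `clean_reporter`, `NAME_ALIASES`, and `AFFIL_ALIASES` are hypothetical and not part of the original script, and the alias entries are illustrative rather than a vetted list.

```python
import re

# Hypothetical alias tables; keys are lowercase, entries are illustrative only.
NAME_ALIASES = {'michael graczyk': 'Mike Graczyk'}
AFFIL_ALIASES = {'huntsville item': 'The Huntsville Item'}


def _squash(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r'\s+', ' ', text).strip()


def clean_reporter(reporter):
    """Return 'Name - Affiliation', or just 'Name' if no affiliation is given."""
    # partition() splits on the first comma and never raises when it is missing
    name, _, affil = reporter.partition(',')
    name, affil = _squash(name), _squash(affil)

    # case-insensitive alias lookup, falling back to the cleaned value
    name = NAME_ALIASES.get(name.lower(), name)
    affil = AFFIL_ALIASES.get(affil.lower(), affil)

    return name + ' - ' + affil if affil else name
```

Under these assumptions it could stand in for `j_formatter` inside the counting loop, e.g. `journos[clean_reporter(human)] += 1`.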