still dumping material in here
import requests, time
from bs4 import BeautifulSoup
# We've now imported the two packages that will do the heavy lifting
# for us, requests and BeautifulSoup
# This is the URL that lists the current inmates
# Should this URL go away, an archive is available at
# http://perma.cc/2HZR-N38X
url_to_scrape = 'http://apps2.polkcountyiowa.gov/inmatesontheweb/'
time.sleep(1)
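That `time.sleep(1)` sits inside the scraping loop: it pauses a second between requests so we don't hammer the county's server. A minimal sketch of the pacing pattern, with the fetch function passed in (the `fetcher` argument is a stand-in for `requests.get`, just so the pacing logic runs on its own):

```python
import time

def fetch_politely(urls, fetcher, delay=1):
    """Fetch each URL in turn, pausing between requests."""
    results = []
    for url in urls:
        results.append(fetcher(url))
        time.sleep(delay)  # be polite: wait before the next request
    return results

# Stand-in fetcher so the pattern is visible without hitting the network
pages = fetch_politely(['a', 'b'], fetcher=lambda u: u.upper(), delay=0.01)
print(pages)  # ['A', 'B']
```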
$ python process.py
CRAIG ELTON GILLEN, 20
White Male from SPRING HILL, IA
Booked at 7/6/2015 11:51 AM
JEREMY MONTEZ AMERISON SMITH, 27
Black Male from CLIVE, IA
Booked at 7/6/2015 11:45 AM
.
inmate_cities = {}
for inmate in inmates:
    if inmate['city'] in inmate_cities:
        inmate_cities[inmate['city']] += 1
    else:
        inmate_cities[inmate['city']] = 1
print(inmate_cities)
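The hand-rolled tally above is exactly what the standard library's `collections.Counter` does. A sketch of the same count, assuming (as above) that `inmates` is a list of dicts with a `'city'` key; the sample cities here are made up for illustration:

```python
from collections import Counter

# Hypothetical stand-in for the scraped list of inmate dicts
inmates = [
    {'city': 'DES MOINES'},
    {'city': 'CLIVE'},
    {'city': 'DES MOINES'},
]

# Counter tallies each city in one line, replacing the if/else loop
inmate_cities = Counter(inmate['city'] for inmate in inmates)
print(inmate_cities)  # Counter({'DES MOINES': 2, 'CLIVE': 1})
```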
inmates = []
for inmate_link in inmates_links[:10]:
    r = requests.get(inmate_link)
    soup = BeautifulSoup(r.text, 'html.parser')
    inmate_details = {}
    inmate_profile_rows = soup.select("#inmateProfile tr")
    inmate_details['age'] = inmate_profile_rows[0].findAll('td')[0].text.strip()
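Instead of indexing each row by position as above, the whole profile table can be walked generically. The `#inmateProfile` selector comes from the snippet above; the label-in-`<th>`, value-in-`<td>` layout in this sample HTML is an assumption for illustration, not the page's actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical profile table, standing in for one fetched profile page
html = """
<table id="inmateProfile">
  <tr><th>Age</th><td> 27 </td></tr>
  <tr><th>Race</th><td>White</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
inmate_details = {}
for row in soup.select("#inmateProfile tr"):
    label = row.find('th')
    value = row.find('td')
    if label and value:  # skip rows missing a label or a value
        inmate_details[label.text.strip().lower()] = value.text.strip()
print(inmate_details)  # {'age': '27', 'race': 'White'}
```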
url_to_scrape = 'http://apps2.polkcountyiowa.gov/inmatesontheweb/'
r = requests.get(url_to_scrape)
soup = BeautifulSoup(r.text, 'html.parser')
inmates_links = []
for table_row in soup.select(".inmatesList tr"):
    table_cells = table_row.findAll('td')
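A sketch of how that loop might finish collecting the profile links. The `.inmatesList` selector and `url_to_scrape` come from the snippet above; the assumption that each data row's first cell holds an `<a>` with a relative `href` (and the sample HTML itself) is mine, so the real column layout may differ:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical listing table, standing in for the fetched page
html = """
<table class="inmatesList">
  <tr><th>Name</th></tr>
  <tr><td><a href="Details?id=1">SMITH, JOHN</a></td></tr>
  <tr><td><a href="Details?id=2">DOE, JANE</a></td></tr>
</table>
"""

url_to_scrape = 'http://apps2.polkcountyiowa.gov/inmatesontheweb/'
soup = BeautifulSoup(html, 'html.parser')
inmates_links = []
for table_row in soup.select(".inmatesList tr"):
    table_cells = table_row.findAll('td')
    if table_cells:  # the header row has only <th> cells, so skip it
        link = table_cells[0].find('a')
        if link:
            # Resolve the relative href against the listing URL
            inmates_links.append(urljoin(url_to_scrape, link['href']))
print(inmates_links)
```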
import requests
from bs4 import BeautifulSoup
import requests
from bs4 import BeautifulSoup
import time
# We've now imported the two packages that will do the heavy lifting
# for us, requests and BeautifulSoup
# This is the URL that lists the current inmates
# Should this URL go away, an archive is available at
# http://perma.cc/2HZR-N38X
import requests
from bs4 import BeautifulSoup
# We've now imported the two packages that will do the heavy lifting
# for us, requests and BeautifulSoup
# Let's put the URL of the page we want to scrape in a variable
# so that our code down below can be a little cleaner
url_to_scrape = 'http://apps2.polkcountyiowa.gov/inmatesontheweb/'