Forest Gregg
[email protected]
DataMade
http://datamade.us
Almost every website you go to is a view of some data that has been organized into tables. Web pages are fancy views of spreadsheets.
* [Tiers Fusion Table](https://www.google.com/fusiontables/data?docid=11PNEL-A6MFtYLLGvgtHqK7K1Pm4viKiK9IHY0tYf#rows:id=1)
* [CPS Tiers](http://cpstiers.opencityapps.org/)
This means that sometimes we can go the other way. We can turn websites back into tables of data. This is called web scraping.
* [Illinois State Board Elections](http://www.elections.il.gov/)
* [Campaign Committee Page](http://www.elections.il.gov/CampaignDisclosure/CommitteeDetail.aspx?id=4410)
* [Election Money](http://electionmoney.org/)
* [Current Cash Position](http://illinoiselectiondata.com/?p=265)
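To make "turning a website back into a table" concrete, here is a minimal sketch using only Python's standard library. The HTML fragment, the committee names, and the helper names (`TableParser`, `html_table_to_rows`, `rows_to_csv`) are all made up for illustration; a real page will be messier and would usually be fetched over the network first.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical page fragment, invented for this example
SAMPLE_HTML = """
<table>
  <tr><th>Committee</th><th>Funds Available</th></tr>
  <tr><td>Example Committee A</td><td>1200.50</td></tr>
  <tr><td>Example Committee B</td><td>310.00</td></tr>
</table>
"""


def html_table_to_rows(html):
    """Parse an HTML string and return its table rows as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, ready to open in a spreadsheet."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = html_table_to_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

The sketch assumes a single, simple table with no nested tables or merged cells. Real scrapers spend most of their code on exactly those irregularities.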
Two reasons you might scrape:
- You tried asking, got ignored or rejected, and don't want to hire a lawyer.
- The data changes often and you need current data on an ongoing basis.
If neither of these applies, it's easier to just ask.
Reasons why you still might not scrape:
- It might be illegal. Specifically, it might violate the terms of service of a website. This is a contract that you implicitly agree to by interacting with a website, and it limits how you can use that website. The law here is often murky. It is *much* murkier for government websites. I don't ever violate terms of service.
- It's expensive. It will typically cost $3-5K to hire someone to write a good scraper for a complicated site. This is often just an upfront cost, so if you have an ongoing use, it can still be attractive.
Last active
August 29, 2015 14:06