#Public Data Sources
##Legislative
- United States on GitHub: they have a great US module that has state abbreviations, names, etc. See also the O'Reilly article about the project.
- The State Decoded
- Legislative Documents in XML at the United States House of Representatives
- US Government Web Services and XML Data Sources
- US House roll call votes (modify the URL for the year and series of the vote)
- Missouri General Assembly Statutes
- Missouri House of Representatives current session XML export
- Virginia Decoded
- Canadian Parliament Member Expenditures (modify the URL for your fiscal year)
- Health Indicators
- CDC data sets
- State of Missouri data sets
- Missouri Sunshine Law Offender data file
- Missouri Sex Offender Registry
- Open Data KC
- Enron email data set
- KDnuggets list of data sets
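
A couple of entries above (the US House roll call votes and the Canadian Parliament expenditures) say to modify the URL for a year or vote number. A minimal sketch of that idea in Python follows. The link targets are not preserved in this list, so `ROLL_CALL_TEMPLATE` is a hypothetical placeholder and the zero-padded vote number is an assumption; substitute the real pattern from the source you are using.

```python
# Hypothetical URL template: the real link target is not preserved in this
# list, so replace this with the actual pattern of the data source.
ROLL_CALL_TEMPLATE = "https://example.com/votes/{year}/roll{number:03d}.xml"


def roll_call_url(year: int, number: int, template: str = ROLL_CALL_TEMPLATE) -> str:
    """Fill the year and vote number into a URL template."""
    if number < 1:
        raise ValueError("vote number must be positive")
    return template.format(year=year, number=number)


def roll_call_urls(year: int, first: int, last: int,
                   template: str = ROLL_CALL_TEMPLATE) -> list[str]:
    """Build URLs for an inclusive range of vote numbers in one year."""
    return [roll_call_url(year, n, template) for n in range(first, last + 1)]


if __name__ == "__main__":
    for url in roll_call_urls(2013, 1, 3):
        print(url)
```

The same approach should work for the fiscal-year entry by passing a different template and parameter values.
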
##Research Quality Data Sets by Hilary Mason
- Lending Club Loan Data
- SMS Spam Collection
- Pew Research Internet and Tech data sets
- Flickr personal taxonomies
- Yahoo Data for Researchers
- DBLP Computer Science Bibliography
- ICWSM Spinn3r Challenge 2011 dataset
- Quantum Chaotic Thoughts: Facebook100 Data Set
- Public Data Sets on Amazon Web Services (AWS)
- The ClueWeb09 Dataset
- Census Bureau Home Page
- Data | The World Bank
- ImageNet
- What is Twitter, a Social Network or a News Media? - WWW'10
- dotbot | DotNetDotCom.org
- arXiv.org help - arXiv Bulk Data Access - Amazon S3
- YouTube Dataset
- Face Recognition Homepage - Databases
- Pajek datasets
- UCI Network Data Repository
- Datasets for "The Elements of Statistical Learning"
- Enron Email Dataset
- MovieLens Data Sets | GroupLens Research
- Translation Task - EMNLP 2011 Sixth Workshop on Statistical Machine Translation
- Project Gutenberg
- WordNet
- Aligned Hansards of the 36th Parliament of Canada
- CRCNS - Collaborative Research in Computational Neuroscience - Data sharing
- USENET corpus
- UniGene
- ChEMBLdb (hmason: great dataset for cheminformatics)
- UCI Machine Learning Repository
- Gene Expression Omnibus (GEO) Main page
- Social Science Data
- IMDB dataset
- Stanford Large Network Dataset Collection
- Google Books n-gram dataset
- Million Song Dataset | scaling MIR research
- Belly Button Biodiversity 2.0
- Datasets - Modeling Online Auctions
- 2 GB of photos of cats (formerly at http://137.189.35.203/WebUI/CatDatabase/catData.html)
- Sharing PyPi/Maven dependency data « RTFB
- Click Dataset | Center for Complex Networks and Systems Research
- The Electric Rice Cooker — One year of deleted weibos archive
- Registered meteorites that has impacted on Earth visualized (hmason: 74 MB Excel file of registered meteors)
- GeoJSON files for real-time Virginia transportation data.
- NYPD Crash Data Band-Aid
- 11 Billion Clues in 800 Million Documents: A Web Research Corpus Annotated with Freebase Concepts | Research Blog
- Foursquare Data Set - University of Minnesota
- Big data set - 3.5 billion web pages - made available for all of us - Big Data News (hmason: Common Crawl data for all)
- Data.Seattle.Gov | Seattle's Data Site
- New Crawl Data Available! | CommonCrawl
- Detailed data on pass rates, race, and gender for 2013
- Data Download
- SNAP: Web data: Amazon reviews