Data, Data, Data: Thousands of Public Data Sources
Machine Learning Datasets - Sources for specific machine learning datasets
UCI Machine Learning Repository: Data Sets
USA 2011 Car crash data - A collection of data about people involved in fatal car accidents, including their final injuries, alcohol/drug test results, and other relevant details about each accident and person. Source: Fatality Analysis Reporting System (FARS) Encyclopedia
Environmental Hazard Rank - The EDR Environmental Hazard Ranking System ranks the relative environmental health of any U.S. ZIP code based on an analysis of its environmental issues. It uses a geographic information system (GIS) to parse data from NEDIS™, EDR's proprietary master database, which contains more than 3.1 billion records of potential and actual environmental hazards compiled from over 1,400 continually updated databases. Each environmental record is assigned points based on its hazard level and approximate cleanup cost, and the results are then aggregated by ZIP code to produce a rank, so you can see how the ZIP code you're interested in compares with others.
Worldwide Historical Weather Data
Many machine learning challenges provide datasets to their participants, and this data often remains available even after the challenge has closed.
Commercial marketplaces and non-commercial data hubs
http://bitly.com/bundles/bigmlcom/3
Infochimps
Windows Azure Marketplace (Free Datasets)