- https://ourworldindata.org
- https://datasetsearch.research.google.com
- http://image-net.org
- http://yann.lecun.com/exdb/mnist
- http://cocodataset.org (images in context)
- https://research.google.com/audioset/index.html
- https://people.sc.fsu.edu/~jburkardt/datasets/regression/regression.html
- https://nijianmo.github.io/amazon/index.html (Amazon Review Data 2018)
- http://aidemos.cs.toronto.edu/nds
- http://ai.stanford.edu/~amaas/data/sentiment
- https://nlp.stanford.edu/projects/glove
- https://quantumstat.com/dataset/dataset.html?fbclid=IwAR0rHDTS4tX4unoqJVeS10XQx429dGKUDUvJOIDwPa1MGYS_6CgulHMIfyI
- http://human-pose.mpi-inf.mpg.de
- http://www.cvpapers.com/datasets.html (CV Datasets on the web)
- https://data.world/datasets/health
- https://data.worldbank.org (World Bank)
- https://www.who.int/gho/database/en (World Health Organization)
- https://www.google.com/publicdata/directory (Google public data directory)
- https://registry.opendata.aws (Registry of Open Data on AWS)
- http://data.europa.eu/euodp/en/data (European Union open data)
- https://wiki.dbpedia.org (DBpedia wiki)
- https://www.yelp.com/dataset (Yelp)
- https://data.unicef.org (UNICEF)
- https://www.kaggle.com/datasets (Kaggle)
- https://archive.ics.uci.edu/ml/index.php (Machine Learning Repository)
- https://www.data.gov (US government data)
- https://www.census.gov/data.html (US Census)
- https://healthdata.gov/search/type/dataset (health data)
- https://data.gov.uk
- https://data.gov.sa/ar/home
- https://datasource.kapsarc.org/pages/home
- http://data.imf.org/?sk=388DFA60-1D26-4ADE-B505-A05A558D9A42 (International Monetary Fund)
- https://www.visualdata.io
- http://insideairbnb.com/get-the-data.html
- https://ieee-dataport.org
- https://lionbridge.ai/datasets/12-best-audio-datasets-for-machine-learning
- https://www.dolthub.com/ (a repository and community for data collaboration)
- https://archive.ics.uci.edu/ml (UC Irvine Machine Learning Repository)
- http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
- https://www.mygreatlearning.com/blog/top-20-dataset-in-machine-learning
- https://www.nature.com/sdata/policies/repositories (Recommended Data Repositories | Scientific Data)
- https://docs.google.com/spreadsheets/d/1oFY_TX5QRFyAAu-nxeClnOFB1epSlSDWEHoMalvv0Qs/edit#gid=0
- https://www.artnome.com/art-data
- https://allisonhorst.github.io/palmerpenguins
- https://datasette.io/examples
- https://openneuro.org
- https://decode.mit.edu/projects/biked (A Dataset and Machine Learning Benchmarks for Data-Driven Bicycle Design)
- https://airbase.readthedocs.io
- https://github.com/ofirkris/Faces-datasets
- https://www.govdata.de (The data portal for Germany)
- https://data.deutschebahn.com (Deutsche Bahn AG)
- https://www.eia.gov/opendata (U.S. Energy Information Administration)
- https://github.com/bundesAPI/deutschland
- https://www.crcv.ucf.edu/data/Selfie
- https://www.usgs.gov (United States Geological Survey)
- https://exposing.ai/datasets (Face Recognition)
- https://www.madronavl.com/launchable/launchable-datasets-to-kickstart-your-ai-startup-journey (Public Data Sources)
- https://dumps.wikimedia.org/ (Wikimedia Downloads)
- https://github.com/niderhoff/nlp-datasets (List of free/public domain datasets with text data for use in NLP)
- https://www.madronavl.com/launchable/public-data-sources-text (Public Text Data)
- https://www.fulltextarchive.com/ (Online Library of over 8000 free classic books)
- Awesome Public Datasets
Data for Machine Learning and Data Science #list #data #ml #api