- GLUE: The General Language Understanding Evaluation (GLUE) benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems, comprising nine sentence- or sentence-pair language understanding tasks.
- decaNLP: Natural Language Decathlon, a new benchmark for studying general NLP models that can perform a variety of complex, natural language tasks. By requiring a single system to perform ten disparate natural language tasks, decaNLP offers a unique setting for multitask, transfer, and continual learning.
- Nearest Neighbors Benchmarks - A collection of datasets pre-split into train/test, with ground-truth data for the top 100 neighbors of each query. Example datasets: Fashion-MNIST, MNIST, Last.fm. Evaluated algorithms include Vespa, OpenSearch KNN, PyNNDescent, FAISS, hnswlib (nmslib), Annoy, and Milvus.
- HF Datasets for NLP - Ready-to-use datasets for ML models, with fast, easy-to-use, and efficient data-manipulation tools.
- fastai datasets external - Contains datasets for medical imaging, audio, image localization, NLP, image classification, and more.
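The top-100 ground truth shipped with nearest-neighbor benchmarks like those above is typically produced by exact brute-force search, which approximate indexes (FAISS, hnswlib, Annoy, etc.) are then scored against via recall@k. A minimal NumPy sketch of computing such ground truth, using hypothetical random toy data rather than one of the listed datasets:

```python
import numpy as np

# Toy corpus and queries (hypothetical data; the real benchmarks use
# Fashion-MNIST, MNIST, Last.fm, etc.).
rng = np.random.default_rng(0)
train = rng.standard_normal((1000, 32))   # indexed vectors
queries = rng.standard_normal((5, 32))    # query vectors
k = 100                                   # benchmarks store top-100 neighbors

# Exact (brute-force) Euclidean distances: one row per query.
dists = np.linalg.norm(queries[:, None, :] - train[None, :, :], axis=-1)

# Indices of the k nearest train vectors per query, sorted by distance.
ground_truth = np.argsort(dists, axis=1)[:, :k]

print(ground_truth.shape)  # (5, 100)
```

An approximate index's recall@k on a query is then simply the fraction of its k returned ids that appear in the corresponding `ground_truth` row.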