@remi-or
Last active August 16, 2021 10:54
Snippet for loading datasets
# This snippet requires Hugging Face's datasets module (pip install datasets)
from datasets import load_dataset
import pandas as pd
# Build one DataFrame per source, then concatenate them at the end.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
frames = []
questions = load_dataset('squad')['train']['question'][:3000]
frames.append(pd.DataFrame({'Text' : questions, 'Source' : 'squad'}))
questions = load_dataset('hotpot_qa', 'distractor')['train']['question'][:3000]
frames.append(pd.DataFrame({'Text' : questions, 'Source' : 'hotpot'}))
questions = load_dataset('eli5')['train_eli5']['title'][:3000]
frames.append(pd.DataFrame({'Text' : questions, 'Source' : 'eli5'}))
questions = load_dataset('imdb')['train']['text'][:3000]
frames.append(pd.DataFrame({'Text' : questions, 'Source' : 'imdb'}))
# ignore_index=True renumbers the rows 0..N-1 across all sources
Dataframe = pd.concat(frames, ignore_index=True)
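
The stacking step can be tried offline, without downloading any dataset. The sketch below is not part of the original gist: `stack_sources` is a hypothetical helper name, and the toy lists stand in for the columns returned by `load_dataset`. It only assumes each source is a plain list of strings.

```python
import pandas as pd

def stack_sources(texts_by_source, limit=3000):
    """Hypothetical helper: one row per text, labelled with its source name."""
    frames = [
        # The scalar 'Source' value is broadcast to every row of the frame
        pd.DataFrame({'Text': list(texts[:limit]), 'Source': name})
        for name, texts in texts_by_source.items()
    ]
    # ignore_index=True gives a single continuous 0..N-1 index
    return pd.concat(frames, ignore_index=True)

# Toy stand-ins for the downloaded columns, so this runs offline
toy = {
    'squad': ['Who wrote Hamlet?', 'When did the war end?'],
    'imdb': ['A slow but rewarding film.'],
}
df = stack_sources(toy)
print(df['Source'].value_counts())
```

Passing a dict keeps the source label next to its texts, so adding a fifth corpus would be a one-line change rather than another pair of statements.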