@mrm8488
Forked from kabirahuja2431/iterable_dataloader_v0.py
Created February 23, 2020 03:22
from torch.utils.data import DataLoader

# Create the iterable dataset object (CustomIterableDataset is defined elsewhere)
dataset = CustomIterableDataset('path_to/somefile')
# Create the dataloader
dataloader = DataLoader(dataset, batch_size=64)

for data in dataloader:
    # data is a list of up to 64 (= batch_size) consecutive lines of the file
    print(len(data))  # 64, except possibly for the last batch
    # We still need to separate the text and labels and preprocess the text
    X, y = [], []
    for line in data:
        # Assumes each line has the form "text,label", with the label after the last comma
        text, label = line.rstrip('\n').rsplit(',', 1)
        text = preprocess(text)  # Defined somewhere outside
        X.append(text)
        y.append(label)
    ### Do something with X and y
    ###
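The per-batch splitting done inside the loop can also be factored into a small helper. The following is only a sketch: the name `split_batch`, the default `preprocess=str.strip`, and the assumption that every line has the form `text,label` with the label after the last comma are illustrative and not part of the original gist.

```python
def split_batch(lines, preprocess=str.strip):
    """Split a batch of 'text,label' lines into parallel lists of texts and labels."""
    X, y = [], []
    for line in lines:
        # Assumption: the label follows the last comma, so commas inside the text survive.
        text, label = line.rstrip("\n").rsplit(",", 1)
        X.append(preprocess(text))
        y.append(label.strip())
    return X, y


# Hypothetical batch, shaped like what the dataloader yields for a text file.
batch = ["good movie, really,1\n", "bad plot,0\n"]
X, y = split_batch(batch)
print(X)
print(y)
```

Because a batch from an iterable dataset of strings arrives as a plain list of strings, a helper like this could in principle also be passed to `DataLoader` through its `collate_fn` argument, so that each iteration yields `(X, y)` directly.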