@LouisdeBruijn
Last active March 11, 2020 13:52
import random


def shuffle_split(documents, labels, split):
    """Shuffle data to ensure random class distribution in train/test split."""
    # Pair each document with its label so both are shuffled together.
    tuples = [[doc, label] for doc, label in zip(documents, labels)]
    random.shuffle(tuples)
    X, Y = zip(*tuples)
    # `split` is the fraction of the data assigned to the training set.
    split_point = int(split * len(X))
    Xtrain = X[:split_point]
    Ytrain = Y[:split_point]
    Xtest = X[split_point:]
    Ytest = Y[split_point:]
    return Xtrain, Xtest, Ytrain, Ytest
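
The function above shuffles with the global `random` state, so each run gives a different split. For a repeatable split, one option is a seeded variant. The sketch below is an illustration only and is not part of the original gist: the name `shuffle_split_seeded` and the `seed` parameter are hypothetical, and it returns lists rather than tuples.

```python
import random


def shuffle_split_seeded(documents, labels, split, seed=0):
    """Shuffle paired data with a fixed seed, then split into train/test.

    Hypothetical variant of shuffle_split; `seed` is an assumed parameter.
    """
    pairs = list(zip(documents, labels))
    # A private Random instance leaves the global random state untouched.
    rng = random.Random(seed)
    rng.shuffle(pairs)
    split_point = int(split * len(pairs))
    train, test = pairs[:split_point], pairs[split_point:]
    Xtrain = [doc for doc, _ in train]
    Ytrain = [label for _, label in train]
    Xtest = [doc for doc, _ in test]
    Ytest = [label for _, label in test]
    return Xtrain, Xtest, Ytrain, Ytest


# Made-up example data: six documents, half of them used for training.
docs = ["a", "b", "c", "d", "e", "f"]
labels = [0, 1, 0, 1, 0, 1]
Xtrain, Xtest, Ytrain, Ytest = shuffle_split_seeded(docs, labels, 0.5, seed=42)
print(len(Xtrain), len(Xtest))
```

Calling it again with the same `seed` yields the same split, which helps when comparing models across runs.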