@language-engineering
Created September 25, 2012 09:42
from random import sample


def split_data_random(data, ratio=0.8):
    '''
    Split data into two lists. With ratio=0.8, the first list
    holds 80% of the items (rounded down) and the second holds
    the rest. Items are assigned to each list at random.
    "data" should be an indexable sequence such as a list.
    '''
    n = len(data)
    # Pick int(n * ratio) distinct indices at random for the first list.
    train_indices = sample(range(n), int(n * ratio))
    # Every index that was not picked goes to the second list.
    test_indices = list(set(range(n)) - set(train_indices))
    training_data = [data[i] for i in train_indices]
    testing_data = [data[i] for i in test_indices]
    return (training_data, testing_data)


# Example usage: split some data in half at random
# ("data" is assumed to be an existing list):
data1, data2 = split_data_random(data, ratio=0.5)
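If the same split needs to be reproduced across runs, one option is to draw the indices from a seeded generator. The following is a minimal sketch, not part of the original gist: the function name `split_data_seeded` and its `seed` parameter are hypothetical, and it shuffles the index list with `random.Random` instead of calling `sample`.

```python
from random import Random


def split_data_seeded(data, ratio=0.8, seed=0):
    """Hypothetical variant: same idea, but repeatable for a given seed."""
    rng = Random(seed)                 # private generator, global state untouched
    indices = list(range(len(data)))
    rng.shuffle(indices)               # random order of all positions
    cut = int(len(data) * ratio)       # size of the first list, rounded down
    training_data = [data[i] for i in indices[:cut]]
    testing_data = [data[i] for i in indices[cut:]]
    return training_data, testing_data


# Illustrative input only: ten integers split 80/20.
items = list(range(10))
train, test = split_data_seeded(items, ratio=0.8, seed=42)
```

Calling the function again with the same `seed` returns the same two lists, which can help when comparing experiments run on the same split.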