@RahulDas-dev
Last active August 8, 2022 10:41
Splitting a Pandas DataFrame into Train and Test Datasets
import math
import os
import pandas as pd
# Loading data into pandas Dataframe
dataset = pd.read_csv(dataset_path)
print(f'dataset {dataset.shape}')
dataset.head()
# Shuffling the dataset (fixed random_state for a reproducible order)
dataset = dataset.sample(frac=1,random_state=32).reset_index(drop=True)
# Train and test size calculation for an 80%-20% split
total_size = dataset.shape[0]
train_size = math.ceil(total_size * 0.8)  # 80% of the rows, rounded up
print(f'total_size {total_size}, train_size {train_size}, test_size {total_size - train_size}')
# Final splitting into train and test
train_df = dataset[:train_size].copy()
test_df = dataset[train_size:].copy()
# Writing the train and test sets back to the dataset_dir path
train_df.to_csv(os.path.join(dataset_dir, 'train_data.csv'), index=False)
test_df.to_csv(os.path.join(dataset_dir, 'test_data.csv'), index=False)
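The same shuffle-then-slice steps can be wrapped in a small helper so the split is easy to reuse and check. This is only a sketch: the function name `split_train_test`, its `train_frac` and `seed` parameters, and the toy DataFrame are assumptions made for illustration and are not part of the original snippet.

```python
import math

import pandas as pd


def split_train_test(df, train_frac=0.8, seed=32):
    """Shuffle df and split it into (train, test) by row count.

    Hypothetical helper; mirrors the steps of the snippet above.
    """
    # Shuffle all rows reproducibly and drop the old index
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    # Train size is the requested fraction of rows, rounded up
    train_size = math.ceil(len(shuffled) * train_frac)
    train_df = shuffled.iloc[:train_size].copy()
    test_df = shuffled.iloc[train_size:].copy()
    return train_df, test_df


# Toy data standing in for the CSV file (assumed, for demonstration only)
toy = pd.DataFrame({"x": range(10), "y": [i % 2 for i in range(10)]})
train_df, test_df = split_train_test(toy)
print(f"train_size {len(train_df)}, test_size {len(test_df)}")
```

Because every row lands in exactly one of the two slices, the train and test sets never overlap. Note that this split is purely random; it does not preserve class proportions the way a stratified split would.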