Created
July 24, 2022 13:06
Train and test split function for polars dataframes
from typing import Tuple

import polars as pl


def train_test_split(
    df: pl.DataFrame, train_fraction: float = 0.75
) -> Tuple[pl.DataFrame, pl.DataFrame]:
    """Split polars dataframe into two sets.

    Args:
        df (pl.DataFrame): Dataframe to split
        train_fraction (float, optional): Fraction that goes to train. Defaults to 0.75.

    Returns:
        Tuple[pl.DataFrame, pl.DataFrame]: Tuple of train and test dataframes
    """
    # Shuffle every column with the same seed so the rows stay aligned.
    df = df.with_columns(pl.all().shuffle(seed=1))
    # int() truncates, so the train set size is rounded down.
    split_index = int(train_fraction * len(df))
    df_train = df[:split_index]
    df_test = df[split_index:]
    return (df_train, df_test)
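
The shuffle-then-slice logic of the function above can be checked without polars. The following is a minimal sketch using plain Python lists; `split_rows` and its `seed` parameter are hypothetical names introduced here for illustration and are not part of the gist. It shows how `int(train_fraction * n)` truncates, so the train set is rounded down when the product is not a whole number.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(
    rows: Sequence[T], train_fraction: float = 0.75, seed: int = 1
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of rows with a seeded RNG, then cut at int(fraction * n)."""
    shuffled = list(rows)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    split_index = int(train_fraction * len(shuffled))  # truncates toward zero
    return shuffled[:split_index], shuffled[split_index:]


train, test = split_rows(list(range(10)))
print(len(train), len(test))  # 7 3, since int(0.75 * 10) == 7
```

Because the seed is fixed, repeated calls on the same input return the same split, which mirrors the `seed=1` choice in `train_test_split`.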