@joaopcnogueira
Last active July 11, 2019 13:53
Simple Example using GroupKFold with Cross-Validate
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GroupKFold, cross_validate
# Loading the data
iris = datasets.load_iris()
design_matrix = np.concatenate((iris['data'], iris['target'].reshape(-1, 1)), axis=1)
df = pd.DataFrame(design_matrix, columns=['sepal_length', 'sepal_width',
                                          'petal_length', 'petal_width',
                                          'species'])
X = df.drop(columns='species')
y = df.species
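# Defining the validation scheme and the groups used to split the data
group_kfold = GroupKFold(n_splits=3)
# NOTE: the groups here are the target itself, so each test fold holds a species
# that never appears in its training fold and the test accuracy will be 0.
# In practice, groups would be something like a subject or session ID.
groups = df.species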
# Picking the model
model = DecisionTreeClassifier()
# Evaluating the model with a GroupKFold validation scheme
results = cross_validate(model, X, y, cv = group_kfold, groups = groups, return_train_score = True)
print(results)
print("Accuracy: %.2f (%.2f)" %(results['test_score'].mean(), results['test_score'].std()))
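The script above groups by species, which is the target itself. As a minimal sketch of how GroupKFold behaves when the groups are independent of the label, the variant below assigns hypothetical group IDs (`i % 10`, which is not part of the original gist and is purely illustrative) so that every group contains all three species. It also checks that no group appears on both sides of a split.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GroupKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris['data'], iris['target']

# Hypothetical group IDs (an assumption for illustration): sample i goes to
# group i % 10, so each of the 10 groups mixes all three species.
groups = np.arange(len(y)) % 10

group_kfold = GroupKFold(n_splits=3)

# Inspect the folds: a group must never be split between train and test.
shared_per_fold = []
for train_idx, test_idx in group_kfold.split(X, y, groups=groups):
    shared = set(groups[train_idx]) & set(groups[test_idx])
    shared_per_fold.append(shared)
    print("test groups:", sorted(set(groups[test_idx])), "| shared with train:", shared)

# Same cross_validate call as above, now with label-independent groups.
results = cross_validate(DecisionTreeClassifier(random_state=0), X, y,
                         cv=group_kfold, groups=groups, return_train_score=True)
print("Accuracy: %.2f (%.2f)" % (results['test_score'].mean(), results['test_score'].std()))
```

With real data the group IDs would come from the dataset (for example one ID per patient or per recording session) rather than from the row index.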