@olologin
Created September 17, 2015 17:42
parsing with pandas
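The script that follows relies on three pandas behaviours. Here is a minimal sketch of them, run on a few hypothetical inline rows. `io.StringIO` stands in for `file1.csv`, and the sample values are invented for illustration. The sketch shows three things:

- `header=None` plus `names=` labels a headerless CSV.
- `gb.groups` maps each `(company_id, state)` key to that group's row labels, in file order.
- `df.loc[:, 'attr1':]` selects `attr1` through the last column.

```python
import io

import pandas as pd

# Hypothetical sample rows in the layout the script assumes (no header row):
# company_id, state, profit, attr1, attr2, attr3
raw = io.StringIO(
    "1,CA,10.0,1.0,2.0,3.0\n"
    "1,CA,12.0,2.0,3.0,4.0\n"
    "2,NY,5.0,0.5,1.0,1.5\n"
    "1,CA,14.0,3.0,4.0,5.0\n"
)

# header=None stops pandas from consuming the first data row as column names;
# names= supplies the column labels instead.
df = pd.read_csv(raw, header=None,
                 names=['company_id', 'state', 'profit', 'attr1', 'attr2', 'attr3'])

# gb.groups maps each (company_id, state) tuple to the row labels in that group.
gb = df.groupby(['company_id', 'state'])
for key, labels in gb.groups.items():
    print(key, list(labels))

# Label-based column slice: everything from 'attr1' to the last column.
print(list(df.loc[:, 'attr1':].columns))
```

Because the row labels keep file order, `indices[-2:]` in the script picks the two rows of each group that appear last in the CSV.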
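```python
import pandas as pd
from sklearn import linear_model

# The CSV has no header row, so column names are assigned explicitly.
path = 'file1.csv'
df = pd.read_csv(path,
                 header=None,
                 names=['company_id', 'state', 'profit', 'attr1', 'attr2', 'attr3'])

# Fit one model per (company_id, state) group.
gb = df.groupby(['company_id', 'state'])
for (company_id, state), indices in gb.groups.items():
    # Two rows are held out, so at least one more is needed for training.
    if len(indices) < 3:
        continue
    # Hold out the last two rows of the group (in file order) as the test set...
    test_set_feature_list = df.loc[indices[-2:], 'attr1':]
    test_set_label_list = df.loc[indices[-2:], 'profit']
    # ...and train on all earlier rows.
    training_set_feature_list = df.loc[indices[:-2], 'attr1':]
    training_set_label_list = df.loc[indices[:-2], 'profit']
    # Create linear regression object
    regr = linear_model.LinearRegression()
    # Train the model using the training sets
    regr.fit(training_set_feature_list, training_set_label_list)
    print(company_id, state, regr.predict(test_set_feature_list))
```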
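The script builds the held-out labels but never uses them. Below is a self-contained sketch of the same per-group hold-out regression that also scores each model against those labels. Several pieces are assumptions for illustration, not part of the original gist:

- The data is hypothetical and noise-free, generated as `profit = 3*attr1 - attr2 + 0.5*attr3 + 2`.
- The helpers `make_demo_frame` and `per_group_holdout` are hypothetical names.
- Scikit-learn's `mean_absolute_error` is swapped in as the evaluation metric.

`n_test=2` mirrors the script's "last two rows" choice. The 2-row `(3, 'TX')` group is included to show that too-small groups are skipped.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

COLUMNS = ['company_id', 'state', 'profit', 'attr1', 'attr2', 'attr3']


def make_demo_frame(seed=0):
    """Hypothetical noise-free data: profit = 3*attr1 - attr2 + 0.5*attr3 + 2."""
    rng = np.random.default_rng(seed)
    frames = []
    for company_id, state, n_rows in [(1, 'CA', 8), (2, 'NY', 8), (3, 'TX', 2)]:
        attrs = rng.normal(size=(n_rows, 3))
        profit = 3 * attrs[:, 0] - attrs[:, 1] + 0.5 * attrs[:, 2] + 2
        frames.append(pd.DataFrame({
            'company_id': company_id, 'state': state, 'profit': profit,
            'attr1': attrs[:, 0], 'attr2': attrs[:, 1], 'attr3': attrs[:, 2],
        }))
    return pd.concat(frames, ignore_index=True)[COLUMNS]


def per_group_holdout(df, n_test=2):
    """Fit one LinearRegression per (company_id, state); score it on the last n_test rows."""
    results = {}
    for key, labels in df.groupby(['company_id', 'state']).groups.items():
        if len(labels) <= n_test:
            continue  # nothing left to train on
        train, test = labels[:-n_test], labels[-n_test:]
        model = LinearRegression()
        model.fit(df.loc[train, 'attr1':], df.loc[train, 'profit'])
        predicted = model.predict(df.loc[test, 'attr1':])
        results[key] = mean_absolute_error(df.loc[test, 'profit'], predicted)
    return results


df = make_demo_frame()
for (company_id, state), mae in per_group_holdout(df).items():
    print(f'{company_id} {state}: held-out MAE = {mae:.2e}')
```

With noise-free linear data and more training rows than parameters, each group's held-out error is at floating-point level. On real data the MAE shows how well a per-group linear fit extrapolates to the group's most recent rows.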