{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's start by importing what we need and reading in the data. Note that the categorical variables have already been encoded. For brevity, I have already split the data into train and test sets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The features in the data set are:\n",
    "\n",
    "1. Unique_Orders: Number of unique orders by the customer in the given time period\n",
    "2. Recent_Purchase: Most recent purchase (in dollars)\n",
    "3. Recent_Return: Most recent return (in dollars)\n",
    "4. Total_Purchased: Total lifetime purchase amount\n",
    "5. Total_Returned: Total lifetime return amount\n",
    "6. Recent_Seat: How many tickets/seats they last bought\n",
    "7. Recent_Sub_Price: How much their last subscription cost, if anything\n",
    "8. Total_Seats: Total lifetime seats they've bought\n",
    "9. Total_Paid: Total amount they've paid\n",
    "10. Num_Moves: Number of times they've moved home addresses\n",
    "11. Solicitor_Code: Most recent solicitor (e.g. Alice, Bob, or the web API)\n",
    "12. Prior_Code: Their priority code\n",
    "13. Country_Code: The code of their home country"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn import svm\n",
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline\n",
    "from sklearn.metrics import accuracy_score, roc_auc_score, f1_score, precision_score, recall_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Load the pre-split, pre-encoded train and test sets.\n",
    "test = pd.read_csv(\"test.csv\")\n",
    "train = pd.read_csv(\"train.csv\")\n",
    "\n",
    "# Feature columns; \"Churn?\" is the target column.\n",
    "predictors = [\"Unique_Orders\",\"Recent_Purchase\",\"Recent_Return\",\"Total_Purchased\",\n",
    "              \"Total_Returned\",\"Recent_Seat\",\"Recent_Sub_Price\",\"Total_Seats\",\n",
    "              \"Total_Paid\",\"Num_Moves\",\"Solicitor_Code\",\"Prior_Code\", \"Country_Code\"]\n",
    "X_train = train[predictors]\n",
    "y_train = train[\"Churn?\"]\n",
    "X_test = test[predictors]\n",
    "y_test = test[\"Churn?\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's define our evaluation metrics. We'll look at AUC (ROC), precision, recall, F1 score, and, just for fun, accuracy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def error_metrics(y_test, predictions, model):\n",
    "    # All metrics are computed from the hard class predictions;\n",
    "    # precision, recall, and F1 are macro-averaged over the classes.\n",
    "    print(\"AUC: \", roc_auc_score(y_test, predictions))\n",
    "    print(\"Precision: \", precision_score(y_test, predictions, average=\"macro\"))\n",
    "    print(\"Recall: \", recall_score(y_test, predictions, average=\"macro\"))\n",
    "    print(\"F1 Score: \", f1_score(y_test, predictions, average=\"macro\"))\n",
    "    # Equivalent to model.score(X_test, y_test) when predictions = model.predict(X_test),\n",
    "    # but does not depend on the global X_test. `model` is kept so existing calls still work.\n",
    "    print(\"Accuracy: \", accuracy_score(y_test, predictions))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll make some simple predictions. Let's choose C = 0.1 for both our SVM and logistic regression models. In both models, C is the inverse of the regularization strength, so a smaller C means stronger regularization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SVM with C = 0.1.\n",
      "AUC: 0.666454081633\n",
      "Precision: 0.670731707317\n",
      "Recall: 0.666454081633\n",
      "F1 Score: 0.666705002875\n",
      "Accuracy: 0.671428571429\n"
     ]
    }
   ],
   "source": [
    "print(\"SVM with C = 0.1.\")\n",
    "svm_model = svm.SVC(kernel=\"linear\", C=0.1, probability=True).fit(X_train, y_train)\n",
    "predictions = svm_model.predict(X_test)\n",
    "error_metrics(y_test, predictions, svm_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Logistic Regression with C = 0.1.\n",
      "AUC: 0.668367346939\n",
      "Precision: 0.679487179487\n",
      "Recall: 0.668367346939\n",
      "F1 Score: 0.667473919523\n",
      "Accuracy: 0.67619047619\n"
     ]
    }
   ],
   "source": [
    "print(\"Logistic Regression with C = 0.1.\")\n",
    "lr_model = LogisticRegression(C=0.1).fit(X_train, y_train)\n",
    "predictions = lr_model.predict(X_test)\n",
    "error_metrics(y_test, predictions, lr_model)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
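
In both outputs above, AUC and macro recall are identical (for example, 0.666454081633 for the SVM). That is expected when AUC is computed from hard class predictions rather than scores: for a binary target it reduces to the mean of the two per-class recalls. The dependency-free sketch below illustrates this on made-up data. The labels, scores, and 0.5 threshold are hypothetical and not taken from the churn data, and the helpers `auc_from_scores` and `macro_recall` are my own illustrations, not scikit-learn functions.

```python
def auc_from_scores(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_recall(y_true, y_pred):
    """Unweighted mean of the recall of class 0 and the recall of class 1."""
    recalls = []
    for label in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels and model scores (not from the churn data).
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.6, 0.35, 0.7, 0.9]

# Hard predictions at an assumed 0.5 threshold.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("AUC from hard labels: {:.4f}".format(auc_from_scores(y_true, y_pred)))  # 0.6667
print("Macro recall:         {:.4f}".format(macro_recall(y_true, y_pred)))     # 0.6667
print("AUC from scores:      {:.4f}".format(auc_from_scores(y_true, scores)))  # 0.7778
```

Thresholding discards the ranking information that AUC is meant to measure. With the fitted models in this notebook, one option would be to pass scores such as `lr_model.predict_proba(X_test)[:, 1]` or `svm_model.decision_function(X_test)` to `roc_auc_score`, assuming the positive class is the second one in `classes_`.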