ocoyawale · January 30, 2018 00:47
diff --git a/explanation1.ipynb b/explanation1.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's start by importing what we need and reading in the data. Note that the categorical variables have already been encoded. For brevity, I have already split the data into train and test sets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The features in the data set are:\n",
    "\n",
    "1. Unique_Orders: Number of unique orders by the customer in the given time period\n",
    "2. Recent_Purchase: Most recent purchase (in dollars)\n",
    "3. Recent_Return: Most recent return (in dollars)\n",
    "4. Total_Purchased: Total lifetime purchase amount\n",
    "5. Total_Returned: Total lifetime return amount\n",
    "6. Recent_Seat: How many tickets/seats they last bought\n",
    "7. Recent_Sub_Price: How much their last subscription cost, if anything\n",
    "8. Total_Seats: Total lifetime seats they've bought\n",
    "9. Total_Paid: Total amount they've paid\n",
    "10. Num_Moves: Number of times they've moved home addresses\n",
    "11. Solicitor_Code: Most recent solicitor (e.g. Alice, Bob, or the web API)\n",
    "12. Prior_Code: Their priority code\n",
    "13. Country_Code: The code of their home country"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn import svm\n",
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline\n",
    "from sklearn.metrics import accuracy_score, roc_auc_score, f1_score, precision_score, recall_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "test = pd.read_csv(\"test.csv\")\n",
    "train = pd.read_csv(\"train.csv\")\n",
    "predictors = [\"Unique_Orders\",\"Recent_Purchase\",\"Recent_Return\",\"Total_Purchased\",\n",
    "             \"Total_Returned\",\"Recent_Seat\",\"Recent_Sub_Price\",\"Total_Seats\",\n",
    "             \"Total_Paid\",\"Num_Moves\",\"Solicitor_Code\",\"Prior_Code\", \"Country_Code\"]\n",
    "X_train = train[predictors]\n",
    "y_train = train[\"Churn?\"]\n",
    "X_test = test[predictors]\n",
    "y_test = test[\"Churn?\"]  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's define our error metrics. We'll look at AUC (ROC), macro-averaged precision, recall, and F1 score, and, just for fun, accuracy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def error_metrics(y_test, predictions, model):\n",
    "    # `predictions` holds hard class labels, so this AUC is computed from labels, not scores.\n",
    "    print(\"AUC: \", roc_auc_score(y_test, predictions))\n",
    "    print(\"Precision: \", precision_score(y_test, predictions, average=\"macro\"))\n",
    "    print(\"Recall: \", recall_score(y_test, predictions, average=\"macro\"))\n",
    "    print(\"F1 Score: \", f1_score(y_test, predictions, average=\"macro\"))\n",
    "    # Score the labels passed in instead of reaching for the global X_test;\n",
    "    # `model` stays in the signature so existing calls keep working.\n",
    "    print(\"Accuracy: \", accuracy_score(y_test, predictions))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll fit two simple models and predict on the test set. Let's choose C = 0.1 for both our SVM and logistic regression models; in scikit-learn, a smaller C means stronger regularization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SVM with C = 0.1.\n",
      "AUC:  0.666454081633\n",
      "Precision:  0.670731707317\n",
      "Recall:  0.666454081633\n",
      "F1 Score:  0.666705002875\n",
      "Accuracy:  0.671428571429\n"
     ]
    }
   ],
   "source": [
    "print(\"SVM with C = 0.1.\")\n",
    "svm_model = svm.SVC(kernel = \"linear\", C=0.1, probability = True).fit(X_train,y_train)  \n",
    "predictions = svm_model.predict(X_test)\n",
    "error_metrics(y_test, predictions,svm_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Logistic Regression with C = 0.1.\n",
      "AUC:  0.668367346939\n",
      "Precision:  0.679487179487\n",
      "Recall:  0.668367346939\n",
      "F1 Score:  0.667473919523\n",
      "Accuracy:  0.67619047619\n"
     ]
    }
   ],
   "source": [
    "print(\"Logistic Regression with C = 0.1.\")\n",
    "lr_model = LogisticRegression(C=0.1).fit(X_train,y_train)  \n",
    "predictions = lr_model.predict(X_test)\n",
    "error_metrics(y_test, predictions,lr_model)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
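
One thing worth checking in the printed results above: for both models the AUC and the macro recall are identical (0.666454081633 for the SVM, 0.668367346939 for logistic regression). Assuming `Churn?` is a binary 0/1 target, that is what one would expect when `roc_auc_score` is given hard class predictions, because the ROC curve then has a single operating point and its area reduces to the mean of the per-class recalls. The sketch below illustrates this with a small pure-Python AUC (a pairwise Mann-Whitney estimate, not scikit-learn's implementation). The toy labels, scores, 0.5 threshold, and the helper names `roc_auc` and `macro_recall` are all made up for illustration.

```python
def roc_auc(y_true, scores):
    """Pairwise (Mann-Whitney) estimate of ROC AUC; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_recall(y_true, y_pred):
    """Unweighted mean of the per-class recalls for a binary 0/1 target."""
    recalls = []
    for cls in (0, 1):
        members = [p for y, p in zip(y_true, y_pred) if y == cls]
        recalls.append(sum(1 for p in members if p == cls) / len(members))
    return sum(recalls) / len(recalls)


# Toy labels and scores, invented purely for illustration (not the churn data).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.4, 0.7, 0.35, 0.6, 0.8, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hard labels at a 0.5 threshold

auc_from_scores = roc_auc(y_true, scores)  # uses the full ranking of the scores
auc_from_labels = roc_auc(y_true, y_pred)  # uses only the thresholded labels
recall_macro = macro_recall(y_true, y_pred)

print("AUC from scores:", auc_from_scores)
print("AUC from labels:", auc_from_labels)
print("Macro recall:   ", recall_macro)
```

In the notebook itself, a threshold-free AUC could likely be obtained by passing continuous scores such as `svm_model.decision_function(X_test)` or `lr_model.predict_proba(X_test)[:, 1]` to `roc_auc_score`. That would change the reported AUC values, so it is offered here only as a suggestion.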