@KaroAntonio
Created April 1, 2024 19:39
Relax Challenge.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMxL6n39oJcNvBK3qPZEJxt",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/KaroAntonio/8d778634f0ff06f7705ee75fd703971c/relax-challenge.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Relax Take Home Challenge\n",
"Karo Castro-Wunsch\n",
"2024-03-29\n",
"\n",
"We are working with user and user engagement data for the Relax service. \n",
"Our goal is to: **Identify which factors predict future user adoption.**\n",
"\n",
"An 'Adopted User' is defined as a user who has logged into the product on three separate days in at least one seven-day period.\n",
"\n",
"In order to answer our key question, we:\n",
"1. Build an 'is_adopted' col for each user\n",
"2. Select / Clean up the features for analysis\n",
"3. Compute importance using 2 methods \n",
" 3.1 Correlation \n",
" 3.2 LogReg Feature Importance\n",
"4. Analyze feature importance"
],
"metadata": {
"id": "G-bFfnQjTASB"
}
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"id": "LJwHQ_H9S0zM"
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.model_selection import train_test_split, cross_val_score\n",
"from sklearn.metrics import accuracy_score, confusion_matrix, classification_report"
]
},
{
"cell_type": "code",
"source": [
"user_df = pd.read_csv('takehome_users.csv', encoding='latin-1')\n",
"engagement_df = pd.read_csv('takehome_user_engagement.csv')"
],
"metadata": {
"id": "EL4GPlp-T0i7"
},
"execution_count": 48,
"outputs": []
},
{
"cell_type": "code",
"source": [
"user_df.rename(columns={'object_id':'user_id'}, inplace=True)"
],
"metadata": {
"id": "drne55Ak5Yla"
},
"execution_count": 49,
"outputs": []
},
{
"cell_type": "code",
"source": [
"user_df.set_index('user_id', inplace=True)"
],
"metadata": {
"id": "BmmwtOVC5Kc9"
},
"execution_count": 50,
"outputs": []
},
{
"cell_type": "code",
"source": [
"user_df.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 344
},
"id": "MFOYIQf3UCda",
"outputId": "53e16998-fa58-42e5-a787-bb8745fcef1f"
},
"execution_count": 51,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" creation_time name email \\\n",
"user_id \n",
"1 2014-04-22 03:53:30 Clausen August [email protected] \n",
"2 2013-11-15 03:45:04 Poole Matthew [email protected] \n",
"3 2013-03-19 23:14:52 Bottrill Mitchell [email protected] \n",
"4 2013-05-21 08:09:28 Clausen Nicklas [email protected] \n",
"5 2013-01-17 10:14:20 Raw Grace [email protected] \n",
"\n",
" creation_source last_session_creation_time opted_in_to_mailing_list \\\n",
"user_id \n",
"1 GUEST_INVITE 1.398139e+09 1 \n",
"2 ORG_INVITE 1.396238e+09 0 \n",
"3 ORG_INVITE 1.363735e+09 0 \n",
"4 GUEST_INVITE 1.369210e+09 0 \n",
"5 GUEST_INVITE 1.358850e+09 0 \n",
"\n",
" enabled_for_marketing_drip org_id invited_by_user_id \n",
"user_id \n",
"1 0 11 10803.0 \n",
"2 0 1 316.0 \n",
"3 0 94 1525.0 \n",
"4 0 1 5151.0 \n",
"5 0 193 5240.0 "
],
"text/html": [
"\n",
" <div id=\"df-957d9e1f-1b3b-410d-b4cd-1a5886257095\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>creation_time</th>\n",
" <th>name</th>\n",
" <th>email</th>\n",
" <th>creation_source</th>\n",
" <th>last_session_creation_time</th>\n",
" <th>opted_in_to_mailing_list</th>\n",
" <th>enabled_for_marketing_drip</th>\n",
" <th>org_id</th>\n",
" <th>invited_by_user_id</th>\n",
" </tr>\n",
" <tr>\n",
" <th>user_id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2014-04-22 03:53:30</td>\n",
" <td>Clausen August</td>\n",
" <td>[email protected]</td>\n",
" <td>GUEST_INVITE</td>\n",
" <td>1.398139e+09</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>10803.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2013-11-15 03:45:04</td>\n",
" <td>Poole Matthew</td>\n",
" <td>[email protected]</td>\n",
" <td>ORG_INVITE</td>\n",
" <td>1.396238e+09</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>316.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2013-03-19 23:14:52</td>\n",
" <td>Bottrill Mitchell</td>\n",
" <td>[email protected]</td>\n",
" <td>ORG_INVITE</td>\n",
" <td>1.363735e+09</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>94</td>\n",
" <td>1525.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2013-05-21 08:09:28</td>\n",
" <td>Clausen Nicklas</td>\n",
" <td>[email protected]</td>\n",
" <td>GUEST_INVITE</td>\n",
" <td>1.369210e+09</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>5151.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2013-01-17 10:14:20</td>\n",
" <td>Raw Grace</td>\n",
" <td>[email protected]</td>\n",
" <td>GUEST_INVITE</td>\n",
" <td>1.358850e+09</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>193</td>\n",
" <td>5240.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "user_df",
"summary": "{\n \"name\": \"user_df\",\n \"rows\": 12000,\n \"fields\": [\n {\n \"column\": \"user_id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3464,\n \"min\": 1,\n \"max\": 12000,\n \"num_unique_values\": 12000,\n \"samples\": [\n 1936,\n 6495,\n 1721\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"creation_time\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 11996,\n \"samples\": [\n \"2013-02-12 13:23:43\",\n \"2013-02-16 01:32:28\",\n \"2013-06-02 16:34:12\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 11355,\n \"samples\": [\n \"Christiansen Bent\",\n \"Higley Christopher\",\n \"Train Aidan\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"email\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 11980,\n \"samples\": [\n \"[email protected]\",\n \"[email protected]\",\n \"[email protected]\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"creation_source\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"ORG_INVITE\",\n \"SIGNUP_GOOGLE_AUTH\",\n \"SIGNUP\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"last_session_creation_time\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 19531160.787043978,\n \"min\": 1338452406.0,\n \"max\": 1402066730.0,\n \"num_unique_values\": 8821,\n \"samples\": [\n 1340000786.0,\n 1379320190.0,\n 1401502899.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"opted_in_to_mailing_list\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 0,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 
\"enabled_for_marketing_drip\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"org_id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 124,\n \"min\": 0,\n \"max\": 416,\n \"num_unique_values\": 417,\n \"samples\": [\n 198,\n 407\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"invited_by_user_id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3383.7619678015885,\n \"min\": 3.0,\n \"max\": 11999.0,\n \"num_unique_values\": 2564,\n \"samples\": [\n 1877.0,\n 10730.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 51
}
]
},
{
"cell_type": "code",
"source": [
"user_df.info()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "rDkZ3G5iUGPa",
"outputId": "2124f1c9-5a8e-42c8-9f6d-85895c5c7882"
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 12000 entries, 0 to 11999\n",
"Data columns (total 10 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 object_id 12000 non-null int64 \n",
" 1 creation_time 12000 non-null object \n",
" 2 name 12000 non-null object \n",
" 3 email 12000 non-null object \n",
" 4 creation_source 12000 non-null object \n",
" 5 last_session_creation_time 8823 non-null float64\n",
" 6 opted_in_to_mailing_list 12000 non-null int64 \n",
" 7 enabled_for_marketing_drip 12000 non-null int64 \n",
" 8 org_id 12000 non-null int64 \n",
" 9 invited_by_user_id 6417 non-null float64\n",
"dtypes: float64(2), int64(4), object(4)\n",
"memory usage: 937.6+ KB\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"engagement_df.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "ySG9rmvqUI8u",
"outputId": "3347352e-cb9f-405e-bd4c-64b78e5aed69"
},
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" time_stamp user_id visited\n",
"0 2014-04-22 03:53:30 1 1\n",
"1 2013-11-15 03:45:04 2 1\n",
"2 2013-11-29 03:45:04 2 1\n",
"3 2013-12-09 03:45:04 2 1\n",
"4 2013-12-25 03:45:04 2 1"
],
"text/html": [
"\n",
" <div id=\"df-ce40bdef-a6a1-4eb1-b6ca-6470af84b655\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>time_stamp</th>\n",
" <th>user_id</th>\n",
" <th>visited</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014-04-22 03:53:30</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2013-11-15 03:45:04</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2013-11-29 03:45:04</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2013-12-09 03:45:04</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2013-12-25 03:45:04</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "engagement_df"
}
},
"metadata": {},
"execution_count": 5
}
]
},
{
"cell_type": "code",
"source": [
"engagement_df.info()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZsG-djqrULIF",
"outputId": "48f74004-7734-4d36-fd76-2dcb4d1ac9aa"
},
"execution_count": 135,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 207917 entries, 0 to 207916\n",
"Data columns (total 4 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 time_stamp 207917 non-null object \n",
" 1 user_id 207917 non-null int64 \n",
" 2 visited 207917 non-null int64 \n",
" 3 day_visited 207917 non-null datetime64[ns]\n",
"dtypes: datetime64[ns](1), int64(2), object(1)\n",
"memory usage: 6.3+ MB\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"#### Build is_adopted Feature\n"
],
"metadata": {
"id": "10rrvWkt680C"
}
},
{
"cell_type": "code",
"source": [
"# Construct the adopted_user target variable\n",
"# Normalize each visit timestamp to its calendar day\n",
"engagement_df['day_visited'] = pd.to_datetime(engagement_df.time_stamp).dt.normalize()\n",
"\n",
"def is_adopted_user(days_active):\n",
"    # Unique visit days in ascending order\n",
"    days = np.sort(days_active.unique())\n",
"    if len(days) <= 2:\n",
"        return 0\n",
"    # Check every run of three consecutive visit days, including the last one;\n",
"    # a span under 7 days means all three fall within one seven-day period\n",
"    for day_i in range(1, len(days) - 1):\n",
"        if days[day_i + 1] - days[day_i - 1] < pd.Timedelta('7 days'):\n",
"            return 1\n",
"    return 0\n",
"\n",
"adopted_users_sr = engagement_df.groupby('user_id')['day_visited'].transform(is_adopted_user)"
],
"metadata": {
"id": "8UMWqclxU2C3"
},
"execution_count": 136,
"outputs": []
},
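{
"cell_type": "markdown",
"source": [
"A quick sanity check of the adoption rule on hand-made visit histories. This is a minimal sketch, not part of the original analysis: the helper name `has_three_days_in_week` and the toy dates are assumptions made for illustration. It applies the same test as above (three distinct visit days spanning fewer than seven days)."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def has_three_days_in_week(days_active):\n",
"    # Hypothetical helper mirroring the adoption rule on a Series of visit dates\n",
"    days = np.sort(pd.to_datetime(days_active).dt.normalize().unique())\n",
"    # Pair each visit day with the visit day two positions later\n",
"    for first, third in zip(days[:-2], days[2:]):\n",
"        if third - first < np.timedelta64(7, 'D'):\n",
"            return 1\n",
"    return 0\n",
"\n",
"# Toy visit histories (assumed data, for illustration only)\n",
"toy_visits = {\n",
"    'three days in one week': ['2014-01-01', '2014-01-03', '2014-01-07'],\n",
"    'three days spread out': ['2014-01-01', '2014-01-05', '2014-01-09'],\n",
"    'only two days': ['2014-01-01', '2014-01-02'],\n",
"}\n",
"for label, dates in toy_visits.items():\n",
"    print(label, has_three_days_in_week(pd.Series(dates)))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},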
{
"cell_type": "code",
"source": [
"adopted_users_df = pd.DataFrame({'is_adopted': adopted_users_sr, 'user_id': engagement_df.user_id })"
],
"metadata": {
"id": "gYS_7j7D231O"
},
"execution_count": 137,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Double-check that the transform worked as expected.\n",
"# The per-user mean should be exactly 0 or 1,\n",
"# which indicates that all of a user's rows are labelled consistently.\n",
"# The recorded output below also shows 1555 adopted users out of 12000 total users\n",
"# (only the 8823 users with engagement records appear in the counts).\n",
"adopted_users_df.groupby('user_id').mean().value_counts()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "fXFLdt3J4ADg",
"outputId": "ae793481-7ef3-493a-8562-05471a86e274"
},
"execution_count": 138,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"is_adopted\n",
"0.0 7268\n",
"1.0 1555\n",
"dtype: int64"
]
},
"metadata": {},
"execution_count": 138
}
]
},
{
"cell_type": "code",
"source": [
"# Add is_adopted value to user df (aligned on the user_id index)\n",
"is_adopted_sr = adopted_users_df.groupby('user_id')['is_adopted'].mean().astype(int)\n",
"user_df['is_adopted'] = is_adopted_sr"
],
"metadata": {
"id": "12t2GDuM4wpB"
},
"execution_count": 139,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Clean up Features and choose those appropriate for Analysis"
],
"metadata": {
"id": "bnjTj3IV7BN1"
}
},
{
"cell_type": "code",
"source": [
"# Select relevant cols\n",
"# name and email are nearly unique to each row, so not useful for us\n",
"analysis_cols = ['creation_time', 'creation_source','last_session_creation_time', 'opted_in_to_mailing_list', 'enabled_for_marketing_drip', 'org_id', 'is_adopted']\n",
"# Copy so that later edits do not trigger SettingWithCopyWarning\n",
"analysis_df = user_df[analysis_cols].copy()"
],
"metadata": {
"id": "vFertvH-5wKT"
},
"execution_count": 140,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# These both look like cols that could have NA filled with 0\n",
"analysis_df.isna().sum()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "N_TtnycxGe_J",
"outputId": "29c72e54-78aa-4352-ddd0-573fae183455"
},
"execution_count": 141,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"creation_time 0\n",
"creation_source 0\n",
"last_session_creation_time 3177\n",
"opted_in_to_mailing_list 0\n",
"enabled_for_marketing_drip 0\n",
"org_id 0\n",
"is_adopted 3177\n",
"dtype: int64"
]
},
"metadata": {},
"execution_count": 141
}
]
},
{
"cell_type": "code",
"source": [
"analysis_df = analysis_df.fillna(0)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "kgwpqqVEHIgU",
"outputId": "abc3a310-6c75-4791-82a2-57c8b1f7a1c9"
},
"execution_count": 142,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"<ipython-input-142-0b452bc9e5b1>:1: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" analysis_df.fillna(0, inplace=True)\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# Convert creation_time to days to treat it as a simple int\n",
"analysis_df['creation_time'] = pd.to_datetime(analysis_df['creation_time'])\n",
"analysis_df['creation_time_days'] = ((analysis_df['creation_time'] - pd.Timestamp('1970-01-01')) / pd.Timedelta(days=1)).astype(int)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "f8rFE_qVBomL",
"outputId": "4c894d2b-bbc7-41da-8b8f-dde0a12de415"
},
"execution_count": 143,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"<ipython-input-143-131596bffcb4>:2: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" analysis_df['creation_time'] = pd.to_datetime(analysis_df['creation_time'])\n",
"<ipython-input-143-131596bffcb4>:3: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" analysis_df['creation_time_days'] = ((analysis_df['creation_time'] - pd.Timestamp('1970-01-01')) / pd.Timedelta(days=1)).astype(int)\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# We can convert creation_source to dummy vars because it has only a few categories\n",
"analysis_df.creation_source.value_counts()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "r6XN3B1Y7vs2",
"outputId": "a1ed9e97-02a9-4913-c69c-74792ca33582"
},
"execution_count": 144,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"ORG_INVITE 4254\n",
"GUEST_INVITE 2163\n",
"PERSONAL_PROJECTS 2111\n",
"SIGNUP 2087\n",
"SIGNUP_GOOGLE_AUTH 1385\n",
"Name: creation_source, dtype: int64"
]
},
"metadata": {},
"execution_count": 144
}
]
},
{
"cell_type": "code",
"source": [
"# opted_in_to_mailing_list is binary, it's already well formatted\n",
"analysis_df.opted_in_to_mailing_list.value_counts()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "GP396J6o75M-",
"outputId": "f9877cb4-49a3-4697-abf7-86cdbc2e7480"
},
"execution_count": 145,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 9006\n",
"1 2994\n",
"Name: opted_in_to_mailing_list, dtype: int64"
]
},
"metadata": {},
"execution_count": 145
}
]
},
{
"cell_type": "code",
"source": [
"# org_id has many unique values, these categories are not ordinal\n",
"# We have some options for encoding info from this col\n",
"# 1. We could replace the org id with the value count for each org\n",
"# 2. we could use Xfold target mean encoding to have a useful value for the category\n",
"\n",
"# 1. Add org counts\n",
"org_id_counts_df = analysis_df.org_id.value_counts().to_frame().rename(columns={'org_id':'org_count'})\n",
"analysis_df = analysis_df.join(org_id_counts_df, on='org_id')"
],
"metadata": {
"id": "SwvxUiGU8BP_"
},
"execution_count": 146,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Convert Dummy vars\n",
"creation_source_dummies = pd.get_dummies(analysis_df['creation_source'], prefix='creation_source')\n",
"analysis_with_dummies_df = pd.concat([analysis_df, creation_source_dummies], axis=1)"
],
"metadata": {
"id": "lZNmp2DV-Mog"
},
"execution_count": 147,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Drop Categorical Vars\n",
"analysis_with_dummies_df.drop(['creation_source', 'org_id', 'creation_time'], axis=1, inplace=True)"
],
"metadata": {
"id": "-opk1CZ1-nAI"
},
"execution_count": 148,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Map Feature correlations\n",
"\n",
"correlation_matrix = analysis_with_dummies_df.corr()\n",
"\n",
"plt.figure(figsize=(10, 8))\n",
"sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=\".2f\", linewidths=0.5)\n",
"plt.title('Feature Correlations')\n",
"plt.show()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 966
},
"id": "WUQaa2Rx-wiw",
"outputId": "28c4e0fc-2f45-483c-fedc-f9315dd05e47"
},
"execution_count": 149,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x800 with 2 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"source": [
"# Here we have our correlations to our is_adopted target var\n",
"is_adopted_corr_sr = correlation_matrix['is_adopted'].drop('is_adopted')"
],
"metadata": {
"id": "qLR0xzzVAUtR"
},
"execution_count": 172,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's split our data into X and y\n",
"X = analysis_with_dummies_df.drop('is_adopted', axis=1)\n",
"y = analysis_with_dummies_df['is_adopted']\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
],
"metadata": {
"id": "BloFDZAIBEMz"
},
"execution_count": 151,
"outputs": []
},
{
"cell_type": "code",
"source": [
"scaler = StandardScaler()\n",
"X_train = scaler.fit_transform(X_train)\n",
"X_test = scaler.transform(X_test)\n",
"\n",
"class_weights = {0: 1, 1: 7}\n",
"model = LogisticRegression(class_weight=class_weights)\n",
"model.fit(X_train, y_train)\n",
"\n",
"\n",
"coefficients = model.coef_[0]\n",
"\n",
"c_report = classification_report(y_test, model.predict(X_test))\n",
"print(c_report)\n",
"\n",
"feature_importance_logreg = pd.DataFrame({'Feature': X.columns, 'Importance': np.abs(coefficients)})\n",
"feature_importance_logreg = feature_importance_logreg.sort_values('Importance', ascending=True)\n",
"feature_importance_logreg.plot(x='Feature', y='Importance', kind='barh', figsize=(10, 6))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 696
},
"id": "PMAR0ectBee9",
"outputId": "b4fddc08-c224-4887-9169-eb2fa57249f7"
},
"execution_count": 161,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" precision recall f1-score support\n",
"\n",
" 0.0 0.99 0.97 0.98 2085\n",
" 1.0 0.84 0.92 0.87 315\n",
"\n",
" accuracy 0.97 2400\n",
" macro avg 0.91 0.95 0.93 2400\n",
"weighted avg 0.97 0.97 0.97 2400\n",
"\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<Axes: ylabel='Feature'>"
]
},
"metadata": {},
"execution_count": 161
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"source": [
"feature_importance_logreg.set_index('Feature', inplace=True)\n",
"importance_df = pd.DataFrame({'feature_importance_logreg': feature_importance_logreg.Importance, 'feature_corr': is_adopted_corr_sr.abs()})\n",
"importance_df = importance_df.sort_values('feature_importance_logreg', ascending=False)\n",
"importance_df"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 363
},
"id": "-ov8_dtWIrAY",
"outputId": "14e7d5a5-03c9-41aa-8987-501b89d4b9f7"
},
"execution_count": 185,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" feature_importance_logreg feature_corr\n",
"last_session_creation_time 48.176202 0.242183\n",
"creation_time_days 1.309577 0.089938\n",
"org_count 0.245844 0.081211\n",
"creation_source_GUEST_INVITE 0.089509 0.044354\n",
"creation_source_SIGNUP 0.075115 0.010840\n",
"creation_source_PERSONAL_PROJECTS 0.058168 0.075955\n",
"creation_source_SIGNUP_GOOGLE_AUTH 0.051316 0.033025\n",
"enabled_for_marketing_drip 0.025973 0.002636\n",
"creation_source_ORG_INVITE 0.024211 0.005834\n",
"opted_in_to_mailing_list 0.010362 0.008044"
],
"text/html": [
"\n",
" <div id=\"df-c0744e2c-c6b6-4f5d-80a9-ffdff8fca4a6\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>feature_importance_logreg</th>\n",
" <th>feature_corr</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>last_session_creation_time</th>\n",
" <td>48.176202</td>\n",
" <td>0.242183</td>\n",
" </tr>\n",
" <tr>\n",
" <th>creation_time_days</th>\n",
" <td>1.309577</td>\n",
" <td>0.089938</td>\n",
" </tr>\n",
" <tr>\n",
" <th>org_count</th>\n",
" <td>0.245844</td>\n",
" <td>0.081211</td>\n",
" </tr>\n",
" <tr>\n",
" <th>creation_source_GUEST_INVITE</th>\n",
" <td>0.089509</td>\n",
" <td>0.044354</td>\n",
" </tr>\n",
" <tr>\n",
" <th>creation_source_SIGNUP</th>\n",
" <td>0.075115</td>\n",
" <td>0.010840</td>\n",
" </tr>\n",
" <tr>\n",
" <th>creation_source_PERSONAL_PROJECTS</th>\n",
" <td>0.058168</td>\n",
" <td>0.075955</td>\n",
" </tr>\n",
" <tr>\n",
" <th>creation_source_SIGNUP_GOOGLE_AUTH</th>\n",
" <td>0.051316</td>\n",
" <td>0.033025</td>\n",
" </tr>\n",
" <tr>\n",
" <th>enabled_for_marketing_drip</th>\n",
" <td>0.025973</td>\n",
" <td>0.002636</td>\n",
" </tr>\n",
" <tr>\n",
" <th>creation_source_ORG_INVITE</th>\n",
" <td>0.024211</td>\n",
" <td>0.005834</td>\n",
" </tr>\n",
" <tr>\n",
" <th>opted_in_to_mailing_list</th>\n",
" <td>0.010362</td>\n",
" <td>0.008044</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-c0744e2c-c6b6-4f5d-80a9-ffdff8fca4a6')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-c0744e2c-c6b6-4f5d-80a9-ffdff8fca4a6 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-c0744e2c-c6b6-4f5d-80a9-ffdff8fca4a6');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-6457cbbb-e3a5-4001-b036-c8a8e7720f46\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-6457cbbb-e3a5-4001-b036-c8a8e7720f46')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-6457cbbb-e3a5-4001-b036-c8a8e7720f46 button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" <div id=\"id_755836dc-9c30-4d7f-a50d-e8689de4c57e\">\n",
" <style>\n",
" .colab-df-generate {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-generate:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-generate {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-generate:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
" <button class=\"colab-df-generate\" onclick=\"generateWithVariable('importance_df')\"\n",
" title=\"Generate code using this dataframe.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
" </svg>\n",
" </button>\n",
" <script>\n",
" (() => {\n",
" const buttonEl =\n",
" document.querySelector('#id_755836dc-9c30-4d7f-a50d-e8689de4c57e button.colab-df-generate');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" buttonEl.onclick = () => {\n",
" google.colab.notebook.generateWithVariable('importance_df');\n",
" }\n",
" })();\n",
" </script>\n",
" </div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "importance_df",
"summary": "{\n \"name\": \"importance_df\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"feature_importance_logreg\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 15.173367551928736,\n \"min\": 0.010361971747437084,\n \"max\": 48.1762024066915,\n \"num_unique_values\": 10,\n \"samples\": [\n 0.024210601036923615,\n 1.309576578838126,\n 0.0581684853645829\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"feature_corr\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.07234266292953191,\n \"min\": 0.0026362073740396137,\n \"max\": 0.2421834455377041,\n \"num_unique_values\": 10,\n \"samples\": [\n 0.005834186929141455,\n 0.08993837292257953,\n 0.07595482239542438\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 185
}
]
},
{
"cell_type": "markdown",
"source": [
"#### Analyze Feature Importance\n",
"Based on the correlation values and feature importance from the log reg model coefficients, we have a reasonable ranking of the relative importance of the features. Both metrics agree on the importance order of most of the features. The LogReg model finds last_session_creation_time to be the most useful feature by far, and given that we're able to accomplish >0.84 metrics across the board for that model, we feel relatively confident in that feature in fact being quite important. Further work could explore the relevance of each feature to more complex and possibly more performant models.\n",
"\n",
"#### Further Work\n",
"We could futher investigate the importance of the features using methods such as:\n",
"1. Permutation Evaluation\n",
"2. PCA or component coefficients\n",
"3. model coefficients from other trained ML models such as RandomForest"
],
"metadata": {
"id": "cQnYp_biJseU"
}
}
]
}
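The first Further Work item, permutation importance, can be sketched in a few lines. The block below is a minimal pure-Python illustration, not the notebook's own method: `permutation_importance`, `accuracy`, `toy_predict`, and the toy data are all hypothetical names introduced here. With the fitted `model`, `X_test`, and `y_test` from the notebook, scikit-learn's `sklearn.inspection.permutation_importance` would be the more natural tool.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column of X is shuffled in turn.

    predict: callable mapping a list of rows to a list of labels.
    X: list of rows (lists of floats); y: list of labels.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column to break its link with the target
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (assumption): the label depends only on column 0; column 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(500)]
y = [int(row[0] > 0.5) for row in X]


def toy_predict(rows):
    """Stand-in for a fitted model: thresholds column 0."""
    return [int(row[0] > 0.5) for row in rows]


importances = permutation_importance(toy_predict, X, y)
print(importances)
```

A feature the model relies on shows a large accuracy drop when shuffled, while a feature it ignores shows none. Unlike raw coefficient magnitudes, this score is measured in units of the evaluation metric, which makes it comparable across model types.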