@alonsosilvaallende
Last active October 16, 2023 08:20
Cox_PH_and_RSF-colab.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Cox_PH_and_RSF-colab.ipynb",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/alonsosilvaallende/684ed7326e5c1192ad339fc8e59e04f9/cox_ph_and_rsf-colab.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UN3PoUTSb2nT"
},
"source": [
"The objective of this post is to compare two models (the Cox proportional hazards model and the random survival forest) for estimating the survival probability given a set of features/covariates.\n",
"\n",
">[\"Experimental Comparison of Semi-parametric, Parametric, and Machine Learning Models for Time-to-Event Analysis Through the Concordance Index,\"](https://arxiv.org/abs/2003.08820)\n",
"Camila Fernandez, Chung Shue Chen, Pierre Gaillard, Alonso Silva\n",
"\n",
"To perform this analysis we will use [scikit-learn](https://scikit-learn.org/) and [scikit-survival](https://pypi.org/project/scikit-survival/). Finally, we will use [eli5](https://eli5.readthedocs.io/en/latest/index.html) to study feature importances (computed with permutation importance)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "pzqGFx_Bb_6A"
},
"source": [
"!pip install -q scikit-survival"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "VoQVEI5p_rga"
},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "SnT6e_JPb2ns"
},
"source": [
"We first load a dataset that ships with scikit-survival."
]
},
{
"cell_type": "code",
"metadata": {
"id": "D0xxNWzI-N3j"
},
"source": [
"from sksurv.datasets import load_gbsg2\n",
"\n",
"X, y = load_gbsg2()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "IBTo4q_Hb2n0"
},
"source": [
"## An example: German Breast Cancer Study Group 2 (GBSG2)\n",
"\n",
"This dataset contains the following 8 features/covariates:\n",
"\n",
"- age: age (in years),\n",
"- estrec: estrogen receptor (in fmol),\n",
"- horTh: hormonal therapy (yes or no),\n",
"- menostat: menopausal status (premenopausal or postmenopausal),\n",
"- pnodes: number of positive nodes,\n",
"- progrec: progesterone receptor (in fmol),\n",
"- tgrade: tumor grade (I < II < III),\n",
"- tsize: tumor size (in mm).\n",
"\n",
"and the two outputs:\n",
"\n",
"- recurrence free time (in days),\n",
"- censoring indicator (0 - censored, 1 - event).\n",
"\n",
"The dataset has 686 samples and 8 features/covariates.\n",
"\n",
"\n",
"**References**\n",
"\n",
"M. Schumacher, G. Bastert, H. Bojar, K. Huebner, M. Olschewski, W. Sauerbrei, C. Schmoor, C. Beyerle, R.L.A. Neumann and H.F. Rauschecker for the German Breast Cancer Study Group (1994), [Randomized 2 x 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients](https://www.ncbi.nlm.nih.gov/pubmed/7931478). Journal of Clinical Oncology, 12, 2086–2093."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kAsZ72YYb2n3"
},
"source": [
"Let's take a look at the features/covariates."
]
},
{
"cell_type": "code",
"metadata": {
"id": "_OlmlI6g-X43",
"outputId": "f3b97327-9b8a-491b-e36a-00887b6b2f9d",
"scrolled": true,
"colab": {
"base_uri": "https://localhost:8080/",
"height": 363
}
},
"source": [
"cols = [\"age\", \"estrec\", \"pnodes\", \"progrec\", \"tsize\"]\n",
"formatdict = {}\n",
"for col in cols: formatdict[col] = \"{:,.0f}\"\n",
"X.head(10).style.hide(axis=\"index\").format(formatdict)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<pandas.io.formats.style.Styler at 0x7f6efea2a9b0>"
],
"text/html": [
"<style type=\"text/css\">\n",
"</style>\n",
"<table id=\"T_b723e\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th id=\"T_b723e_level0_col0\" class=\"col_heading level0 col0\" >age</th>\n",
" <th id=\"T_b723e_level0_col1\" class=\"col_heading level0 col1\" >estrec</th>\n",
" <th id=\"T_b723e_level0_col2\" class=\"col_heading level0 col2\" >horTh</th>\n",
" <th id=\"T_b723e_level0_col3\" class=\"col_heading level0 col3\" >menostat</th>\n",
" <th id=\"T_b723e_level0_col4\" class=\"col_heading level0 col4\" >pnodes</th>\n",
" <th id=\"T_b723e_level0_col5\" class=\"col_heading level0 col5\" >progrec</th>\n",
" <th id=\"T_b723e_level0_col6\" class=\"col_heading level0 col6\" >tgrade</th>\n",
" <th id=\"T_b723e_level0_col7\" class=\"col_heading level0 col7\" >tsize</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td id=\"T_b723e_row0_col0\" class=\"data row0 col0\" >70</td>\n",
" <td id=\"T_b723e_row0_col1\" class=\"data row0 col1\" >66</td>\n",
" <td id=\"T_b723e_row0_col2\" class=\"data row0 col2\" >no</td>\n",
" <td id=\"T_b723e_row0_col3\" class=\"data row0 col3\" >Post</td>\n",
" <td id=\"T_b723e_row0_col4\" class=\"data row0 col4\" >3</td>\n",
" <td id=\"T_b723e_row0_col5\" class=\"data row0 col5\" >48</td>\n",
" <td id=\"T_b723e_row0_col6\" class=\"data row0 col6\" >II</td>\n",
" <td id=\"T_b723e_row0_col7\" class=\"data row0 col7\" >21</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_b723e_row1_col0\" class=\"data row1 col0\" >56</td>\n",
" <td id=\"T_b723e_row1_col1\" class=\"data row1 col1\" >77</td>\n",
" <td id=\"T_b723e_row1_col2\" class=\"data row1 col2\" >yes</td>\n",
" <td id=\"T_b723e_row1_col3\" class=\"data row1 col3\" >Post</td>\n",
" <td id=\"T_b723e_row1_col4\" class=\"data row1 col4\" >7</td>\n",
" <td id=\"T_b723e_row1_col5\" class=\"data row1 col5\" >61</td>\n",
" <td id=\"T_b723e_row1_col6\" class=\"data row1 col6\" >II</td>\n",
" <td id=\"T_b723e_row1_col7\" class=\"data row1 col7\" >12</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_b723e_row2_col0\" class=\"data row2 col0\" >58</td>\n",
" <td id=\"T_b723e_row2_col1\" class=\"data row2 col1\" >271</td>\n",
" <td id=\"T_b723e_row2_col2\" class=\"data row2 col2\" >yes</td>\n",
" <td id=\"T_b723e_row2_col3\" class=\"data row2 col3\" >Post</td>\n",
" <td id=\"T_b723e_row2_col4\" class=\"data row2 col4\" >9</td>\n",
" <td id=\"T_b723e_row2_col5\" class=\"data row2 col5\" >52</td>\n",
" <td id=\"T_b723e_row2_col6\" class=\"data row2 col6\" >II</td>\n",
" <td id=\"T_b723e_row2_col7\" class=\"data row2 col7\" >35</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_b723e_row3_col0\" class=\"data row3 col0\" >59</td>\n",
" <td id=\"T_b723e_row3_col1\" class=\"data row3 col1\" >29</td>\n",
" <td id=\"T_b723e_row3_col2\" class=\"data row3 col2\" >yes</td>\n",
" <td id=\"T_b723e_row3_col3\" class=\"data row3 col3\" >Post</td>\n",
" <td id=\"T_b723e_row3_col4\" class=\"data row3 col4\" >4</td>\n",
" <td id=\"T_b723e_row3_col5\" class=\"data row3 col5\" >60</td>\n",
" <td id=\"T_b723e_row3_col6\" class=\"data row3 col6\" >II</td>\n",
" <td id=\"T_b723e_row3_col7\" class=\"data row3 col7\" >17</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_b723e_row4_col0\" class=\"data row4 col0\" >73</td>\n",
" <td id=\"T_b723e_row4_col1\" class=\"data row4 col1\" >65</td>\n",
" <td id=\"T_b723e_row4_col2\" class=\"data row4 col2\" >no</td>\n",
" <td id=\"T_b723e_row4_col3\" class=\"data row4 col3\" >Post</td>\n",
" <td id=\"T_b723e_row4_col4\" class=\"data row4 col4\" >1</td>\n",
" <td id=\"T_b723e_row4_col5\" class=\"data row4 col5\" >26</td>\n",
" <td id=\"T_b723e_row4_col6\" class=\"data row4 col6\" >II</td>\n",
" <td id=\"T_b723e_row4_col7\" class=\"data row4 col7\" >35</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_b723e_row5_col0\" class=\"data row5 col0\" >32</td>\n",
" <td id=\"T_b723e_row5_col1\" class=\"data row5 col1\" >13</td>\n",
" <td id=\"T_b723e_row5_col2\" class=\"data row5 col2\" >no</td>\n",
" <td id=\"T_b723e_row5_col3\" class=\"data row5 col3\" >Pre</td>\n",
" <td id=\"T_b723e_row5_col4\" class=\"data row5 col4\" >24</td>\n",
" <td id=\"T_b723e_row5_col5\" class=\"data row5 col5\" >0</td>\n",
" <td id=\"T_b723e_row5_col6\" class=\"data row5 col6\" >III</td>\n",
" <td id=\"T_b723e_row5_col7\" class=\"data row5 col7\" >57</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_b723e_row6_col0\" class=\"data row6 col0\" >59</td>\n",
" <td id=\"T_b723e_row6_col1\" class=\"data row6 col1\" >0</td>\n",
" <td id=\"T_b723e_row6_col2\" class=\"data row6 col2\" >yes</td>\n",
" <td id=\"T_b723e_row6_col3\" class=\"data row6 col3\" >Post</td>\n",
" <td id=\"T_b723e_row6_col4\" class=\"data row6 col4\" >2</td>\n",
" <td id=\"T_b723e_row6_col5\" class=\"data row6 col5\" >181</td>\n",
" <td id=\"T_b723e_row6_col6\" class=\"data row6 col6\" >II</td>\n",
" <td id=\"T_b723e_row6_col7\" class=\"data row6 col7\" >8</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_b723e_row7_col0\" class=\"data row7 col0\" >65</td>\n",
" <td id=\"T_b723e_row7_col1\" class=\"data row7 col1\" >25</td>\n",
" <td id=\"T_b723e_row7_col2\" class=\"data row7 col2\" >no</td>\n",
" <td id=\"T_b723e_row7_col3\" class=\"data row7 col3\" >Post</td>\n",
" <td id=\"T_b723e_row7_col4\" class=\"data row7 col4\" >1</td>\n",
" <td id=\"T_b723e_row7_col5\" class=\"data row7 col5\" >192</td>\n",
" <td id=\"T_b723e_row7_col6\" class=\"data row7 col6\" >II</td>\n",
" <td id=\"T_b723e_row7_col7\" class=\"data row7 col7\" >16</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_b723e_row8_col0\" class=\"data row8 col0\" >80</td>\n",
" <td id=\"T_b723e_row8_col1\" class=\"data row8 col1\" >59</td>\n",
" <td id=\"T_b723e_row8_col2\" class=\"data row8 col2\" >no</td>\n",
" <td id=\"T_b723e_row8_col3\" class=\"data row8 col3\" >Post</td>\n",
" <td id=\"T_b723e_row8_col4\" class=\"data row8 col4\" >30</td>\n",
" <td id=\"T_b723e_row8_col5\" class=\"data row8 col5\" >0</td>\n",
" <td id=\"T_b723e_row8_col6\" class=\"data row8 col6\" >II</td>\n",
" <td id=\"T_b723e_row8_col7\" class=\"data row8 col7\" >39</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_b723e_row9_col0\" class=\"data row9 col0\" >66</td>\n",
" <td id=\"T_b723e_row9_col1\" class=\"data row9 col1\" >3</td>\n",
" <td id=\"T_b723e_row9_col2\" class=\"data row9 col2\" >no</td>\n",
" <td id=\"T_b723e_row9_col3\" class=\"data row9 col3\" >Post</td>\n",
" <td id=\"T_b723e_row9_col4\" class=\"data row9 col4\" >7</td>\n",
" <td id=\"T_b723e_row9_col5\" class=\"data row9 col5\" >0</td>\n",
" <td id=\"T_b723e_row9_col6\" class=\"data row9 col6\" >II</td>\n",
" <td id=\"T_b723e_row9_col7\" class=\"data row9 col7\" >18</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
]
},
"metadata": {},
"execution_count": 4
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zcMjHeAEb2n_"
},
"source": [
"Let's take a look at the output."
]
},
{
"cell_type": "code",
"metadata": {
"id": "h8ltRTa4_WOn",
"outputId": "9e762f57-bcd9-4961-f2d0-4a445ad69d85",
"scrolled": true,
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"y[:10]"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([( True, 1814.), ( True, 2018.), ( True, 712.), ( True, 1807.),\n",
" ( True, 772.), ( True, 448.), (False, 2172.), (False, 2161.),\n",
" ( True, 471.), (False, 2014.)],\n",
" dtype=[('cens', '?'), ('time', '<f8')])"
]
},
"metadata": {},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "A5NXlNYob2oH"
},
"source": [
"For the output, scikit-survival uses a NumPy structured array, so we convert it to a DataFrame to display it."
]
},
{
"cell_type": "code",
"source": [
"df_y = pd.DataFrame(data={'time': y['time'].astype(int), 'event': y['cens']})\n",
"df_y[:10].style.hide(axis=\"index\").highlight_min('event', color='lightgreen')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 363
},
"id": "AfPvZcjJ-GwQ",
"outputId": "b8561a10-5167-4901-b454-0c7e6be0136e"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_63853_row6_col1, #T_63853_row7_col1, #T_63853_row9_col1 {\n",
" background-color: lightgreen;\n",
"}\n",
"</style>\n",
"<table id=\"T_63853_\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"col_heading level0 col0\" >time</th>\n",
" <th class=\"col_heading level0 col1\" >event</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td id=\"T_63853_row0_col0\" class=\"data row0 col0\" >1814</td>\n",
" <td id=\"T_63853_row0_col1\" class=\"data row0 col1\" >True</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_63853_row1_col0\" class=\"data row1 col0\" >2018</td>\n",
" <td id=\"T_63853_row1_col1\" class=\"data row1 col1\" >True</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_63853_row2_col0\" class=\"data row2 col0\" >712</td>\n",
" <td id=\"T_63853_row2_col1\" class=\"data row2 col1\" >True</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_63853_row3_col0\" class=\"data row3 col0\" >1807</td>\n",
" <td id=\"T_63853_row3_col1\" class=\"data row3 col1\" >True</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_63853_row4_col0\" class=\"data row4 col0\" >772</td>\n",
" <td id=\"T_63853_row4_col1\" class=\"data row4 col1\" >True</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_63853_row5_col0\" class=\"data row5 col0\" >448</td>\n",
" <td id=\"T_63853_row5_col1\" class=\"data row5 col1\" >True</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_63853_row6_col0\" class=\"data row6 col0\" >2172</td>\n",
" <td id=\"T_63853_row6_col1\" class=\"data row6 col1\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_63853_row7_col0\" class=\"data row7 col0\" >2161</td>\n",
" <td id=\"T_63853_row7_col1\" class=\"data row7 col1\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_63853_row8_col0\" class=\"data row8 col0\" >471</td>\n",
" <td id=\"T_63853_row8_col1\" class=\"data row8 col1\" >True</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_63853_row9_col0\" class=\"data row9 col0\" >2014</td>\n",
" <td id=\"T_63853_row9_col1\" class=\"data row9 col1\" >False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x7f8de02b0510>"
]
},
"metadata": {},
"execution_count": 9
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xzW5x7ljb2oP"
},
"source": [
"One of the main challenges of survival analysis is **right censoring**: by the end of the study, the event of interest (for example, in medicine 'death of a patient' or in this dataset 'recurrence of cancer') has occurred for only a subset of the observations.\n",
"\n",
"In this dataset, censoring is recorded in the column named 'event', which takes the value 'True' if the patient had a recurrence of cancer at the indicated time or 'False' if the patient was still recurrence free at that time (right-censored samples)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VmfAR7igb2oW"
},
"source": [
"Let's see how many right-censored samples we have."
]
},
{
"cell_type": "code",
"metadata": {
"id": "rzS8h1GG_o_A",
"outputId": "3fc3c587-1978-45c5-abaa-6eed2bd5af6a",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"print(f'Number of samples: {len(df_y)}')\n",
"print(f'Number of right censored samples: {len(df_y.query(\"event == False\"))}')\n",
"print(f'Percentage of right censored samples: {100*len(df_y.query(\"event == False\"))/len(df_y):.1f}%')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Number of samples: 686\n",
"Number of right censored samples: 387\n",
"Percentage of right censored samples: 56.4%\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VtsENFsnQhZx"
},
"source": [
"There are 387 patients (56.4%) who were right censored (recurrence free) at the end of the study.\n",
"\n",
"Let's divide our dataset into training and test sets."
]
},
{
"cell_type": "code",
"metadata": {
"id": "duYhddUr_1nH",
"outputId": "17d1a3fb-2793-446e-cdd6-3a866d24fbc3",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"X_trn, X_test, y_trn, y_test = train_test_split(X, y, random_state=42)\n",
"\n",
"print(f'Number of training samples: {len(y_trn)}')\n",
"print(f'Number of test samples: {len(y_test)}')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Number of training samples: 514\n",
"Number of test samples: 172\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3VEOV-vWb2ow"
},
"source": [
"We divide the features/covariates into continuous and categorical."
]
},
{
"cell_type": "code",
"metadata": {
"id": "R1UVhukR_8VA"
},
"source": [
"scaling_cols = [c for c in X.columns if X[c].dtype.kind in ['i', 'f']]\n",
"cat_cols = [c for c in X.columns if X[c].dtype.kind not in [\"i\", \"f\"]]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "jDjgk9PWb2o3"
},
"source": [
"We use ordinal encoding for categorical features/covariates and standard scaling for continuous features/covariates."
]
},
{
"cell_type": "code",
"metadata": {
"id": "axfu8RQeACQx"
},
"source": [
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.preprocessing import OrdinalEncoder\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"preprocessor = ColumnTransformer(\n",
" [('cat-preprocessor', OrdinalEncoder(), cat_cols),\n",
" ('standard-scaler', StandardScaler(), scaling_cols)],\n",
" remainder='passthrough', sparse_threshold=0)"
],
"execution_count": null,
"outputs": []
},
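{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the two preprocessing steps concrete, here is a simplified pure-Python sketch of what they compute. It assumes that categories are ordered by sorting their labels and that scaling divides by the population standard deviation. The helper names `ordinal_encode` and `standard_scale` are hypothetical and are not scikit-learn functions; the pipeline itself keeps using `OrdinalEncoder` and `StandardScaler`."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import math\n",
"\n",
"def ordinal_encode(values):\n",
"    # Map each distinct label to its rank among the sorted labels\n",
"    categories = sorted(set(values))\n",
"    index = {label: i for i, label in enumerate(categories)}\n",
"    return [index[v] for v in values]\n",
"\n",
"def standard_scale(values):\n",
"    # Subtract the mean and divide by the population standard deviation\n",
"    mean = sum(values) / len(values)\n",
"    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))\n",
"    return [(v - mean) / std for v in values]\n",
"\n",
"print(ordinal_encode(['II', 'III', 'I', 'II']))  # [1, 2, 0, 1]\n",
"print(standard_scale([21.0, 12.0, 35.0, 17.0]))"
],
"execution_count": null,
"outputs": []
},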
{
"cell_type": "markdown",
"metadata": {
"id": "N_nodxlEb2o-"
},
"source": [
"# Baseline: Cox Proportional Hazards model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KqdChwYJb2o_"
},
"source": [
"The Cox proportional hazards model assumes that the log-hazard of a subject is a linear function of their $m$ static covariates/features $x_i$, $i\\in\\{1,\\ldots,m\\}$, with regression coefficients $\\beta_i$, plus the logarithm of a population-level baseline hazard function $h_0(t)$ that changes over time:\n",
"\\begin{equation}\n",
"h(t|x)=h_0(t)\\exp\\left(\\sum_{i=1}^m\\beta_i(x_i-\\bar{x}_i)\\right).\n",
"\\end{equation}\n",
"\n",
"The term *proportional hazards* refers to the assumption that the ratio of the hazards of any two subjects is constant over time: the covariates scale the baseline hazard $h_0(t)$ but do not change its shape."
]
},
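{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of what proportionality means (not part of the original analysis): when we divide the hazards of two subjects, the baseline hazard and the covariate means cancel, so the ratio depends only on the difference between their covariates and not on time. The function name `hazard_ratio` and the coefficients below are illustrative assumptions, not values fitted on this dataset; `coef[i]` stands for the regression coefficient of covariate $i$."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import math\n",
"\n",
"def hazard_ratio(coef, x_a, x_b):\n",
"    # Ratio h(t|x_a) / h(t|x_b) under a Cox model; it does not depend on t\n",
"    # because the baseline hazard h_0(t) cancels out.\n",
"    return math.exp(sum(c * (a - b) for c, a, b in zip(coef, x_a, x_b)))\n",
"\n",
"# Hypothetical coefficients for two covariates\n",
"coef = [0.5, -1.0]\n",
"print(hazard_ratio(coef, [2.0, 0.0], [0.0, 0.0]))  # exp(0.5 * 2.0) = e"
],
"execution_count": null,
"outputs": []
},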
{
"cell_type": "code",
"metadata": {
"id": "77YbwMKvAFHQ",
"outputId": "f5a6349f-b86e-49df-9a2a-9ae06d610fc0",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"from sklearn.pipeline import make_pipeline\n",
"from sksurv.linear_model import CoxPHSurvivalAnalysis\n",
"from sksurv.metrics import concordance_index_censored\n",
"\n",
"cox = make_pipeline(preprocessor, CoxPHSurvivalAnalysis())\n",
"cox.fit(X_trn, y_trn)\n",
"\n",
"ci_cox = concordance_index_censored(y_test[\"cens\"], y_test[\"time\"], cox.predict(X_test))\n",
"print(f'The c-index of Cox is given by {ci_cox[0]:.3f}')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The c-index of Cox is given by 0.635\n"
]
}
]
},
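{
"cell_type": "markdown",
"metadata": {},
"source": [
"The c-index (concordance index) measures how well the predicted risk scores rank the subjects. A rough pure-Python sketch of the idea, assuming there are no tied observation times: a pair of subjects is comparable when the one with the shorter observed time had an event, and it is concordant when that subject also received the higher predicted risk; ties in risk count as one half. The function name `harrell_c_index` and the toy data are illustrative. scikit-survival's `concordance_index_censored` additionally handles tied times, so its result may differ on real data."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"def harrell_c_index(event, time, risk):\n",
"    # Fraction of comparable pairs that the risk scores order correctly\n",
"    concordant = 0.0\n",
"    comparable = 0\n",
"    for i in range(len(time)):\n",
"        if not event[i]:\n",
"            continue  # a censored subject cannot be the earlier one in a pair\n",
"        for j in range(len(time)):\n",
"            if time[i] < time[j]:\n",
"                comparable += 1\n",
"                if risk[i] > risk[j]:\n",
"                    concordant += 1.0\n",
"                elif risk[i] == risk[j]:\n",
"                    concordant += 0.5\n",
"    return concordant / comparable\n",
"\n",
"event = [True, True, False, True]\n",
"time = [5, 10, 12, 20]\n",
"risk = [0.9, 0.4, 0.6, 0.1]\n",
"print(harrell_c_index(event, time, risk))  # 4 of 5 comparable pairs are concordant: 0.8"
],
"execution_count": null,
"outputs": []
},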
{
"cell_type": "code",
"metadata": {
"id": "ZNHQ-bkWALNy",
"outputId": "8683c357-0633-4f9f-8096-9bf13e2d8321",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"from scipy.stats import reciprocal\n",
"from sklearn.model_selection import RandomizedSearchCV\n",
"\n",
"param_distributions = {\n",
" 'coxphsurvivalanalysis__alpha': reciprocal(0.1, 100),\n",
"}\n",
"\n",
"model_random_search = RandomizedSearchCV(\n",
" cox, param_distributions=param_distributions, n_iter=50, n_jobs=-1, cv=3, random_state=42)\n",
"model_random_search.fit(X_trn, y_trn)\n",
"\n",
"print(\n",
" f\"The c-index of Cox using a {model_random_search.__class__.__name__} is \"\n",
" f\"{model_random_search.score(X_test, y_test):.3f}\")\n",
"print(\n",
" f\"The best set of parameters is: {model_random_search.best_params_}\"\n",
")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The c-index of Cox using a RandomizedSearchCV is 0.646\n",
"The best set of parameters is: {'coxphsurvivalanalysis__alpha': 31.428808908401084}\n"
]
}
]
},
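{
"cell_type": "markdown",
"metadata": {},
"source": [
"`reciprocal(0.1, 100)` is a log-uniform distribution, so the search is as likely to draw `alpha` between 0.1 and 1 as between 10 and 100. A standard-library sketch of that sampling scheme follows; the helper `sample_log_uniform` is hypothetical, and the actual draws made by `RandomizedSearchCV` come from SciPy."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import math\n",
"import random\n",
"\n",
"def sample_log_uniform(low, high, n, seed=0):\n",
"    # Draw uniformly in log space, then map back with exp\n",
"    rng = random.Random(seed)\n",
"    log_low, log_high = math.log(low), math.log(high)\n",
"    return [math.exp(rng.uniform(log_low, log_high)) for _ in range(n)]\n",
"\n",
"samples = sample_log_uniform(0.1, 100, n=50)\n",
"print(min(samples), max(samples))"
],
"execution_count": null,
"outputs": []
},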
{
"cell_type": "code",
"metadata": {
"id": "MrhAveCQAdxF",
"outputId": "0f791b44-21ac-4546-e7d0-8553c757af6e",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"alpha = model_random_search.best_params_['coxphsurvivalanalysis__alpha']\n",
"cox_best = make_pipeline(preprocessor, CoxPHSurvivalAnalysis(alpha=alpha))\n",
"cox_best.fit(X_trn, y_trn)\n",
"\n",
"ci_cox = concordance_index_censored(y_test[\"cens\"], y_test[\"time\"], cox_best.predict(X_test))\n",
"print(f'The c-index of Cox is given by {ci_cox[0]:.3f}')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The c-index of Cox is given by 0.646\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "WArdzxG0cZ5d",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "05cf000a-579c-42d5-aaeb-f53d3e68a6c8"
},
"source": [
"!pip install -q eli5"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Oy3whfxOAlzN"
},
"source": [
"from eli5.sklearn import PermutationImportance"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "j2wH_sxgBX1-"
},
"source": [
"perm = PermutationImportance(\n",
" cox_best.steps[-1][1], n_iter=100, random_state=42).fit(preprocessor.fit_transform(X_trn),y_trn)"
],
"execution_count": null,
"outputs": []
},
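{
"cell_type": "markdown",
"metadata": {},
"source": [
"Permutation importance measures how much a model's score drops when the values of one feature are shuffled across the samples, which breaks the link between that feature and the outcome. The following is a minimal sketch of the idea only: `permutation_drops` is a hypothetical helper, `toy_score` stands in for the score of a fitted model, and eli5's implementation differs in its details."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import random\n",
"\n",
"def permutation_drops(score_fn, rows, targets, column, n_iter=10, seed=0):\n",
"    # Score decrease after shuffling one column, repeated n_iter times\n",
"    rng = random.Random(seed)\n",
"    baseline = score_fn(rows, targets)\n",
"    drops = []\n",
"    for _ in range(n_iter):\n",
"        shuffled = [row[column] for row in rows]\n",
"        rng.shuffle(shuffled)\n",
"        permuted = [row[:column] + [v] + row[column + 1:] for row, v in zip(rows, shuffled)]\n",
"        drops.append(baseline - score_fn(permuted, targets))\n",
"    return drops\n",
"\n",
"def toy_score(rows, targets):\n",
"    # Toy stand-in for a model score: it only ever looks at column 0\n",
"    return sum(row[0] == t for row, t in zip(rows, targets)) / len(rows)\n",
"\n",
"rows = [[i, i % 3] for i in range(20)]\n",
"targets = list(range(20))\n",
"print(permutation_drops(toy_score, rows, targets, column=0))\n",
"print(permutation_drops(toy_score, rows, targets, column=1))  # all zeros: column 1 is never used"
],
"execution_count": null,
"outputs": []
},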
{
"cell_type": "code",
"metadata": {
"id": "1Rdepqh18UDV"
},
"source": [
"data = perm.results_\n",
"# The preprocessor outputs the categorical columns first, then the scaled ones\n",
"data = pd.DataFrame(data, columns=cat_cols + scaling_cols)\n",
"meds = data.median()\n",
"meds = meds.sort_values(ascending=False)\n",
"data = data[meds.index]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "PaPRs7hR9QL5",
"outputId": "149f6323-0025-4cee-cb8c-bc443a8420f2",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 444
}
},
"source": [
"fig, ax = plt.subplots(figsize=(10,7))\n",
"data.boxplot(ax=ax)\n",
"ax.set_title('Feature Importances')\n",
"plt.show()"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlkAAAGrCAYAAADzSoLIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3df5xcdX3v8deHBCkGCBa8qQISqrTdEGpvidDWrW6MpVrbxluhZfEHXLeXay3x3lt/kN61COi24L3Vtmp/UJdC0S4qrV5KKGDJrjaWWkAFCVvaFKJAscqPRoJBSPjcP85ZGDaz2Ykz38zs7uv5eMwjZ875zpnP+Wbm7HvO+c6ZyEwkSZLUWft1uwBJkqT5yJAlSZJUgCFLkiSpAEOWJElSAYYsSZKkAgxZkiRJBRiyJEmSCjBkSQtQRGyNiB0Rsb3h9vwOrPOVnaqxhec7LyI+tq+eb08i4syI2NTtOiT1FkOWtHD9QmYe1HD7t24WExGLu/n836u5Wrek8gxZkp4SEUsjYjQi7o+I+yLifRGxqF72wojYGBEPRsQDEfHxiDi0XnY58ALgr+ujYu+KiIGIuHfa+p862lUfiboyIj4WEd8GztzT87dQe0bEWyPiXyLikYh4b13z30fEtyPikxHxrLrtQETcGxH/u96WrRHx+mn98OcR8a2I+FpEvDsi9quXnRkRX4iID0bEg8AngD8GfrLe9v+o270mIr5cP/c9EXFew/qX1/WeERFfr2sYbli+qK7tX+ttuSUijqqX/UhEfDYiHoqIOyPilxse93MRcUf9mPsi4h0t/+dL6jhDlqRGlwI7gRcB/xk4GfjVelkAvwM8H+gDjgLOA8jMNwJf5+mjY+9v8fnWAlcChwIfn+X5W/GzwAnATwDvAi4G3lDXuhIYbGj7A8DhwBHAGcDFEfHD9bIPAUuBHwReDrwJ+K8Njz0JuAtYVq//LcCN9bYfWrd5tH7cocBrgF+LiNdOq7cf+GFgDXBuRPTV83+jrvXngEOANwPfiYglwGeBvwD+E3Aa8IcRsaJ+3Cjw3zPz4Hp7N7bUa5KKMGRJC9dnIuI/6ttnImIZ1R/1/5mZj2bmN4EPUv0hJzO3ZOZnM/O7mfkt4ANUAaQdN2bmZzLzSaowMePzt+j9mfntzNwM3A5cn5l3ZeY24G+ogluj36q353PABuCX6yNnpwG/mZmPZOZW4HeBNzY87t8y80OZuTMzdzQrJDMnMvOrmflkZt4GjLF7f52fmTsy81bgVuDF9fxfBd6dmXdm5dbMfBD4eWBrZv5Z/dxfBv4SOLV+3BPAiog4JDMfzswv7UXfSeowxxJIC9drM/Nvp+5ExInA/sD9ETE1ez/gnnr5MuD3gZ8GDq6XPdxmDfc0TB+9p+dv0b83TO9ocv8HGu4/nJmPNtz/GtVRusPrOr42bdkRM9TdVEScBFxIdUTpWcABwKemNftGw/R3gIPq6aOAf22y2qOBk6ZOSdYWA5fX068D3g1cGBG3Aesz88bZapVUhkeyJE25B/gucHhmHlrfDsnM4+rlvw0kcHxmHkJ1miwaHp/T1vco8OypO/URoudOa9P4mNmev9OeU59+m/IC4N+AB6iOCB09bdl9M9Td7D5Up/SuAo7KzKVU47aiSbtm7gFeOMP8zzX0z6H1KcpfA8jMmzJzLdWpxM8An2zx+SQVYMiSBEBm3g9cD/xuRBwSEfvVA8enTnEdDGwHtkXEEcA7p63i36nGME35Z+D76gHg+1MdYTmgjecv4fyIeFZE/DTVqbhPZeYuqnAyEhEHR8TRVGOk9nS5iH8HjpwaWF87GHgoMx+rjxKevhd1fRR4b0QcG5UfjYjDgKuBH4qIN0bE/vXtJRHRV2/H6yNiaWY+AXwbeHIvnlNShxmyJDV6E9WprTuoTgVeCTyvXnY+8OPANqrxS3817bG/A7y7HuP1jnoc1FupAsN9VEe27mXP9v
T8nfaN+jn+jWrQ/Vsy85/qZeuo6r0L2ER1VOqSPaxrI7AZ+EZEPFDPeytwQUQ8ApzL3h1V+kDd/nqqsDQKHJiZj1B9GeC0uu5vABfxdHh9I7C1/rbmW4DXI6lrIrPZUW5Jmr8iYgD4WGYe2e1aJM1fHsmSJEkqwJAlSZJUgKcLJUmSCvBIliRJUgE9dzHSww8/PJcvX97tMgB49NFHWbJkyewNFxj7pTn7ZXf2SXP2S3P2S3P2y+56qU9uueWWBzJz+jUAgR4MWcuXL+fmm2/udhkATExMMDAw0O0yeo790pz9sjv7pDn7pTn7pTn7ZXe91CcR8bWZlnm6UJIkqQBDliRJUgGGLEmSpAIMWZIkSQUYsiRJkgowZEmSJBVgyJIkSSrAkCVJklSAIUuSJKkAQ5YkSVIBhixJkqQCDFmSJEkFGLIkSZIKMGRJkiQVYMiSJEkqwJAlSZJUwOJuF9AtEdGxdWVmx9YlSZLmhwV7JCszZ70dfc7VLbWTJEmabsGGLEmSpJIMWZIkSQUYsiRJkgowZEmSJBVgyJIkSSrAkCVJklSAIUuSJKkAQ5YkSVIBhixJkqQCDFmSJEkFGLIkSZIKMGRJkiQVYMiSJEkqwJAlSZJUgCFLkiSpAEOWJElSAYYsSZKkAgxZkiRJBRiyJEmSCjBkSZIkFWDIkiRJKsCQJUmSVIAhS5IkqYCWQlZEvCoi7oyILRGxvsnyl0XElyJiZ0ScMm3ZGRHxL/XtjE4VLkmS1MtmDVkRsQj4CPBqYAUwGBErpjX7OnAm8BfTHvv9wHuAk4ATgfdExHPaL1uSJKm3tXIk60RgS2belZmPA1cAaxsbZObWzLwNeHLaY38W+GxmPpSZDwOfBV7VgbolSZJ62uIW2hwB3NNw/16qI1OtaPbYI6Y3ioizgLMAli1bxsTERIurL6+XaukV27dvt1+asF92Z580Z780Z780Z7/sbq70SSshq7jMvBi4GGDVqlU5MDDQ3YKmXLuBnqmlh0xMTNgvTdgvu7NPmrNfmrNfmrNfdjdX+qSV04X3AUc13D+ynteKdh4rSZI0Z7USsm4Cjo2IYyLiWcBpwFUtrv864OSIeE494P3kep4kSdK8NmvIysydwNlU4WgS+GRmbo6ICyLiFwEi4iURcS9wKvAnEbG5fuxDwHupgtpNwAX1PEmSpHmtpTFZmXkNcM20eec2TN9EdSqw2WMvAS5po0ZJkqQ5xyu+S5IkFWDIkiRJKsCQJUmSVIAhS5IkqQBDliRJUgGGLEmSpAIMWZIkSQUYsiRJkgowZEmSJBVgyJIkSSrAkCVJklSAIUuSJKkAQ5YkSVIBhixJkqQCDFlq2djYGCtXrmTNmjWsXLmSsbGxbpckSVLPWtztAjQ3jI2NMTw8zOjoKLt27WLRokUMDQ0BMDg42OXqJEnqPR7JUktGRkYYHR1l9erVLF68mNWrVzM6OsrIyEi3S5MkqScZstSSyclJ+vv7nzGvv7+fycnJLlUkSVJvM2SpJX19fWzatOkZ8zZt2kRfX1+XKpIkqbcZstSS4eFhhoaGGB8fZ+fOnYyPjzM0NMTw8HC3S5MkqSc58F0tmRrcvm7dOiYnJ+nr62NkZMRB75IkzcCQpZYNDg4yODjIxMQEAwMD3S5HkqSe5ulCSZKkAgxZkiRJBRiyJEmSCjBkSZIkFWDIkiRJKsCQJUmSVIAhS5IkqQBDliRJUgGGLEmSpAIMWZIkSQUYsiRJkgowZEmSJBVgyJIkSSrAkCVJklSAIUuSJKkAQ5YkSVIBhixJkqQCDFmSJEkFGLIkSZIKMGRJkiQVYMiSJEkqwJAlSZJUgCFLkiSpAEOWJElSAYYsSZKkAgxZkiRJBRiyJEmSCljc7QLUWyKiY+vKzI6tS5KkuaalI1kR8aqIuDMitkTE+ibLD4iIT9TLvxgRy+v5+0fEZRHx1YiYjIjf7Gz56rTMnPV29DlXt9ROkqSFbNaQFRGLgI8ArwZWAIMRsWJasy
Hg4cx8EfBB4KJ6/qnAAZl5PHAC8N+nApgkSdJ81sqRrBOBLZl5V2Y+DlwBrJ3WZi1wWT19JbAmqvNOCSyJiMXAgcDjwLc7UrkkSVIPa2VM1hHAPQ337wVOmqlNZu6MiG3AYVSBay1wP/Bs4H9l5kPTnyAizgLOAli2bBkTExN7txUF9VItvcR+2d327dvtl2nsk+bsl+bsl+bsl93NlT4pPfD9RGAX8HzgOcDfRcTfZuZdjY0y82LgYoBVq1blwMBA4bJadO0GeqaWXmK/NDUxMWG/TGOfNGe/NGe/NGe/7G6u9EkrIes+4KiG+0fW85q1ubc+NbgUeBA4Hbg2M58AvhkRXwBWAXdR2IvPv55tO55oez3L129oex1LD9yfW99zctvrkSRJc0crIesm4NiIOIYqTJ1GFZ4aXQWcAdwInAJszMyMiK8DrwAuj4glwE8Av9ep4vdk244n2Hrha9paR6eScieCmiRJmltmHfiemTuBs4HrgEngk5m5OSIuiIhfrJuNAodFxBbgN4Cpyzx8BDgoIjZThbU/y8zbOr0RkiRJvaalMVmZeQ1wzbR55zZMP0Z1uYbpj9vebL4kSdJ858/qSJIkFWDIkiRJKsCQJUmSVIAhS5IkqQBDliRJUgGGLEmSpAIMWZIkSQUYsiRJkgowZEmSJBVgyJIkSSrAkCVJklSAIUuSJKkAQ5YkSVIBhixJkqQCDFmSJEkFGLIkSZIKMGRJkiQVYMiSJEkqwJAlSZJUgCFLkiSpAEOWJElSAYYsSZKkAgxZkiRJBSzudgGlHNy3nuMvW9/+ii7rRC0Ar2l/RZIkac6YtyHrkckL2Xphe8FmYmKCgYGBtmtZvn5D2+uQJElzi6cLJUmSCjBkSZIkFWDIkiRJKsCQJUmSVIAhS5IkqQBDliRJUgGGLEmSpAIMWZIkSQXM24uRqrkXn38923Y80fZ62r3A6tID9+fW95zcdh2SJPUqQ9YCs23HEz1xJXyvgi9Jmu88XShJklSAIUuSJKkAQ5YkSVIBhixJkqQCDFmSJEkFGLIkSZIKMGRJkiQVYMiSJEkqwJAlSZJUgCFLkiSpAEOWJElSAYYsSZKkAgxZkiRJBRiyJEmSCljc7QK0bx3ct57jL1vf/ooua7cOgNe0X4ckST2qpZAVEa8Cfh9YBHw0My+ctvwA4M+BE4AHgV/JzK31sh8F/gQ4BHgSeElmPtapDdDeeWTyQrZe2F64mZiYYGBgoK11LF+/oa3HS5LU62Y9XRgRi4CPAK8GVgCDEbFiWrMh4OHMfBHwQeCi+rGLgY8Bb8nM44AB4ImOVS9JktSjWhmTdSKwJTPvyszHgSuAtdParOXpE0hXAmsiIoCTgdsy81aAzHwwM3d1pnRJkqTe1crpwiOAexru3wucNFObzNwZEduAw4AfAjIirgOeC1yRme+f/gQRcRZwFsCyZcuYmJjYy81ort31bN++vWdq6aRe6Zde6pNO6OTrZb6wT5qzX5qzX5qzX3Y3V/qk9MD3xUA/8BLgO8ANEXFLZt7Q2CgzLwYuBli1alW2O94HgGs3tD1uqBNjjzpVS8f0Sr/0Up90SMdeL/OIfdKc/dKc/dKc/bK7udInrZwuvA84quH+kfW8pm3qcVhLqQbA3wt8PjMfyMzvANcAP95u0ZIkSb2ulSNZNwHHRsQxVGHqNOD0aW2uAs4AbgROATZm5tRpwndFxLOBx4GXUw2M3yc68g22a9tfx9ID92+/DkmSNKfMGrLqMVZnA9dRXcLhkszcHBEXADdn5lXAKHB5RGwBHqIKYmTmwxHxAaqglsA1mblPvrvf7mUKoAppnViPJElaeFoak5WZ11Cd6mucd27D9GPAqTM89mNUl3GQJElaMPxZHUmSpAIMWZIkSQUYsiRJkgowZEmSJBVgyJIkSSrAkCVJklSAIUuSJKkAQ5YkSVIBhixJkqQCDFmSJEkFGLIkSZIKMGRJkiQV0NIPRGt+Wb5+Q/sruba9dSw9cP
/2a5AkqYcZshaYrRe+pu11LF+/oSPrkSRpPvN0oSRJUgGGLEmSpAIMWZIkSQUYsiRJkgowZEmSJBVgyJIkSSrAkCVJklSAIUuSJKkAQ5YkSVIBhixJkqQCDFmSJEkFGLIkSZIKMGRJkiQVYMiSJEkqwJAlSZJUgCFLkiSpAEOWJElSAYYsSZKkAgxZkiRJBRiyJEmSCjBkSZIkFWDIkiRJKsCQJUmSVIAhS5IkqQBDliRJUgGGLEmSpAIMWZIkSQUYsiRJkgowZEmSJBVgyJIkSSrAkCVJklSAIUuSJKkAQ5YkSVIBi7tdgHpLRLTW7qLZ22Rmm9VIkjR3eSRLz5CZs97Gx8dbaidJ0kJmyJIkSSrAkCVJklRASyErIl4VEXdGxJaIWN9k+QER8Yl6+RcjYvm05S+IiO0R8Y7OlC1JktTbZg1ZEbEI+AjwamAFMBgRK6Y1GwIezswXAR8Epg+L/gDwN+2XK0mSNDe0ciTrRGBLZt6VmY8DVwBrp7VZC1xWT18JrIn6a2oR8VrgbmBzZ0qWJEnqfa1cwuEI4J6G+/cCJ83UJjN3RsQ24LCIeAw4B/gZYMZThRFxFnAWwLJly5iYmGi1/uJ6qZZesX37dvulCftld/ZJc/ZLc/ZLc/bL7uZKn5S+TtZ5wAczc/uerr+UmRcDFwOsWrUqBwYGCpfVoms30DO19JCJiQn7pQn7ZXf2SXP2S3P2S3P2y+7mSp+0ErLuA45quH9kPa9Zm3sjYjGwFHiQ6ojXKRHxfuBQ4MmIeCwzP9x25ZIkST2slZB1E3BsRBxDFaZOA06f1uYq4AzgRuAUYGNWV6P86akGEXEesN2AJUmSFoJZQ1Y9xups4DpgEXBJZm6OiAuAmzPzKmAUuDwitgAPUQUxSZKkBaulMVmZeQ1wzbR55zZMPwacOss6zvse6pMkSZqTvOK7JElSAYYsSZKkAgxZkiRJBRiyJEmSCjBkSZIkFWDIkiRJKsCQJUmSVIAhS5IkqQBDliRJUgGGLLVsbGyMlStXsmbNGlauXMnY2Fi3S5IkqWe19LM60tjYGMPDw4yOjrJr1y4WLVrE0NAQAIODg12uTpKk3uORLLVkZGSE0dFRVq9ezeLFi1m9ejWjo6OMjIx0uzRJknqSIUstmZycpL+//xnz+vv7mZyc7FJFkiT1NkOWWtLX18emTZueMW/Tpk309fV1qSJJknqbIUstGR4eZmhoiPHxcXbu3Mn4+DhDQ0MMDw93uzRJknqSA9/VkqnB7evWrWNycpK+vj5GRkYc9C5J0gwMWWrZ4OAgg4ODTExMMDAw0O1yJEnqaZ4ulCRJKsCQJUmSVIAhS5IkqQBDliRJUgEOfJdaEBEdWU9mdmQ9kqTe55EsqQWZOevt6HOunrWNJGnhWLBHslo9MhEXzd7GP56SJGm6BXskq5UjE+Pj4y21kyRJmm7BhixJkqSSDFmSJEkFGLIkSZIKWLAD3yW1z0tbSNLMPJIl6XvWictaGLAkzVeGLEmSpAIMWZIkSQUYsiRJkgowZEmSJBVgyJIkSSrAkCVJklSAIUuSJKkAQ5YkSVIBhixJkqQCDFmSJEkFGLIkSZIKMGRJkiQVYMiSJEkqwJAlSZJUgCFLkiSpAEOWJElSAYYsSZKkAgxZkiRJBRiyJEmSCjBkSZIkFdBSyIqIV0XEnRGxJSLWN1l+QER8ol7+xYhYXs//mYi4JSK+Wv/7is6WL0mS1JtmDVkRsQj4CPBqYAUwGBErpjUbAh7OzBcBHwQuquc/APxCZh4PnAFc3qnCJUmSetniFtqcCGzJzLsAIuIKYC1wR0ObtcB59fSVwIcjIjLzyw1tNgMHRsQBmfndtiuXOuTF51/Pth1PdGRdy9dvaOvxSw/cn1vfc3JHalHvGRsbY2RkhMnJSfr6+hgeHmZwcLDbZUkqpJWQdQRwT8P9e4GTZmqTmTsjYhtwGNWRrCmvA77ULGBFxFnAWQDLli1jYmKi1fqL2r59e8/U0k
[base64-encoded PNG data omitted]\n",
"text/plain": [
"<Figure size 720x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "IjDH0p1dBuyV",
"outputId": "1f8b09f3-52ce-4b4a-9297-813d798e343f",
"scrolled": true,
"colab": {
"base_uri": "https://localhost:8080/",
"height": 173
}
},
"source": [
"import eli5\n",
"\n",
"# Show the permutation importances held by `perm`, labelled with the column names of X\n",
"eli5.show_weights(perm, feature_names=X.columns.tolist())"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
" <style>\n",
" table.eli5-weights tr:hover {\n",
" filter: brightness(85%);\n",
" }\n",
"</style>\n",
"\n",
"\n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
" <table class=\"eli5-weights eli5-feature-importances\" style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto;\">\n",
" <thead>\n",
" <tr style=\"border: none;\">\n",
" <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">Weight</th>\n",
" <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" \n",
" <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
" <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
" 0.0703\n",
" \n",
" &plusmn; 0.0298\n",
" \n",
" </td>\n",
" <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
" tgrade\n",
" </td>\n",
" </tr>\n",
" \n",
" <tr style=\"background-color: hsl(120, 100.00%, 81.86%); border: none;\">\n",
" <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
" 0.0611\n",
" \n",
" &plusmn; 0.0198\n",
" \n",
" </td>\n",
" <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
" progrec\n",
" </td>\n",
" </tr>\n",
" \n",
" <tr style=\"background-color: hsl(120, 100.00%, 92.67%); border: none;\">\n",
" <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
" 0.0167\n",
" \n",
" &plusmn; 0.0121\n",
" \n",
" </td>\n",
" <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
" horTh\n",
" </td>\n",
" </tr>\n",
" \n",
" <tr style=\"background-color: hsl(120, 100.00%, 93.29%); border: none;\">\n",
" <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
" 0.0148\n",
" \n",
" &plusmn; 0.0111\n",
" \n",
" </td>\n",
" <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
" menostat\n",
" </td>\n",
" </tr>\n",
" \n",
" <tr style=\"background-color: hsl(120, 100.00%, 96.19%); border: none;\">\n",
" <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
" 0.0066\n",
" \n",
" &plusmn; 0.0092\n",
" \n",
" </td>\n",
" <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
" age\n",
" </td>\n",
" </tr>\n",
" \n",
" <tr style=\"background-color: hsl(120, 100.00%, 96.47%); border: none;\">\n",
" <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
" 0.0059\n",
" \n",
" &plusmn; 0.0074\n",
" \n",
" </td>\n",
" <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
" tsize\n",
" </td>\n",
" </tr>\n",
" \n",
" <tr style=\"background-color: hsl(120, 100.00%, 97.64%); border: none;\">\n",
" <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
" 0.0033\n",
" \n",
" &plusmn; 0.0045\n",
" \n",
" </td>\n",
" <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
" pnodes\n",
" </td>\n",
" </tr>\n",
" \n",
" <tr style=\"background-color: hsl(120, 100.00%, 99.29%); border: none;\">\n",
" <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
" 0.0006\n",
" \n",
" &plusmn; 0.0018\n",
" \n",
" </td>\n",
" <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
" estrec\n",
" </td>\n",
" </tr>\n",
" \n",
" \n",
" </tbody>\n",
"</table>\n",
" \n",
"\n",
" \n",
"\n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
"\n",
"\n"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"execution_count": 24
}
]
}
]
}
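
The `eli5.show_weights` table in the last cell reports a weight and a ± spread for each feature. As a rough illustration of the idea behind permutation importance (a sketch, not eli5's actual implementation), the code below shuffles one column at a time and records how much a score drops. The helper `permutation_importance`, the scorer `toy_score`, and the synthetic data are hypothetical names introduced only for this example; the notebook itself relies on the `perm` object built in an earlier cell.

```python
import numpy as np


def permutation_importance(score_fn, X, y, n_iter=5, seed=0):
    """Return the mean and std of the score drop when each column is shuffled.

    score_fn(X, y) is assumed to return a single number where higher is better.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    base_score = score_fn(X, y)
    drops = np.zeros((n_iter, X.shape[1]))
    for it in range(n_iter):
        for j in range(X.shape[1]):
            X_perm = X.copy()
            # Break the link between feature j and the target by shuffling it
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[it, j] = base_score - score_fn(X_perm, y)
    return drops.mean(axis=0), drops.std(axis=0)


# Synthetic data (an assumption for illustration): y depends on column 0 only
data_rng = np.random.default_rng(42)
X_toy = data_rng.normal(size=(200, 3))
y_toy = 2.0 * X_toy[:, 0] + data_rng.normal(scale=0.1, size=200)


def toy_score(X, y):
    """R^2 of a fixed 'model' that predicts 2 * (first column)."""
    pred = 2.0 * X[:, 0]
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


mean_drop, std_drop = permutation_importance(toy_score, X_toy, y_toy)
for name, m, s in zip(["x0", "x1", "x2"], mean_drop, std_drop):
    print(f"{name}: {m:.4f} ± {s:.4f}")
```

In this toy setup the score ignores columns `x1` and `x2`, so shuffling them changes nothing and their importance is exactly zero, while shuffling `x0` produces a large drop. The near-zero rows in the table above (for example `estrec`) can be read in the same spirit, keeping in mind that the notebook's model and score differ from this sketch.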