survival-analysis-for-data-analysis-introduction_2024-06.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
},
"colab": {
"provenance": [],
"include_colab_link": true
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/alonsosilvaallende/1599bf39c72bda16beac7b03bec05e11/survival-analysis-for-data-analysis-introduction_2024-06.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aEj365e4uTvD"
},
"source": [
"# Survival Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true,
"id": "0Z1wEfUFuTvG"
},
"source": [
"* Historically, survival analysis was developed and used by actuaries\n",
"and medical researchers to measure the lifetime of populations.\n",
"* What is the expected lifetime of patients who were given drug A? Drug B?\n",
"* What is the life expectancy of a baby born today in France?\n",
"\n",
"These researchers wanted to measure the duration between *Birth* and *Death*.\n",
"\n",
"\n",
"Source: [Lifelines: Survival Analysis in Python](https://www.youtube.com/watch?v=XQfxndJH4UA)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JxSbfi7QuTvI"
},
"source": [
"# Survival function and hazard function"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fXfTmltSuTvK"
},
"source": [
"**Definition:** Let $T$ be a random variable called the failure time, and let\n",
"\n",
"- $f(t)$ be its probability density function,\n",
"- $F(t):=\\mathcal{P}(T\\le t)$ be its cumulative distribution function.\n",
"\n",
"Then we define\n",
"\n",
"- The *survival function* $S(t):=\\mathcal{P}(T>t)=1-F(t)$.\n",
"- The *hazard function*, the instantaneous failure rate at time $t$ given that the unit was still working at time $t$ (for small $\\delta t$, $h(t)\\,\\delta t$ is approximately the probability of failure between $t$ and $t+\\delta t$):\n",
"$$\n",
"h(t):=\\lim_{\\delta t\\to0}\\frac{\\mathcal{P}(t<T\\le t+\\delta t\\mid T>t)}{\\delta t}=\n",
"\\lim_{\\delta t\\to0}\\frac{F(t+\\delta t)-F(t)}{\\delta t}\\times\\frac{1}{1-F(t)}=\\frac{f(t)}{1-F(t)}.\n",
"$$"
]
},
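{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the limit in the hazard definition concrete, here is a small numerical sketch. It assumes, purely for illustration, that $T$ follows a Weibull distribution with shape 1.5 and scale 2 (via `scipy.stats.weibull_min`). As $\\delta t$ shrinks, the conditional failure probability divided by $\\delta t$ should approach $f(t)/(1-F(t))$."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from scipy import stats\n",
"\n",
"# Hypothetical failure-time distribution, chosen only for illustration\n",
"dist = stats.weibull_min(c=1.5, scale=2.0)\n",
"t = 1.0\n",
"\n",
"# P(t < T <= t + dt | T > t) / dt for decreasing dt\n",
"for dt in [1e-1, 1e-3, 1e-5]:\n",
"    rate = (dist.cdf(t + dt) - dist.cdf(t)) / dist.sf(t) / dt\n",
"    print(f'dt = {dt:g}: {rate:.6f}')\n",
"\n",
"# Ratio f(t) / (1 - F(t)); in scipy, sf(t) is 1 - cdf(t)\n",
"print(f'f(t) / (1 - F(t)) = {dist.pdf(t) / dist.sf(t):.6f}')"
],
"execution_count": null,
"outputs": []
},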
{
"cell_type": "markdown",
"metadata": {
"id": "ZBY8J3sxuTvM"
},
"source": [
"**Properties:**\n",
"- $S(t)=\\exp(-\\int_0^t h(s)\\,ds)$.\n",
"- $h(t)=-\\frac{d}{dt}\\ln(S(t))$."
]
},
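{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both properties can be checked numerically. The sketch below reuses the same illustrative Weibull distribution (an assumption, not part of the data). It integrates the hazard with `scipy.integrate.quad` for the first property and uses a central finite difference for the second. Each printed pair should agree up to numerical error."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"from scipy import integrate, stats\n",
"\n",
"# Hypothetical failure-time distribution, chosen only for illustration\n",
"dist = stats.weibull_min(c=1.5, scale=2.0)\n",
"\n",
"def hazard(s):\n",
"    # h(s) = f(s) / S(s)\n",
"    return dist.pdf(s) / dist.sf(s)\n",
"\n",
"t = 3.0\n",
"\n",
"# Property 1: S(t) = exp(-integral of h from 0 to t)\n",
"cum_hazard, _ = integrate.quad(hazard, 0, t)\n",
"print(dist.sf(t), np.exp(-cum_hazard))\n",
"\n",
"# Property 2: h(t) = -d/dt ln S(t), approximated by a central difference\n",
"dt = 1e-5\n",
"minus_dlogS = -(np.log(dist.sf(t + dt)) - np.log(dist.sf(t - dt))) / (2 * dt)\n",
"print(hazard(t), minus_dlogS)"
],
"execution_count": null,
"outputs": []
},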
{
"cell_type": "markdown",
"metadata": {
"id": "1VYNZWpmuTvO"
},
"source": [
"# Right censoring\n",
"\n",
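"By the end of the study, the event of interest (for example, the \"death of a patient\" in medicine or the \"churn of a customer\" in business) has only occurred for a subset of the observations. For the others, we only know that the event had not happened by the time they were last observed: their duration is *right-censored*, i.e. known only to be longer than the observed follow-up time.\n",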
"\n"
]
},
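{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick simulation suggests why censoring cannot simply be ignored. The setup is entirely hypothetical: true durations are exponential with a mean of 5 years, subjects enter a 10-year study at uniformly random times, and any duration longer than the remaining follow-up is cut off (right-censored). Averaging the recorded durations, or dropping the censored subjects, both underestimate the true mean."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"\n",
"# Hypothetical setup: true durations are exponential with a mean of 5 years\n",
"true_durations = rng.exponential(scale=5.0, size=10_000)\n",
"\n",
"# Subjects enter a 10-year study at random times, so each one is followed\n",
"# for at most (10 - entry) years before the study ends\n",
"entry = rng.uniform(0, 10, size=true_durations.size)\n",
"follow_up = 10 - entry\n",
"\n",
"observed = true_durations <= follow_up             # True = event seen, False = right-censored\n",
"durations = np.minimum(true_durations, follow_up)  # what we actually record\n",
"\n",
"print(f'censored fraction:          {1 - observed.mean():.2f}')\n",
"print(f'true mean duration:         {true_durations.mean():.2f}')\n",
"print(f'mean of recorded durations: {durations.mean():.2f}')\n",
"print(f'mean of uncensored only:    {durations[observed].mean():.2f}')"
],
"execution_count": null,
"outputs": []
},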
{
"cell_type": "markdown",
"metadata": {
"id": "vao5BhTYuTvP"
},
"source": [
"# Modern Survival Analysis\n",
"\n",
"+ **Birth:** Customer joins Netflix  \n",
"**Death:** Customer leaves Netflix  \n",
"**Censoring:** At the current time, I cannot see all cancellations  \n",
" \n",
" \n",
"+ **Birth:** Leader forms government  \n",
"**Death:** Government dissolves  \n",
"**Censoring:** The leader's death, or the current time, prevents me from seeing every dissolution  \n",
"\n",
"\n",
"+ **Birth:** Couple starts dating  \n",
"**Death:** Couple breaks up  \n",
"**Censoring:** Some couples never break up (a partner's death comes first)  "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EVUDoqXOuTvU"
},
"source": [
"First, let's take a dataset from lifelines to see what this means in practice."
]
},
{
"cell_type": "code",
"metadata": {
"id": "IcqsnDunuZ3x",
"outputId": "30688795-5be3-45f9-85d6-5ff7a8d53f90",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"%pip install --quiet --upgrade lifelines"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m349.3/349.3 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m94.5/94.5 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h Building wheel for autograd-gamma (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:10.331529Z",
"start_time": "2020-01-09T22:37:03.811697Z"
},
"id": "A5qmUxSxuTvW"
},
"source": [
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import numpy as np\n",
"plt.style.use('seaborn-v0_8-bright')"
],
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "-Hz_oM2IuTve"
},
"source": [
"from lifelines.datasets import load_dd\n",
"\n",
"df = load_dd()\n",
"df = df[['ctryname', 'un_region_name', 'un_continent_name', 'ehead',\n",
"         'democracy', 'regime', 'start_year', 'duration', 'observed']]"
],
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "xCN-nZXDuTvk"
},
"source": [
"# Democracy and dictatorship\n",
"\n",
"This dataset contains a classification of political regimes as democracy and dictatorship.\n",
"* Classification of democracies as\n",
" + parliamentary,\n",
" + semi-presidential (mixed), and\n",
" + presidential.\n",
" \n",
"* Classification of dictatorships as\n",
" + military,\n",
" + civilian, and\n",
" + royal.\n",
" \n",
"Coverage: 202 countries, from 1946 or year of independence to 2008.\n",
"\n",
"**References**\n",
"\n",
"José Antonio Cheibub, Jennifer Gandhi, and James Raymond Vreeland. [\"Democracy and Dictatorship Revisited.\"](https://doi.org/10.1007/s11127-009-9491-2) Public Choice, vol. 143, no. 1-2, pp. 67-101, 2010."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:10.515360Z",
"start_time": "2020-01-09T22:37:10.466697Z"
},
"id": "9b51kQDjuTvm",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 362
},
"outputId": "93f3573e-1914-404d-a6b2-40ff30fc2583"
},
"source": [
"df.tail(10).style.hide(axis=\"index\")"
],
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<pandas.io.formats.style.Styler at 0x7e761bd11f60>"
],
"text/html": [
"<style type=\"text/css\">\n",
"</style>\n",
"<table id=\"T_a8a13\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th id=\"T_a8a13_level0_col0\" class=\"col_heading level0 col0\" >ctryname</th>\n",
" <th id=\"T_a8a13_level0_col1\" class=\"col_heading level0 col1\" >un_region_name</th>\n",
" <th id=\"T_a8a13_level0_col2\" class=\"col_heading level0 col2\" >un_continent_name</th>\n",
" <th id=\"T_a8a13_level0_col3\" class=\"col_heading level0 col3\" >ehead</th>\n",
" <th id=\"T_a8a13_level0_col4\" class=\"col_heading level0 col4\" >democracy</th>\n",
" <th id=\"T_a8a13_level0_col5\" class=\"col_heading level0 col5\" >regime</th>\n",
" <th id=\"T_a8a13_level0_col6\" class=\"col_heading level0 col6\" >start_year</th>\n",
" <th id=\"T_a8a13_level0_col7\" class=\"col_heading level0 col7\" >duration</th>\n",
" <th id=\"T_a8a13_level0_col8\" class=\"col_heading level0 col8\" >observed</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td id=\"T_a8a13_row0_col0\" class=\"data row0 col0\" >Yugoslavia</td>\n",
" <td id=\"T_a8a13_row0_col1\" class=\"data row0 col1\" >Southern Europe</td>\n",
" <td id=\"T_a8a13_row0_col2\" class=\"data row0 col2\" >Europe</td>\n",
" <td id=\"T_a8a13_row0_col3\" class=\"data row0 col3\" >Stipe Suvar</td>\n",
" <td id=\"T_a8a13_row0_col4\" class=\"data row0 col4\" >Non-democracy</td>\n",
" <td id=\"T_a8a13_row0_col5\" class=\"data row0 col5\" >Civilian Dict</td>\n",
" <td id=\"T_a8a13_row0_col6\" class=\"data row0 col6\" >1988</td>\n",
" <td id=\"T_a8a13_row0_col7\" class=\"data row0 col7\" >1</td>\n",
" <td id=\"T_a8a13_row0_col8\" class=\"data row0 col8\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_a8a13_row1_col0\" class=\"data row1 col0\" >Yugoslavia</td>\n",
" <td id=\"T_a8a13_row1_col1\" class=\"data row1 col1\" >Southern Europe</td>\n",
" <td id=\"T_a8a13_row1_col2\" class=\"data row1 col2\" >Europe</td>\n",
" <td id=\"T_a8a13_row1_col3\" class=\"data row1 col3\" >Milan Pancevski</td>\n",
" <td id=\"T_a8a13_row1_col4\" class=\"data row1 col4\" >Non-democracy</td>\n",
" <td id=\"T_a8a13_row1_col5\" class=\"data row1 col5\" >Civilian Dict</td>\n",
" <td id=\"T_a8a13_row1_col6\" class=\"data row1 col6\" >1989</td>\n",
" <td id=\"T_a8a13_row1_col7\" class=\"data row1 col7\" >1</td>\n",
" <td id=\"T_a8a13_row1_col8\" class=\"data row1 col8\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_a8a13_row2_col0\" class=\"data row2 col0\" >Yugoslavia</td>\n",
" <td id=\"T_a8a13_row2_col1\" class=\"data row2 col1\" >Southern Europe</td>\n",
" <td id=\"T_a8a13_row2_col2\" class=\"data row2 col2\" >Europe</td>\n",
" <td id=\"T_a8a13_row2_col3\" class=\"data row2 col3\" >Borisav Jovic</td>\n",
" <td id=\"T_a8a13_row2_col4\" class=\"data row2 col4\" >Non-democracy</td>\n",
" <td id=\"T_a8a13_row2_col5\" class=\"data row2 col5\" >Civilian Dict</td>\n",
" <td id=\"T_a8a13_row2_col6\" class=\"data row2 col6\" >1990</td>\n",
" <td id=\"T_a8a13_row2_col7\" class=\"data row2 col7\" >1</td>\n",
" <td id=\"T_a8a13_row2_col8\" class=\"data row2 col8\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_a8a13_row3_col0\" class=\"data row3 col0\" >Zambia</td>\n",
" <td id=\"T_a8a13_row3_col1\" class=\"data row3 col1\" >Eastern Africa</td>\n",
" <td id=\"T_a8a13_row3_col2\" class=\"data row3 col2\" >Africa</td>\n",
" <td id=\"T_a8a13_row3_col3\" class=\"data row3 col3\" >Kenneth Kaunda</td>\n",
" <td id=\"T_a8a13_row3_col4\" class=\"data row3 col4\" >Non-democracy</td>\n",
" <td id=\"T_a8a13_row3_col5\" class=\"data row3 col5\" >Civilian Dict</td>\n",
" <td id=\"T_a8a13_row3_col6\" class=\"data row3 col6\" >1964</td>\n",
" <td id=\"T_a8a13_row3_col7\" class=\"data row3 col7\" >27</td>\n",
" <td id=\"T_a8a13_row3_col8\" class=\"data row3 col8\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_a8a13_row4_col0\" class=\"data row4 col0\" >Zambia</td>\n",
" <td id=\"T_a8a13_row4_col1\" class=\"data row4 col1\" >Eastern Africa</td>\n",
" <td id=\"T_a8a13_row4_col2\" class=\"data row4 col2\" >Africa</td>\n",
" <td id=\"T_a8a13_row4_col3\" class=\"data row4 col3\" >Frederick Chiluba</td>\n",
" <td id=\"T_a8a13_row4_col4\" class=\"data row4 col4\" >Non-democracy</td>\n",
" <td id=\"T_a8a13_row4_col5\" class=\"data row4 col5\" >Civilian Dict</td>\n",
" <td id=\"T_a8a13_row4_col6\" class=\"data row4 col6\" >1991</td>\n",
" <td id=\"T_a8a13_row4_col7\" class=\"data row4 col7\" >11</td>\n",
" <td id=\"T_a8a13_row4_col8\" class=\"data row4 col8\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_a8a13_row5_col0\" class=\"data row5 col0\" >Zambia</td>\n",
" <td id=\"T_a8a13_row5_col1\" class=\"data row5 col1\" >Eastern Africa</td>\n",
" <td id=\"T_a8a13_row5_col2\" class=\"data row5 col2\" >Africa</td>\n",
" <td id=\"T_a8a13_row5_col3\" class=\"data row5 col3\" >Levy Patrick Mwanawasa</td>\n",
" <td id=\"T_a8a13_row5_col4\" class=\"data row5 col4\" >Non-democracy</td>\n",
" <td id=\"T_a8a13_row5_col5\" class=\"data row5 col5\" >Civilian Dict</td>\n",
" <td id=\"T_a8a13_row5_col6\" class=\"data row5 col6\" >2002</td>\n",
" <td id=\"T_a8a13_row5_col7\" class=\"data row5 col7\" >6</td>\n",
" <td id=\"T_a8a13_row5_col8\" class=\"data row5 col8\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_a8a13_row6_col0\" class=\"data row6 col0\" >Zambia</td>\n",
" <td id=\"T_a8a13_row6_col1\" class=\"data row6 col1\" >Eastern Africa</td>\n",
" <td id=\"T_a8a13_row6_col2\" class=\"data row6 col2\" >Africa</td>\n",
" <td id=\"T_a8a13_row6_col3\" class=\"data row6 col3\" >Rupiah Bwezani Banda</td>\n",
" <td id=\"T_a8a13_row6_col4\" class=\"data row6 col4\" >Non-democracy</td>\n",
" <td id=\"T_a8a13_row6_col5\" class=\"data row6 col5\" >Civilian Dict</td>\n",
" <td id=\"T_a8a13_row6_col6\" class=\"data row6 col6\" >2008</td>\n",
" <td id=\"T_a8a13_row6_col7\" class=\"data row6 col7\" >1</td>\n",
" <td id=\"T_a8a13_row6_col8\" class=\"data row6 col8\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_a8a13_row7_col0\" class=\"data row7 col0\" >Zimbabwe</td>\n",
" <td id=\"T_a8a13_row7_col1\" class=\"data row7 col1\" >Eastern Africa</td>\n",
" <td id=\"T_a8a13_row7_col2\" class=\"data row7 col2\" >Africa</td>\n",
" <td id=\"T_a8a13_row7_col3\" class=\"data row7 col3\" >Ian Smith</td>\n",
" <td id=\"T_a8a13_row7_col4\" class=\"data row7 col4\" >Non-democracy</td>\n",
" <td id=\"T_a8a13_row7_col5\" class=\"data row7 col5\" >Civilian Dict</td>\n",
" <td id=\"T_a8a13_row7_col6\" class=\"data row7 col6\" >1965</td>\n",
" <td id=\"T_a8a13_row7_col7\" class=\"data row7 col7\" >14</td>\n",
" <td id=\"T_a8a13_row7_col8\" class=\"data row7 col8\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_a8a13_row8_col0\" class=\"data row8 col0\" >Zimbabwe</td>\n",
" <td id=\"T_a8a13_row8_col1\" class=\"data row8 col1\" >Eastern Africa</td>\n",
" <td id=\"T_a8a13_row8_col2\" class=\"data row8 col2\" >Africa</td>\n",
" <td id=\"T_a8a13_row8_col3\" class=\"data row8 col3\" >Abel Muzorewa</td>\n",
" <td id=\"T_a8a13_row8_col4\" class=\"data row8 col4\" >Non-democracy</td>\n",
" <td id=\"T_a8a13_row8_col5\" class=\"data row8 col5\" >Civilian Dict</td>\n",
" <td id=\"T_a8a13_row8_col6\" class=\"data row8 col6\" >1979</td>\n",
" <td id=\"T_a8a13_row8_col7\" class=\"data row8 col7\" >1</td>\n",
" <td id=\"T_a8a13_row8_col8\" class=\"data row8 col8\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_a8a13_row9_col0\" class=\"data row9 col0\" >Zimbabwe</td>\n",
" <td id=\"T_a8a13_row9_col1\" class=\"data row9 col1\" >Eastern Africa</td>\n",
" <td id=\"T_a8a13_row9_col2\" class=\"data row9 col2\" >Africa</td>\n",
" <td id=\"T_a8a13_row9_col3\" class=\"data row9 col3\" >Robert Mugabe</td>\n",
" <td id=\"T_a8a13_row9_col4\" class=\"data row9 col4\" >Non-democracy</td>\n",
" <td id=\"T_a8a13_row9_col5\" class=\"data row9 col5\" >Civilian Dict</td>\n",
" <td id=\"T_a8a13_row9_col6\" class=\"data row9 col6\" >1980</td>\n",
" <td id=\"T_a8a13_row9_col7\" class=\"data row9 col7\" >29</td>\n",
" <td id=\"T_a8a13_row9_col8\" class=\"data row9 col8\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
]
},
"metadata": {},
"execution_count": 4
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-o57OZxvuTvu"
},
"source": [
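"Let's look at some right-censored samples (`observed == 0`), highlighted in green below for the United States: John Kennedy died in office, and George W. Bush was still in office when the data end in 2008, so both are recorded as right-censored."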
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:12.185066Z",
"start_time": "2020-01-09T22:37:10.519270Z"
},
"scrolled": false,
"id": "vzaXN9QWuTvv",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 394
},
"outputId": "88a4f722-ad3a-4e84-e17f-02c9f03a4e5f"
},
"source": [
"format_dict = {'ehead':'{}','duration':'{}', 'observed':'{}'}\n",
"(df.query('ctryname == \"United States of America\"')[['ehead', 'duration', 'observed']].style.format(format_dict)\n",
" .hide(axis=\"index\")\n",
" .highlight_min('observed', color='lightgreen'))"
],
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<pandas.io.formats.style.Styler at 0x7e75e4c0f070>"
],
"text/html": [
"<style type=\"text/css\">\n",
"#T_08b5f_row2_col2, #T_08b5f_row10_col2 {\n",
" background-color: lightgreen;\n",
"}\n",
"</style>\n",
"<table id=\"T_08b5f\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th id=\"T_08b5f_level0_col0\" class=\"col_heading level0 col0\" >ehead</th>\n",
" <th id=\"T_08b5f_level0_col1\" class=\"col_heading level0 col1\" >duration</th>\n",
" <th id=\"T_08b5f_level0_col2\" class=\"col_heading level0 col2\" >observed</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td id=\"T_08b5f_row0_col0\" class=\"data row0 col0\" >Harry Truman</td>\n",
" <td id=\"T_08b5f_row0_col1\" class=\"data row0 col1\" >7</td>\n",
" <td id=\"T_08b5f_row0_col2\" class=\"data row0 col2\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_08b5f_row1_col0\" class=\"data row1 col0\" >Dwight D. Eisenhower</td>\n",
" <td id=\"T_08b5f_row1_col1\" class=\"data row1 col1\" >8</td>\n",
" <td id=\"T_08b5f_row1_col2\" class=\"data row1 col2\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_08b5f_row2_col0\" class=\"data row2 col0\" >John Kennedy</td>\n",
" <td id=\"T_08b5f_row2_col1\" class=\"data row2 col1\" >2</td>\n",
" <td id=\"T_08b5f_row2_col2\" class=\"data row2 col2\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_08b5f_row3_col0\" class=\"data row3 col0\" >Lyndon Johnson</td>\n",
" <td id=\"T_08b5f_row3_col1\" class=\"data row3 col1\" >6</td>\n",
" <td id=\"T_08b5f_row3_col2\" class=\"data row3 col2\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_08b5f_row4_col0\" class=\"data row4 col0\" >Richard Nixon</td>\n",
" <td id=\"T_08b5f_row4_col1\" class=\"data row4 col1\" >5</td>\n",
" <td id=\"T_08b5f_row4_col2\" class=\"data row4 col2\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_08b5f_row5_col0\" class=\"data row5 col0\" >Gerald Ford</td>\n",
" <td id=\"T_08b5f_row5_col1\" class=\"data row5 col1\" >3</td>\n",
" <td id=\"T_08b5f_row5_col2\" class=\"data row5 col2\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_08b5f_row6_col0\" class=\"data row6 col0\" >Jimmy Carter</td>\n",
" <td id=\"T_08b5f_row6_col1\" class=\"data row6 col1\" >4</td>\n",
" <td id=\"T_08b5f_row6_col2\" class=\"data row6 col2\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_08b5f_row7_col0\" class=\"data row7 col0\" >Ronald Reagan</td>\n",
" <td id=\"T_08b5f_row7_col1\" class=\"data row7 col1\" >8</td>\n",
" <td id=\"T_08b5f_row7_col2\" class=\"data row7 col2\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_08b5f_row8_col0\" class=\"data row8 col0\" >George Bush</td>\n",
" <td id=\"T_08b5f_row8_col1\" class=\"data row8 col1\" >4</td>\n",
" <td id=\"T_08b5f_row8_col2\" class=\"data row8 col2\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_08b5f_row9_col0\" class=\"data row9 col0\" >Bill Clinton</td>\n",
" <td id=\"T_08b5f_row9_col1\" class=\"data row9 col1\" >8</td>\n",
" <td id=\"T_08b5f_row9_col2\" class=\"data row9 col2\" >1</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_08b5f_row10_col0\" class=\"data row10 col0\" >George W. Bush</td>\n",
" <td id=\"T_08b5f_row10_col1\" class=\"data row10 col1\" >8</td>\n",
" <td id=\"T_08b5f_row10_col2\" class=\"data row10 col2\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
]
},
"metadata": {},
"execution_count": 5
}
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:12.216114Z",
"start_time": "2020-01-09T22:37:12.189453Z"
},
"id": "AVHDxLziuTv2",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "fd106b64-f279-417c-d25d-5d370ca19635"
},
"source": [
"print(f'samples: {len(df)}\\n')\n",
"print(f'right censored samples: {len(df.query(\"observed == 0\"))}')\n",
"print(f'right censored samples (%): {100*len(df.query(\"observed == 0\"))/len(df):.1f}%')"
],
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"samples: 1808\n",
"\n",
"right censored samples: 340\n",
"right censored samples (%): 18.8%\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3v0OGKU_uTv9"
},
"source": [
"# How can we estimate the probability of government survival?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "J0UzmdpeuTv9"
},
"source": [
"**Example:** I want to estimate the survival function of a new type of machine, and I have 100 of these new machines. After the first year:\n",
"\n",
"Samples | I\n",
"--- | ---\n",
"Initial numbers | 100\n",
"Deaths in first year of age | 70\n",
"One-year survivors | `30`\n",
"\n",
"Therefore, a reasonable estimate of the survival probability of 1 year is 0.3.\n",
"\n",
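"A year later, I have increased my production, so I now also have a second batch of 1000 new machines. Batch I has now been observed for two years, but batch II only for one:\n",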
"\n",
"Samples | I | II\n",
"--- | --- | ---\n",
"Initial numbers | 100 | 1000\n",
"Deaths in first year of age | 70 | 750\n",
"One-year survivors | `30` | `250`\n",
"Deaths in second year of age | 15 |\n",
"Two-year survivors | `15` |\n",
"\n",
"What would be a good estimate of the survival probability for 1 year? And for 2 years?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RVevvzzCuTv-"
},
"source": [
"Both batches have been observed for a full year, so the estimate of the probability of surviving 1 year pools all 1100 machines:\n",
"\n",
"$\\hat{P}(1)=(30+250)/(100+1000) \\approx 0.255$"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BlBS5dveuTwA"
},
"source": [
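"Only batch I has been observed during its second year, so the probability of surviving the second year, given survival of the first, is estimated from its 30 one-year survivors:\n",
"\n",
"$\\hat{P}(2\\mid 1) = 15/30 = 0.5$"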
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zVja8-P5uTwA"
},
"source": [
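"Multiplying the two estimates gives the probability of surviving 2 years:\n",
"\n",
"$\\hat{P}(2)=\\hat{P}(1)\\times\\hat{P}(2\\mid 1)\\approx 0.255\\times0.5\\approx 0.127$"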
]
},
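{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same arithmetic, written out in Python as a small sketch. The dictionaries are just an ad hoc encoding of the table above."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Machine example: initial counts and deaths per batch\n",
"alive_start = {'I': 100, 'II': 1000}\n",
"deaths_year1 = {'I': 70, 'II': 750}\n",
"deaths_year2 = {'I': 15}  # batch II has not been observed in its second year yet\n",
"\n",
"survivors_year1 = {b: alive_start[b] - deaths_year1[b] for b in alive_start}\n",
"\n",
"# Year 1: every machine was observed for a full year, so pool both batches\n",
"p1 = sum(survivors_year1.values()) / sum(alive_start.values())\n",
"# Year 2: only batch I contributes, conditionally on having survived year 1\n",
"p2_given_1 = (survivors_year1['I'] - deaths_year2['I']) / survivors_year1['I']\n",
"p2 = p1 * p2_given_1\n",
"\n",
"print(f'P(1) = {p1:.3f}, P(2|1) = {p2_given_1:.3f}, P(2) = {p2:.3f}')"
],
"execution_count": null,
"outputs": []
},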
{
"cell_type": "markdown",
"metadata": {
"id": "pG5yhITUuTwB"
},
"source": [
"# Kaplan-Meier estimator"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "E6hn2l6PuTwC"
},
"source": [
"**Definition:** The Kaplan-Meier estimator of the survival function is given by\n",
"\n",
"$$\n",
"\\hat{S}(t):=\\prod_{i:t_i\\le t}\\left(1-\\frac{d_i}{n_i}\\right)\n",
"$$\n",
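"where the $t_i$ are the times at which at least one event happened, $d_i$ is the number of events that happened at time $t_i$, and $n_i$ is the number of individuals at risk just before $t_i$, i.e. known to have survived up to time $t_i$ (neither failed nor censored before $t_i$)."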
]
},
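{
"cell_type": "markdown",
"metadata": {},
"source": [
"A from-scratch sketch of the product-limit formula, for intuition only. The helper `kaplan_meier` is hypothetical, not a library function. Encoding the machine example as individual records (machines still working at the end of their observation are censored) gives $n_1=1100$, $d_1=820$, $n_2=30$, $d_2=15$. This should recover $\\hat{S}(1)\\approx0.255$ and $\\hat{S}(2)\\approx0.127$. With `lifelines`, `KaplanMeierFitter().fit(durations, observed).survival_function_` should give the same values at $t=1$ and $t=2$."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"def kaplan_meier(durations, observed):\n",
"    # Hypothetical helper: returns the event times t_i and the estimates S_hat(t_i)\n",
"    durations = np.asarray(durations, dtype=float)\n",
"    observed = np.asarray(observed, dtype=bool)\n",
"    event_times = np.unique(durations[observed])\n",
"    surv, s = [], 1.0\n",
"    for t in event_times:\n",
"        n_i = np.sum(durations >= t)               # at risk just before t\n",
"        d_i = np.sum((durations == t) & observed)  # events at t\n",
"        s *= 1.0 - d_i / n_i\n",
"        surv.append(s)\n",
"    return event_times, np.array(surv)\n",
"\n",
"# Machine example as individual records (1 = failure observed, 0 = censored)\n",
"# Batch I: 70 fail in year 1, 15 fail in year 2, 15 still working after 2 years\n",
"# Batch II: 750 fail in year 1, 250 still working after 1 year\n",
"durations = [1] * 70 + [2] * 15 + [2] * 15 + [1] * 750 + [1] * 250\n",
"observed = [1] * 70 + [1] * 15 + [0] * 15 + [1] * 750 + [0] * 250\n",
"\n",
"times, surv = kaplan_meier(durations, observed)\n",
"print(times, surv.round(3))"
],
"execution_count": null,
"outputs": []
},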
{
"cell_type": "markdown",
"metadata": {
"id": "cojEUEjsuTwD"
},
"source": [
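"We use the [Kaplan-Meier estimator](https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator) to estimate the probability of government survival. Censored governments count in the number at risk $n_i$ up to their last observed year but are never counted as events."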
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:16.155051Z",
"start_time": "2020-01-09T22:37:12.219801Z"
},
"scrolled": false,
"id": "LCGcjoexuTwE",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 641
},
"outputId": "ffebb8c4-18b4-45e8-f188-6f504c467469"
},
"source": [
"from lifelines import KaplanMeierFitter\n",
"kmf = KaplanMeierFitter()\n",
"kmf.fit(df['duration'],df['observed'], label='Estimate for average government')\n",
"\n",
"fig, ax = plt.subplots(figsize=(10,7))\n",
"kmf.plot(ax=ax)\n",
"plt.title('Estimated probability of government survival vs number of years')\n",
"plt.xlabel('Time (in years)')\n",
"plt.ylabel('Estimated probability of government survival')\n",
"plt.show()"
],
"execution_count": 7,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "pdrG0N6nuTwL",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "73c4aa3d-7b68-4579-c91b-6ba324b0c013"
},
"source": [
"print(f'The median number of years of government survival is {kmf.median_survival_time_}')"
],
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The median number of years of government survival is 4.0\n"
]
}
]
},
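{
"cell_type": "markdown",
"metadata": {},
"source": [
"Under the usual convention, the median survival time is the first time at which the estimated survival curve drops to 0.5 or below, and it is infinite if the curve never gets there. Here is a sketch of that rule on a made-up step function. The values are hypothetical, not taken from the dataset."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical Kaplan-Meier estimate indexed by time (same layout as kmf.survival_function_)\n",
"surv = pd.Series([1.0, 0.8, 0.62, 0.49, 0.35], index=[0, 1, 2, 3, 4])\n",
"\n",
"# First time where S_hat(t) <= 0.5; infinite if the curve never gets there\n",
"below = surv[surv <= 0.5]\n",
"median = below.index[0] if len(below) > 0 else np.inf\n",
"print(median)"
],
"execution_count": null,
"outputs": []
},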
{
"cell_type": "code",
"metadata": {
"scrolled": false,
"id": "NQHPG2x6uTwQ",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 641
},
"outputId": "6bbec1e6-3a8f-4f6d-d7c3-d50638ec00dd"
},
"source": [
"fig, ax = plt.subplots(figsize=(10,7))\n",
"for r in df['democracy'].unique():\n",
" ix = df['democracy'] == r\n",
" kmf.fit(df['duration'].loc[ix], df['observed'].loc[ix], label=r)\n",
" kmf.plot(ax=ax)\n",
"plt.title('Estimated probability of government survival vs number of years')\n",
"plt.xlabel('Time (in years)')\n",
"plt.ylabel('Estimated probability of government survival')\n",
"plt.show()"
],
"execution_count": 9,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "PBKPtOFKuTwV",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "4eb001fe-5d39-4b06-912c-7e09d56c8834"
},
"source": [
"for r in df['democracy'].unique():\n",
" ix = df['democracy'] == r\n",
" kmf.fit(df['duration'].loc[ix], df['observed'].loc[ix], label=r)\n",
" print(f'The median number of years for a {r} is {kmf.median_survival_time_}')"
],
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The median number of years for a Non-democracy is 6.0\n",
"The median number of years for a Democracy is 3.0\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZhK-X5-fuTwc"
},
"source": [
"How can we tell if these survival functions are different?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uL59D8GAuTwc"
},
"source": [
"# Log-rank test (not recommended, but still a very common statistical test)"
]
},
{
"cell_type": "code",
"metadata": {
"scrolled": false,
"id": "LDq8D5ovuTwe",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "27d154ce-f93f-4764-ab4f-0d93b3b9d481"
},
"source": [
"df['democracy'].unique()"
],
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array(['Non-democracy', 'Democracy'], dtype=object)"
]
},
"metadata": {},
"execution_count": 11
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "kaRMHNtSuTwi"
},
"source": [
"from lifelines.statistics import logrank_test"
],
"execution_count": 12,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "1qrrTDDWuTwn"
},
"source": [
"ix = df['democracy'] == 'Democracy'\n",
"T_democracy, E_democracy = df.loc[ix, 'duration'], df.loc[ix, 'observed']\n",
"T_non_democracy, E_non_democracy = df.loc[~ix, 'duration'], df.loc[~ix, 'observed']"
],
"execution_count": 13,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"scrolled": true,
"id": "Qw20wiiRuTws",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"outputId": "a511caed-5e31-4caf-fd3e-a866b8e10885"
},
"source": [
"results = logrank_test(T_democracy, T_non_democracy, event_observed_A=E_democracy, event_observed_B=E_non_democracy)\n",
"results.print_summary()"
],
"execution_count": 14,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<lifelines.StatisticalResult: logrank_test>\n",
" t_0 = -1\n",
" null_distribution = chi squared\n",
"degrees_of_freedom = 1\n",
" test_name = logrank_test\n",
"\n",
"---\n",
" test_statistic p -log2(p)\n",
" 260.47 <0.005 192.23"
],
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <tbody>\n",
" <tr>\n",
" <th>t_0</th>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>null_distribution</th>\n",
" <td>chi squared</td>\n",
" </tr>\n",
" <tr>\n",
" <th>degrees_of_freedom</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>test_name</th>\n",
" <td>logrank_test</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div><table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>test_statistic</th>\n",
" <th>p</th>\n",
" <th>-log2(p)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>260.47</td>\n",
" <td>&lt;0.005</td>\n",
" <td>192.23</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/latex": "\\begin{tabular}{lrrr}\n & test_statistic & p & -log2(p) \\\\\n0 & 260.47 & 0.00 & 192.23 \\\\\n\\end{tabular}\n"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "1Oms68oOuTww",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "101c1315-c1bb-473c-a69c-64f83e41f462"
},
"source": [
"print(results.p_value)\n",
"print(results.test_statistic)"
],
"execution_count": 15,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"1.3557143218482446e-58\n",
"260.46953907795944\n"
]
}
]
},
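{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, here is a from-scratch sketch of the two-sample log-rank statistic. At every distinct event time, it compares the observed number of events in group A with the number expected if both groups shared the same hazard. It then divides the squared total difference by the summed hypergeometric variance. The function `logrank_statistic` and the simulated data are hypothetical. Applied to `T_democracy, E_democracy, T_non_democracy, E_non_democracy`, it should give a statistic close to the `lifelines` value printed above, assuming `lifelines` uses the same unweighted variance formula."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"def logrank_statistic(T_a, E_a, T_b, E_b):\n",
"    # Hypothetical helper: unweighted two-sample log-rank chi-squared statistic and p-value\n",
"    T = np.concatenate([np.asarray(T_a), np.asarray(T_b)]).astype(float)\n",
"    E = np.concatenate([np.asarray(E_a), np.asarray(E_b)]).astype(bool)\n",
"    in_a = np.arange(len(T)) < len(T_a)\n",
"    obs_minus_exp, var = 0.0, 0.0\n",
"    for t in np.unique(T[E]):\n",
"        at_risk = T >= t\n",
"        n, n_a = at_risk.sum(), (at_risk & in_a).sum()\n",
"        events = (T == t) & E\n",
"        d, d_a = events.sum(), (events & in_a).sum()\n",
"        obs_minus_exp += d_a - d * n_a / n  # observed minus expected events in group A\n",
"        if n > 1:\n",
"            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)\n",
"    chi2 = obs_minus_exp ** 2 / var\n",
"    return chi2, stats.chi2.sf(chi2, df=1)\n",
"\n",
"# Simulated data: group A has twice the hazard of group B, with random censoring\n",
"rng = np.random.default_rng(0)\n",
"T_a, T_b = rng.exponential(5.0, 300), rng.exponential(10.0, 300)\n",
"C_a, C_b = rng.uniform(0, 20, 300), rng.uniform(0, 20, 300)\n",
"chi2, p = logrank_statistic(np.minimum(T_a, C_a), T_a <= C_a, np.minimum(T_b, C_b), T_b <= C_b)\n",
"print(f'chi2 = {chi2:.2f}, p = {p:.2g}')"
],
"execution_count": null,
"outputs": []
},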
{
"cell_type": "markdown",
"metadata": {
"id": "qTX47C6-uTw1"
},
"source": [
"# Univariate Cox regression"
]
},
{
"cell_type": "code",
"metadata": {
"id": "meyGKaYcuTw4",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 491
},
"outputId": "49c76672-40b9-4383-b7af-2fd52dc18546"
},
"source": [
"from lifelines import CoxPHFitter\n",
"\n",
"cph = CoxPHFitter()\n",
"df_Uni_Cox = df.copy()\n",
"df_Uni_Cox['indicator'] = df_Uni_Cox['democracy'] == 'Democracy'\n",
"cph.fit(df_Uni_Cox[['indicator', 'duration', 'observed']], 'duration', 'observed')\n",
"cph.print_summary()"
],
"execution_count": 16,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<lifelines.CoxPHFitter: fitted with 1808 total observations, 340 right-censored observations>\n",
" duration col = 'duration'\n",
" event col = 'observed'\n",
" baseline estimation = breslow\n",
" number of observations = 1808\n",
"number of events observed = 1468\n",
" partial log-likelihood = -9614.27\n",
" time fit was run = 2024-12-04 06:22:50 UTC\n",
"\n",
"---\n",
" coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%\n",
"covariate \n",
"indicator 0.96 2.62 0.06 0.84 1.09 2.32 2.96\n",
"\n",
" cmp to z p -log2(p)\n",
"covariate \n",
"indicator 0.00 15.40 <0.005 175.43\n",
"---\n",
"Concordance = 0.59\n",
"Partial AIC = 19230.53\n",
"log-likelihood ratio test = 264.03 on 1 df\n",
"-log2(p) of ll-ratio test = 194.81"
],
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <tbody>\n",
" <tr>\n",
" <th>model</th>\n",
" <td>lifelines.CoxPHFitter</td>\n",
" </tr>\n",
" <tr>\n",
" <th>duration col</th>\n",
" <td>'duration'</td>\n",
" </tr>\n",
" <tr>\n",
" <th>event col</th>\n",
" <td>'observed'</td>\n",
" </tr>\n",
" <tr>\n",
" <th>baseline estimation</th>\n",
" <td>breslow</td>\n",
" </tr>\n",
" <tr>\n",
" <th>number of observations</th>\n",
" <td>1808</td>\n",
" </tr>\n",
" <tr>\n",
" <th>number of events observed</th>\n",
" <td>1468</td>\n",
" </tr>\n",
" <tr>\n",
" <th>partial log-likelihood</th>\n",
" <td>-9614.27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>time fit was run</th>\n",
" <td>2024-12-04 06:22:50 UTC</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div><table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th style=\"min-width: 12px;\"></th>\n",
" <th style=\"min-width: 12px;\">coef</th>\n",
" <th style=\"min-width: 12px;\">exp(coef)</th>\n",
" <th style=\"min-width: 12px;\">se(coef)</th>\n",
" <th style=\"min-width: 12px;\">coef lower 95%</th>\n",
" <th style=\"min-width: 12px;\">coef upper 95%</th>\n",
" <th style=\"min-width: 12px;\">exp(coef) lower 95%</th>\n",
" <th style=\"min-width: 12px;\">exp(coef) upper 95%</th>\n",
" <th style=\"min-width: 12px;\">cmp to</th>\n",
" <th style=\"min-width: 12px;\">z</th>\n",
" <th style=\"min-width: 12px;\">p</th>\n",
" <th style=\"min-width: 12px;\">-log2(p)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>indicator</th>\n",
" <td>0.96</td>\n",
" <td>2.62</td>\n",
" <td>0.06</td>\n",
" <td>0.84</td>\n",
" <td>1.09</td>\n",
" <td>2.32</td>\n",
" <td>2.96</td>\n",
" <td>0.00</td>\n",
" <td>15.40</td>\n",
" <td>&lt;0.005</td>\n",
" <td>175.43</td>\n",
" </tr>\n",
" </tbody>\n",
"</table><br><div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <tbody>\n",
" <tr>\n",
" <th>Concordance</th>\n",
" <td>0.59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Partial AIC</th>\n",
" <td>19230.53</td>\n",
" </tr>\n",
" <tr>\n",
" <th>log-likelihood ratio test</th>\n",
" <td>264.03 on 1 df</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-log2(p) of ll-ratio test</th>\n",
" <td>194.81</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/latex": "\\begin{tabular}{lrrrrrrrrrrr}\n & coef & exp(coef) & se(coef) & coef lower 95% & coef upper 95% & exp(coef) lower 95% & exp(coef) upper 95% & cmp to & z & p & -log2(p) \\\\\ncovariate & & & & & & & & & & & \\\\\nindicator & 0.96 & 2.62 & 0.06 & 0.84 & 1.09 & 2.32 & 2.96 & 0.00 & 15.40 & 0.00 & 175.43 \\\\\n\\end{tabular}\n"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:17.131185Z",
"start_time": "2020-01-09T22:37:16.573714Z"
},
"id": "USPOd2fOuTw8",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 641
},
"outputId": "fe195a8c-7b94-48e9-e4a0-abc2acef31d8"
},
"source": [
"fig, ax = plt.subplots(figsize=(10,7))\n",
"\n",
"# One Kaplan-Meier curve per regime type (survival_function_.plot omits confidence bands)\n",
"for r in df['regime'].unique():\n",
" ix = df['regime'] == r\n",
" kmf.fit(df['duration'].loc[ix], df['observed'].loc[ix], label=r)\n",
" kmf.survival_function_.plot(ax=ax)\n",
"plt.title('Estimated probability of government survival vs number of years')\n",
"plt.xlabel('Time (in years)')\n",
"plt.ylabel('Estimated probability of government survival')\n",
"plt.show()"
],
"execution_count": 17,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:17.690930Z",
"start_time": "2020-01-09T22:37:17.134877Z"
},
"scrolled": false,
"id": "ydmaYBqnuTxA",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 641
},
"outputId": "6e613bfc-70b5-445f-8e13-795442d4aef1"
},
"source": [
"fig, ax = plt.subplots(figsize=(10,7))\n",
"# One Kaplan-Meier curve per continent; kmf.plot also draws the 95% confidence band\n",
"for r in df['un_continent_name'].unique():\n",
" ix = df['un_continent_name'] == r\n",
" kmf.fit(df['duration'].loc[ix], df['observed'].loc[ix], label=r)\n",
" kmf.plot(ax=ax)\n",
"plt.title('Estimated probability of government survival vs number of years')\n",
"plt.xlabel('Time (in years)')\n",
"plt.ylabel('Estimated probability of government survival')\n",
"plt.show()"
],
"execution_count": 18,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
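{
"cell_type": "markdown",
"metadata": {},
"source": [
"The curves above suggest that government survival differs by continent, but a plot alone does not show whether the gaps are larger than sampling noise. As a minimal sketch that is not part of the original analysis, `multivariate_logrank_test` from `lifelines.statistics` tests the null hypothesis that all continents share the same survival function."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from lifelines.statistics import multivariate_logrank_test\n",
"\n",
"# H0: every continent has the same survival function.\n",
"# Arguments: durations, group labels, event indicator (1 = government ended, 0 = censored)\n",
"result = multivariate_logrank_test(df['duration'], df['un_continent_name'], df['observed'])\n",
"result.print_summary()"
],
"execution_count": null,
"outputs": []
},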
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:17.723699Z",
"start_time": "2020-01-09T22:37:17.694880Z"
},
"scrolled": false,
"id": "NXIl1sBfuTxF",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 195
},
"outputId": "1376fd90-d54d-4407-faa8-748f4a22daa2"
},
"source": [
"df.query('ctryname == \"United States of America\"').tail(3)"
],
"execution_count": 19,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" ctryname un_region_name un_continent_name \\\n",
"1721 United States of America Northern America Americas \n",
"1722 United States of America Northern America Americas \n",
"1723 United States of America Northern America Americas \n",
"\n",
" ehead democracy regime start_year duration \\\n",
"1721 George Bush Democracy Presidential Dem 1989 4 \n",
"1722 Bill Clinton Democracy Presidential Dem 1993 8 \n",
"1723 George W. Bush Democracy Presidential Dem 2001 8 \n",
"\n",
" observed \n",
"1721 1 \n",
"1722 1 \n",
"1723 0 "
],
"text/html": [
"\n",
" <div id=\"df-3641d4f3-70ef-4450-ab24-c212f0e98338\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ctryname</th>\n",
" <th>un_region_name</th>\n",
" <th>un_continent_name</th>\n",
" <th>ehead</th>\n",
" <th>democracy</th>\n",
" <th>regime</th>\n",
" <th>start_year</th>\n",
" <th>duration</th>\n",
" <th>observed</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1721</th>\n",
" <td>United States of America</td>\n",
" <td>Northern America</td>\n",
" <td>Americas</td>\n",
" <td>George Bush</td>\n",
" <td>Democracy</td>\n",
" <td>Presidential Dem</td>\n",
" <td>1989</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1722</th>\n",
" <td>United States of America</td>\n",
" <td>Northern America</td>\n",
" <td>Americas</td>\n",
" <td>Bill Clinton</td>\n",
" <td>Democracy</td>\n",
" <td>Presidential Dem</td>\n",
" <td>1993</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1723</th>\n",
" <td>United States of America</td>\n",
" <td>Northern America</td>\n",
" <td>Americas</td>\n",
" <td>George W. Bush</td>\n",
" <td>Democracy</td>\n",
" <td>Presidential Dem</td>\n",
" <td>2001</td>\n",
" <td>8</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"summary": "{\n \"name\": \"df\",\n \"rows\": 3,\n \"fields\": [\n {\n \"column\": \"ctryname\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"United States of America\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"un_region_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Northern America\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"un_continent_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Americas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ehead\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"George Bush\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"democracy\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Democracy\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"regime\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Presidential Dem\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"start_year\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 1989,\n \"max\": 2001,\n \"num_unique_values\": 3,\n \"samples\": [\n 1989\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"duration\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 2,\n \"min\": 4,\n \"max\": 8,\n \"num_unique_values\": 2,\n \"samples\": [\n 8\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"observed\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 0\n ],\n 
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 19
}
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:17.748036Z",
"start_time": "2020-01-09T22:37:17.725349Z"
},
"scrolled": false,
"id": "Qzy7VJ_luTxI",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143
},
"outputId": "b5500e0c-f7b3-4959-d8e5-2bf1d8f32a3d"
},
"source": [
"df.query('ctryname == \"United Kingdom\"').tail(3)"
],
"execution_count": 20,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" ctryname un_region_name un_continent_name ehead \\\n",
"1710 United Kingdom Northern Europe Europe John Major \n",
"1711 United Kingdom Northern Europe Europe Tony Blair \n",
"1712 United Kingdom Northern Europe Europe Gordon Brown \n",
"\n",
" democracy regime start_year duration observed \n",
"1710 Democracy Parliamentary Dem 1990 7 1 \n",
"1711 Democracy Parliamentary Dem 1997 10 1 \n",
"1712 Democracy Parliamentary Dem 2007 2 0 "
],
"text/html": [
"\n",
" <div id=\"df-ba9bd7c6-4929-4960-b7e5-e1ae6fb0b788\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ctryname</th>\n",
" <th>un_region_name</th>\n",
" <th>un_continent_name</th>\n",
" <th>ehead</th>\n",
" <th>democracy</th>\n",
" <th>regime</th>\n",
" <th>start_year</th>\n",
" <th>duration</th>\n",
" <th>observed</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1710</th>\n",
" <td>United Kingdom</td>\n",
" <td>Northern Europe</td>\n",
" <td>Europe</td>\n",
" <td>John Major</td>\n",
" <td>Democracy</td>\n",
" <td>Parliamentary Dem</td>\n",
" <td>1990</td>\n",
" <td>7</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1711</th>\n",
" <td>United Kingdom</td>\n",
" <td>Northern Europe</td>\n",
" <td>Europe</td>\n",
" <td>Tony Blair</td>\n",
" <td>Democracy</td>\n",
" <td>Parliamentary Dem</td>\n",
" <td>1997</td>\n",
" <td>10</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1712</th>\n",
" <td>United Kingdom</td>\n",
" <td>Northern Europe</td>\n",
" <td>Europe</td>\n",
" <td>Gordon Brown</td>\n",
" <td>Democracy</td>\n",
" <td>Parliamentary Dem</td>\n",
" <td>2007</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"summary": "{\n \"name\": \"df\",\n \"rows\": 3,\n \"fields\": [\n {\n \"column\": \"ctryname\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"United Kingdom\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"un_region_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Northern Europe\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"un_continent_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Europe\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ehead\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"John Major\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"democracy\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Democracy\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"regime\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Parliamentary Dem\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"start_year\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 8,\n \"min\": 1990,\n \"max\": 2007,\n \"num_unique_values\": 3,\n \"samples\": [\n 1990\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"duration\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4,\n \"min\": 2,\n \"max\": 10,\n \"num_unique_values\": 3,\n \"samples\": [\n 7\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"observed\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 0\n ],\n \"semantic_type\": 
\"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 20
}
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:18.129226Z",
"start_time": "2020-01-09T22:37:17.749639Z"
},
"id": "010659VeuTxM",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 641
},
"outputId": "7741b122-cfd5-4825-cbc0-d2b58f2b0cf0"
},
"source": [
"ix_US = df['ctryname'] == 'United States of America'\n",
"ix_UK = df['ctryname'] == 'United Kingdom'\n",
"\n",
"kmf_US = KaplanMeierFitter()\n",
"kmf_US.fit(df['duration'].loc[ix_US], df['observed'].loc[ix_US], label='USA')\n",
"\n",
"kmf_UK = KaplanMeierFitter()\n",
"kmf_UK.fit(df['duration'].loc[ix_UK], df['observed'].loc[ix_UK], label='UK')\n",
"\n",
"fig, ax = plt.subplots(figsize=(10,7))\n",
"kmf_US.plot(ax=ax)\n",
"kmf_UK.plot(ax=ax)\n",
"plt.title('Estimated probability of government survival vs number of years')\n",
"plt.xlabel('Time (in years)')\n",
"plt.ylabel('Estimated probability of government survival')\n",
"plt.show()"
],
"execution_count": 21,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
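{
"cell_type": "markdown",
"metadata": {},
"source": [
"A two-sample log-rank test on the same `ix_US` / `ix_UK` masks checks whether the gap between the USA and UK curves is statistically meaningful. This is a sketch added for illustration, not part of the original analysis. There are only a few governments per country, so the test has little power, and a large p-value is weak evidence of no difference."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from lifelines.statistics import logrank_test\n",
"\n",
"# H0: US and UK governments share the same survival function\n",
"result = logrank_test(\n",
"    df['duration'].loc[ix_US], df['duration'].loc[ix_UK],\n",
"    event_observed_A=df['observed'].loc[ix_US],\n",
"    event_observed_B=df['observed'].loc[ix_UK],\n",
")\n",
"result.print_summary()"
],
"execution_count": null,
"outputs": []
},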
{
"cell_type": "markdown",
"metadata": {
"id": "VaQCN1ChuTxs"
},
"source": [
"# Multivariate Cox regression"
]
},
{
"cell_type": "code",
"metadata": {
"scrolled": false,
"id": "nCPi7YY1uTxQ",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "1f56cb3c-c03b-41ff-c3d0-7d2e446e492d"
},
"source": [
"df.columns"
],
"execution_count": 22,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Index(['ctryname', 'un_region_name', 'un_continent_name', 'ehead', 'democracy',\n",
" 'regime', 'start_year', 'duration', 'observed'],\n",
" dtype='object')"
]
},
"metadata": {},
"execution_count": 22
}
]
},
{
"cell_type": "code",
"metadata": {
"scrolled": true,
"id": "xphf-zpauTxU"
},
"source": [
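"# Keep only the covariates used in the Cox model (continent and democracy) plus the duration and event columns\n",
"df = df.drop(columns=['ctryname', 'un_region_name', 'ehead', 'regime', 'start_year'])"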
],
"execution_count": 23,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "NOuRvPJXuTxZ",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "6d0ff229-11c0-4275-b921-a882699fe9d3"
},
"source": [
"df.columns"
],
"execution_count": 24,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Index(['un_continent_name', 'democracy', 'duration', 'observed'], dtype='object')"
]
},
"metadata": {},
"execution_count": 24
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "tzcg35DtuTxd"
},
"source": [
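"# Alternative: let pandas drop the first level of each categorical variable automatically\n",
"# df_hazard = pd.get_dummies(df, drop_first=True, columns=df.columns.drop(['duration', 'observed']))\n",
"\n",
"# One-hot encode every covariate except the duration and event columns, keeping all levels\n",
"df_hazard = pd.get_dummies(df, columns=df.columns.drop(['duration', 'observed']))"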
],
"execution_count": 25,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "MGxRvM20uTxg",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "79468c5b-bdb9-4ca6-d615-01c97d3841a4"
},
"source": [
"df_hazard.columns"
],
"execution_count": 26,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Index(['duration', 'observed', 'un_continent_name_Africa',\n",
" 'un_continent_name_Americas', 'un_continent_name_Asia',\n",
" 'un_continent_name_Europe', 'un_continent_name_Oceania',\n",
" 'democracy_Democracy', 'democracy_Non-democracy'],\n",
" dtype='object')"
]
},
"metadata": {},
"execution_count": 26
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "i9b7PSd2uTxl"
},
"source": [
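"# Drop one dummy per variable so that Americas and Democracy become the reference categories\n",
"# (with every level kept, the dummy columns would be perfectly collinear)\n",
"df_hazard = df_hazard.drop(columns=['un_continent_name_Americas', 'democracy_Democracy'])"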
],
"execution_count": 27,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "3-O2MTLPuTxo",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "9166057a-9d66-49cb-ee09-32e3b9493f10"
},
"source": [
"df_hazard.columns"
],
"execution_count": 28,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Index(['duration', 'observed', 'un_continent_name_Africa',\n",
" 'un_continent_name_Asia', 'un_continent_name_Europe',\n",
" 'un_continent_name_Oceania', 'democracy_Non-democracy'],\n",
" dtype='object')"
]
},
"metadata": {},
"execution_count": 28
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "IUKzqcFvuTxs",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 679
},
"outputId": "01d2412e-dbab-4851-c419-56931ffd14a9"
},
"source": [
"from lifelines import CoxPHFitter\n",
"\n",
"# A small L2 penalty (l1_ratio defaults to 0) helps stabilize the coefficient estimates\n",
"cph = CoxPHFitter(penalizer=0.1)\n",
"cph.fit(df_hazard, 'duration', 'observed')\n",
"cph.print_summary()"
],
"execution_count": 29,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<lifelines.CoxPHFitter: fitted with 1808 total observations, 340 right-censored observations>\n",
" duration col = 'duration'\n",
" event col = 'observed'\n",
" penalizer = 0.1\n",
" l1 ratio = 0.0\n",
" baseline estimation = breslow\n",
" number of observations = 1808\n",
"number of events observed = 1468\n",
" partial log-likelihood = -9613.12\n",
" time fit was run = 2024-12-04 06:24:28 UTC\n",
"\n",
"---\n",
" coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%\n",
"covariate \n",
"un_continent_name_Africa -0.20 0.82 0.08 -0.36 -0.04 0.70 0.96\n",
"un_continent_name_Asia -0.06 0.95 0.07 -0.20 0.09 0.82 1.09\n",
"un_continent_name_Europe 0.23 1.26 0.06 0.11 0.35 1.12 1.43\n",
"un_continent_name_Oceania -0.12 0.89 0.11 -0.33 0.10 0.72 1.10\n",
"democracy_Non-democracy -0.72 0.49 0.06 -0.84 -0.60 0.43 0.55\n",
"\n",
" cmp to z p -log2(p)\n",
"covariate \n",
"un_continent_name_Africa 0.00 -2.48 0.01 6.25\n",
"un_continent_name_Asia 0.00 -0.77 0.44 1.19\n",
"un_continent_name_Europe 0.00 3.79 <0.005 12.70\n",
"un_continent_name_Oceania 0.00 -1.07 0.29 1.80\n",
"democracy_Non-democracy 0.00 -11.48 <0.005 98.97\n",
"---\n",
"Concordance = 0.62\n",
"Partial AIC = 19236.25\n",
"log-likelihood ratio test = 266.31 on 5 df\n",
"-log2(p) of ll-ratio test = 181.91"
],
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <tbody>\n",
" <tr>\n",
" <th>model</th>\n",
" <td>lifelines.CoxPHFitter</td>\n",
" </tr>\n",
" <tr>\n",
" <th>duration col</th>\n",
" <td>'duration'</td>\n",
" </tr>\n",
" <tr>\n",
" <th>event col</th>\n",
" <td>'observed'</td>\n",
" </tr>\n",
" <tr>\n",
" <th>penalizer</th>\n",
" <td>0.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>l1 ratio</th>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>baseline estimation</th>\n",
" <td>breslow</td>\n",
" </tr>\n",
" <tr>\n",
" <th>number of observations</th>\n",
" <td>1808</td>\n",
" </tr>\n",
" <tr>\n",
" <th>number of events observed</th>\n",
" <td>1468</td>\n",
" </tr>\n",
" <tr>\n",
" <th>partial log-likelihood</th>\n",
" <td>-9613.12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>time fit was run</th>\n",
" <td>2024-12-04 06:24:28 UTC</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div><table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th style=\"min-width: 12px;\"></th>\n",
" <th style=\"min-width: 12px;\">coef</th>\n",
" <th style=\"min-width: 12px;\">exp(coef)</th>\n",
" <th style=\"min-width: 12px;\">se(coef)</th>\n",
" <th style=\"min-width: 12px;\">coef lower 95%</th>\n",
" <th style=\"min-width: 12px;\">coef upper 95%</th>\n",
" <th style=\"min-width: 12px;\">exp(coef) lower 95%</th>\n",
" <th style=\"min-width: 12px;\">exp(coef) upper 95%</th>\n",
" <th style=\"min-width: 12px;\">cmp to</th>\n",
" <th style=\"min-width: 12px;\">z</th>\n",
" <th style=\"min-width: 12px;\">p</th>\n",
" <th style=\"min-width: 12px;\">-log2(p)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>un_continent_name_Africa</th>\n",
" <td>-0.20</td>\n",
" <td>0.82</td>\n",
" <td>0.08</td>\n",
" <td>-0.36</td>\n",
" <td>-0.04</td>\n",
" <td>0.70</td>\n",
" <td>0.96</td>\n",
" <td>0.00</td>\n",
" <td>-2.48</td>\n",
" <td>0.01</td>\n",
" <td>6.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>un_continent_name_Asia</th>\n",
" <td>-0.06</td>\n",
" <td>0.95</td>\n",
" <td>0.07</td>\n",
" <td>-0.20</td>\n",
" <td>0.09</td>\n",
" <td>0.82</td>\n",
" <td>1.09</td>\n",
" <td>0.00</td>\n",
" <td>-0.77</td>\n",
" <td>0.44</td>\n",
" <td>1.19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>un_continent_name_Europe</th>\n",
" <td>0.23</td>\n",
" <td>1.26</td>\n",
" <td>0.06</td>\n",
" <td>0.11</td>\n",
" <td>0.35</td>\n",
" <td>1.12</td>\n",
" <td>1.43</td>\n",
" <td>0.00</td>\n",
" <td>3.79</td>\n",
" <td>&lt;0.005</td>\n",
" <td>12.70</td>\n",
" </tr>\n",
" <tr>\n",
" <th>un_continent_name_Oceania</th>\n",
" <td>-0.12</td>\n",
" <td>0.89</td>\n",
" <td>0.11</td>\n",
" <td>-0.33</td>\n",
" <td>0.10</td>\n",
" <td>0.72</td>\n",
" <td>1.10</td>\n",
" <td>0.00</td>\n",
" <td>-1.07</td>\n",
" <td>0.29</td>\n",
" <td>1.80</td>\n",
" </tr>\n",
" <tr>\n",
" <th>democracy_Non-democracy</th>\n",
" <td>-0.72</td>\n",
" <td>0.49</td>\n",
" <td>0.06</td>\n",
" <td>-0.84</td>\n",
" <td>-0.60</td>\n",
" <td>0.43</td>\n",
" <td>0.55</td>\n",
" <td>0.00</td>\n",
" <td>-11.48</td>\n",
" <td>&lt;0.005</td>\n",
" <td>98.97</td>\n",
" </tr>\n",
" </tbody>\n",
"</table><br><div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <tbody>\n",
" <tr>\n",
" <th>Concordance</th>\n",
" <td>0.62</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Partial AIC</th>\n",
" <td>19236.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>log-likelihood ratio test</th>\n",
" <td>266.31 on 5 df</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-log2(p) of ll-ratio test</th>\n",
" <td>181.91</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/latex": "\\begin{tabular}{lrrrrrrrrrrr}\n & coef & exp(coef) & se(coef) & coef lower 95% & coef upper 95% & exp(coef) lower 95% & exp(coef) upper 95% & cmp to & z & p & -log2(p) \\\\\ncovariate & & & & & & & & & & & \\\\\nun_continent_name_Africa & -0.20 & 0.82 & 0.08 & -0.36 & -0.04 & 0.70 & 0.96 & 0.00 & -2.48 & 0.01 & 6.25 \\\\\nun_continent_name_Asia & -0.06 & 0.95 & 0.07 & -0.20 & 0.09 & 0.82 & 1.09 & 0.00 & -0.77 & 0.44 & 1.19 \\\\\nun_continent_name_Europe & 0.23 & 1.26 & 0.06 & 0.11 & 0.35 & 1.12 & 1.43 & 0.00 & 3.79 & 0.00 & 12.70 \\\\\nun_continent_name_Oceania & -0.12 & 0.89 & 0.11 & -0.33 & 0.10 & 0.72 & 1.10 & 0.00 & -1.07 & 0.29 & 1.80 \\\\\ndemocracy_Non-democracy & -0.72 & 0.49 & 0.06 & -0.84 & -0.60 & 0.43 & 0.55 & 0.00 & -11.48 & 0.00 & 98.97 \\\\\n\\end{tabular}\n"
},
"metadata": {}
}
]
},
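{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of how the summary columns relate, assuming the standard Wald construction. `exp(coef)` is the hazard ratio of each covariate relative to its omitted reference category, holding the other covariates fixed. A value below 1, such as 0.49 for `democracy_Non-democracy`, means a lower hazard than the reference group. The 95% bounds are `exp(coef ± 1.96 * se(coef))`, and `z = coef / se(coef)`.\n",
"\n",
"The cell below recomputes these columns from the rounded `coef` and `se(coef)` values printed above, not from the fitted `cph` object. The results can therefore differ in the last digit: for example, `z` for `democracy_Non-democracy` comes out near -12 instead of -11.48. The DataFrame name `table` is hypothetical."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical illustration: rounded coef and se(coef) copied from the summary above\n",
"table = pd.DataFrame(\n",
"    {'coef': [-0.20, -0.06, 0.23, -0.12, -0.72],\n",
"     'se(coef)': [0.08, 0.07, 0.06, 0.11, 0.06]},\n",
"    index=['un_continent_name_Africa', 'un_continent_name_Asia',\n",
"           'un_continent_name_Europe', 'un_continent_name_Oceania',\n",
"           'democracy_Non-democracy'])\n",
"\n",
"z_crit = 1.96  # two-sided 95% quantile of the standard normal\n",
"\n",
"# Hazard ratio and its Wald 95% interval (exponentiated coefficient bounds)\n",
"table['exp(coef)'] = np.exp(table['coef'])\n",
"table['exp(coef) lower 95%'] = np.exp(table['coef'] - z_crit * table['se(coef)'])\n",
"table['exp(coef) upper 95%'] = np.exp(table['coef'] + z_crit * table['se(coef)'])\n",
"\n",
"# Wald statistic against the null value 0 (the 'cmp to' column)\n",
"table['z'] = table['coef'] / table['se(coef)']\n",
"\n",
"print(table.round(2))"
],
"execution_count": null,
"outputs": []
},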
{
"cell_type": "code",
"metadata": {
"id": "iYkJ1coruTxx",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 573
},
"outputId": "52043275-fbb9-4212-e030-6ba39995f831"
},
"source": [
"fig_coef, ax_coef = plt.subplots(figsize=(12,7))\n",
"ax_coef.set_title('Survival Regression: Coefficients and Confidence Intervals')\n",
"cph.plot(ax=ax_coef)\n",
"plt.show()"
],
"execution_count": 30,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1200x700 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
}
]
}