@CamDavidsonPilon
Created June 20, 2015 17:38
Lifelines categorical variables.
from patsy import dmatrix
from lifelines import CoxPHFitter
import pandas as pd
df = pd.read_csv('/Users/camerondavidson-pilon/Downloads/prostate1.csv')
X = dmatrix('age + hg + sz + sg + rx + pf + status1 + dtime', df, return_type='dataframe')
print(X.head())
"""
Notice patsy has removed the redundant variables: `0.2 mg estrogen` and `in bed < 50% daytime`. This is what R does too.
Patsy has introduced an Intercept column, though. We don't want this.
"""
del X['Intercept']
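# Note: `normalize` is accepted by the lifelines release this gist was written against (2015);
# newer releases appear to have dropped the argument, in which case use CoxPHFitter() instead.
cp = CoxPHFitter(normalize=False)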
cp.fit(X, 'dtime', event_col='status1')
cp.print_summary() # values close to R.
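
To see the reference-level behaviour without the prostate CSV, here is a minimal sketch on a made-up four-row frame. The `toy` data and the names `X_toy`, `columns_with_intercept` and `columns_for_cox` are hypothetical and are not part of the original gist. It only illustrates how patsy's default treatment coding drops the first (alphabetically sorted) level of a string column and adds an `Intercept` column, which is then removed as in the script above.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical toy data (NOT the prostate dataset used in the gist).
toy = pd.DataFrame({
    "age": [71, 64, 80, 75],
    "rx": ["placebo", "0.2 mg estrogen", "1.0 mg estrogen", "placebo"],
})

# Default treatment coding: the first sorted level of `rx`
# ("0.2 mg estrogen") becomes the reference and gets no column.
X_toy = dmatrix("age + rx", toy, return_type="dataframe")
columns_with_intercept = list(X_toy.columns)

# Drop the Intercept column before handing the frame to a Cox model.
del X_toy["Intercept"]
columns_for_cox = list(X_toy.columns)

print(columns_with_intercept)
print(columns_for_cox)
```

The remaining `rx[T....]` columns are indicators relative to the dropped level, so each fitted coefficient would be read as a log hazard ratio against `0.2 mg estrogen`.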