#!/usr/bin/env python
'''Python implementation of Firth regression by John Lees
See https://www.ncbi.nlm.nih.gov/pubmed/12758140'''

def firth_likelihood(beta, logit):
    return -(logit.loglike(beta) + 0.5*np.log(np.linalg.det(-logit.hessian(beta))))

# Do firth regression
# Note information = -hessian, for some reason available but not implemented in statsmodels
def fit_firth(y, X, start_vec=None, step_limit=1000, convergence_limit=0.0001):

    logit_model = smf.Logit(y, X)

    if start_vec is None:
        start_vec = np.zeros(X.shape[1])

    beta_iterations = []
    beta_iterations.append(start_vec)
    for i in range(0, step_limit):
        pi = logit_model.predict(beta_iterations[i])
        W = np.diagflat(np.multiply(pi, 1-pi))
        var_covar_mat = np.linalg.pinv(-logit_model.hessian(beta_iterations[i]))

        # build hat matrix
        rootW = np.sqrt(W)
        H = np.dot(np.transpose(X), np.transpose(rootW))
        H = np.matmul(var_covar_mat, H)
        H = np.matmul(np.dot(rootW, X), H)

        # penalised score
        U = np.matmul(np.transpose(X), y - pi + np.multiply(np.diagonal(H), 0.5 - pi))
        new_beta = beta_iterations[i] + np.matmul(var_covar_mat, U)

        # step halving
        j = 0
        while firth_likelihood(new_beta, logit_model) > firth_likelihood(beta_iterations[i], logit_model):
            new_beta = beta_iterations[i] + 0.5*(new_beta - beta_iterations[i])
            j = j + 1
            if (j > step_limit):
                sys.stderr.write('Firth regression failed\n')
                return None

        beta_iterations.append(new_beta)
        if i > 0 and (np.linalg.norm(beta_iterations[i] - beta_iterations[i-1]) < convergence_limit):
            break

    return_fit = None
    if np.linalg.norm(beta_iterations[i] - beta_iterations[i-1]) >= convergence_limit:
        sys.stderr.write('Firth regression failed\n')
    else:
        # Calculate stats
        fitll = -firth_likelihood(beta_iterations[-1], logit_model)
        intercept = beta_iterations[-1][0]
        beta = beta_iterations[-1][1:].tolist()
        bse = np.sqrt(np.diagonal(np.linalg.pinv(-logit_model.hessian(beta_iterations[-1]))))

        return_fit = intercept, beta, bse, fitll

    return return_fit

if __name__ == "__main__":
    import sys
    import warnings
    import math
    import statsmodels
    import numpy as np
    from scipy import stats
    import statsmodels.api as smf

    # create X and y here. Make sure X has an intercept term (column of ones)
    # ...

    # How to call and calculate p-values
    (intercept, beta, bse, fitll) = fit_firth(y, X)
    beta = [intercept] + beta

    # Wald test
    waldp = []
    for beta_val, bse_val in zip(beta, bse):
        waldp.append(2 * (1 - stats.norm.cdf(abs(beta_val/bse_val))))

    # LRT
    lrtp = []
    for beta_idx, (beta_val, bse_val) in enumerate(zip(beta, bse)):
        null_X = np.delete(X, beta_idx, axis=1)
        (null_intercept, null_beta, null_bse, null_fitll) = fit_firth(y, null_X)
        lrstat = -2*(null_fitll - fitll)
        lrt_pvalue = 1
        if lrstat > 0: # non-convergence
            lrt_pvalue = stats.chi2.sf(lrstat, 1)
        lrtp.append(lrt_pvalue)
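For reference, here is one hedged, illustrative way to fill in the "create X and y here" placeholder with synthetic data (my own example, not part of the gist, intended to be pasted inside the `__main__` block after its imports). The intercept column goes first because fit_firth() reads the intercept from position 0 of the fitted coefficient vector.

```python
# Illustrative synthetic data only (assumed placement: inside the __main__
# block of the gist, after its imports). The first column of X is the
# intercept, matching intercept = beta_iterations[-1][0] in fit_firth().
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])                # intercept first, then covariate
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x1)))    # logistic model probabilities
y = rng.binomial(1, true_p)

intercept, beta, bse, fitll = fit_firth(y, X)
print(intercept, beta)
```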
- With my version of Python (3.7.4) I needed to default start_vec to None in the function declaration.
- Since my data set has 448,926 observations, it ran out of memory allocating an array of shape (448926, 448926). I see a note above about making the code more memory efficient and hope there might be a ready-to-hand solution for that.
@extrospective I haven't worked through the linear algebra to see if this is possible, but I would note that as well as H, I think var_covar_mat would be (448926, 448926) too, so I'm not sure whether it's possible to eliminate that. You could try memory-mapping the array, though whether the whole thing gets read in by matmul, negating any savings, I'm not sure.
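As an added illustration (not from the thread): since only the diagonal of H enters the penalised score, one possible route is to compute the leverages row by row and never form the (n, n) W and H arrays. A rough sketch, untested at this scale:

```python
# Rough sketch (assumption: only diag(H) is needed, as in the penalised score U).
# h_i = w_i * x_i^T (X^T W X)^{-1} x_i, so the (n, n) W and H are never built.
import numpy as np

def hat_diag(X, pi):
    w = pi * (1.0 - pi)                  # length-n weight vector instead of np.diagflat
    XtWX = (X * w[:, None]).T @ X        # (p, p) information matrix, i.e. -hessian
    var_covar_mat = np.linalg.pinv(XtWX)
    # quadratic form x_i^T V x_i for every row i, done in one einsum call
    return w * np.einsum('ij,jk,ik->i', X, var_covar_mat, X)
```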
I have added a default arg for start_vec
@josef-pkt Are you able to suggest any code which would implement this?
Hi John,
Big thanks for making this available! I was using scikit-learn for a project I've just finished, and it was useful for me to rewrite your code so that it can interface with the scikit-learn API. If you or anyone else finds this useful you may find it on my GitHub (https://github.com/LukeLB/LogisticRegression_Firth).
Since I had some issues with (quasi) perfect separation in a small sample while performing simple logistic regression, I wanted to try out this implementation and compare outcomes. However, I do not know how to resolve the "ValueError: Shape of passed values is (70, 1), indices imply (70, 2)" at the line null_X = np.delete(X, beta_idx, axis=1), even though X.shape = (70, 2) due to adding a constant, so I don't know what causes the passed values to have shape (70, 1). Did anyone have issues with this or know how to resolve it?
Minor comment on line "waldp.append(2 * (1 - stats.norm.cdf(abs(beta_val/bse_val)))" as it needed an extra closing round bracket in my case
> However, I do not know how to resolve the "ValueError: Shape of passed values is (70, 1), indices imply (70, 2)" at the line null_X = np.delete(X, beta_idx, axis=1), even though X.shape = (70, 2) due to adding a constant [...]
I think this was an error I introduced when adding the intercept to the beta vector, hopefully fixed now
> Minor comment on line "waldp.append(2 * (1 - stats.norm.cdf(abs(beta_val/bse_val)))" as it needed an extra closing round bracket in my case
Thanks, I've fixed this
@IRambags actually sorry, I'm not sure what's causing the error with fitting the null. What length are beta and bse in the loop?
Both len(beta) and len(bse) equal 2.
Does running np.delete(X, 0, axis=1) or np.delete(X, 1, axis=1) work?
Unfortunately not, both give the same ValueError as before. Does it matter that I work in Jupyter, i.e. IPython?
> Does it matter that I work in Jupyter, i.e. IPython?
I wouldn't have thought so.
Double-check the shape of X being passed. You can also remove the relevant column using other indexing techniques if this isn't working for you: https://numpy.org/doc/stable/reference/arrays.indexing.html
I indeed now just went for null_X = X.drop('const', 1) and this seems to work. Very strange behaviour of np.delete() though. Thanks for your quick replies and support!
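The fix above suggests X was a pandas DataFrame rather than a NumPy array; in that case np.delete most likely hands its (70, 1) result back to pandas, which tries to rebuild a frame with the original two columns and raises exactly this shape mismatch. A minimal sketch, assuming the intercept column is named 'const', that drops a column from either type:

```python
# Hypothetical helper (not from the gist): drop one column whether X is a
# pandas DataFrame or a NumPy array, so np.delete is only used on ndarrays.
import numpy as np
import pandas as pd

def drop_column(X, col):
    if isinstance(X, pd.DataFrame):
        name = X.columns[col] if isinstance(col, int) else col
        return X.drop(columns=name)
    return np.delete(X, col, axis=1)
```

Note that in recent pandas versions the positional axis argument to drop is deprecated, so X.drop(columns='const') is the safer spelling of the fix above.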
Hi all, I'm working on an sklearn-compatible implementation based on logistf here: https://github.com/jzluo/firthlogist
I would greatly appreciate any feedback on it!
There are a few implementation differences compared to this gist. Most notably, the full hat matrix is not calculated, which gives significant memory savings, and penalized likelihood ratio tests are used as in logistf for p-values (some more detail here).
@jzluo Amazing! Looks like a nice improvement over this code 🙂
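For orientation, a rough usage sketch of the linked package, assuming it follows the usual scikit-learn estimator conventions; the exact class name and attributes should be checked against its README:

```python
# Assumed API (see https://github.com/jzluo/firthlogist): an sklearn-style
# estimator, so fit() and the coef_/intercept_ attributes should be available.
from firthlogist import FirthLogisticRegression

fl = FirthLogisticRegression()
fl.fit(X, y)                    # X: (n, p) covariates, y: binary outcome vector
print(fl.coef_, fl.intercept_)  # fitted coefficients and intercept
```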
Thank you for your code!
I am running your code but I am receiving an error at the line
(intercept, beta, bse, fitll) = fit_firth(y, X), with the following message: "Unable to coerce to Series, length must be 1: given 11871"
I have included an extract of X (X.shape -> (11871, 4)):

           is_male    age  genotype  intercept term
    0            0  28.92         0               1
    1            0  70.95         0               1
    2            0  29.92         0               1
    ...        ...    ...       ...             ...
    11869        0  74.95         0               1
    11870        0  73.95         0               1
and y (y.shape -> (11871, 1)):

           Experimental Group
    0                       0
    1                       0
    ...                   ...
    11869                   1
    11870                   1
Thank you for your help, it is much appreciated.
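An added note, not from the thread: "Unable to coerce to Series, length must be 1" is the kind of error pandas raises when an (n, 1) DataFrame is combined with a length-n array, which would happen at y - pi inside fit_firth if y is passed as a one-column DataFrame. If that is the cause here, flattening y first may help; a minimal sketch, using the names from the report above:

```python
# Hedged sketch: assumes the error comes from y being an (11871, 1) DataFrame;
# flattening it to a 1-D array before calling fit_firth avoids the broadcast clash.
import numpy as np

y_flat = np.asarray(y).ravel()        # shape (11871,) instead of (11871, 1)
X_arr = np.asarray(X, dtype=float)    # plain ndarray design matrix
result = fit_firth(y_flat, X_arr)
```

Note also that fit_firth() labels the first coefficient as the intercept, whereas in the extract above the intercept term is the last column of X, so the returned (intercept, beta) split would not line up with those column names.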
Thanks, that's right. I now realize that the p-value was different between these two calculations because the previous implementation used the intercept se to compute the test statistic for the first beta parameter, which you now fixed.