Skip to content

Instantly share code, notes, and snippets.

@MrFlick
Last active October 7, 2023 11:28
Show Gist options
  • Select an option

  • Save MrFlick/ae299d8f3760f02de6bf to your computer and use it in GitHub Desktop.

Select an option

Save MrFlick/ae299d8f3760f02de6bf to your computer and use it in GitHub Desktop.
makeglm.R: Creates a "fake" glm object with specific coefficients that you can use for predicting without fitting a model first
makeglm <- function(formula, ..., family, data=NULL) {
dots <- list(...)
out<-list()
tt <- terms(formula, data=data)
if(!is.null(data)) {
mf <- model.frame(tt, data)
vn <- sapply(attr(tt, "variables")[-1], deparse)
if((yvar <- attr(tt, "response"))>0)
vn <- vn[-yvar]
xlvl <- lapply(data[vn], function(x) if (is.factor(x))
levels(x)
else if (is.character(x))
levels(as.factor(x))
else
NULL)
attr(out, "xlevels") <- xlvl[!vapply(xlvl,is.null,NA)]
attr(tt, "dataClasses") <- sapply(data[vn], stats:::.MFclass)
}
out$terms <- tt
coef <- numeric(0)
stopifnot(length(dots)>1 & !is.null(names(dots)))
for(i in seq_along(dots)) {
if((n<-names(dots)[i]) != "") {
v <- dots[[i]]
if(!is.null(names(v))) {
coef[paste0(n, names(v))] <- v
} else {
stopifnot(length(v)==1)
coef[n] <- v
}
} else {
coef["(Intercept)"] <- dots[[i]]
}
}
out$coefficients <- coef
out$rank <- length(coef)
if (!missing(family)) {
out$family <- if (class(family) == "family") {
family
} else if (class(family) == "function") {
family()
} else if (class(family) == "character") {
get(family)()
} else {
stop(paste("invalid family class:", class(family)))
}
out$qr <- list(pivot=seq_len(out$rank))
out$deviance <- 1
out$null.deviance <- 1
out$aic <- 1
class(out) <- c("glm","lm")
} else {
class(out) <- "lm"
out$fitted.values <- predict(out, newdata=dd)
out$residuals <- out$mf[attr(tt, "response")] - out$fitted.values
out$df.residual <- nrow(data) - out$rank
out$model <- data
#QR doesn't work
}
out
}
set.seed(15)
dd <- data.frame(
X1=runif(50),
X2=factor(sample(letters[1:4], 50, replace=T)),
X3=rpois(50, 5),
Outcome = sample(0:1, 50, replace=T)
)
mymodel<-glm(Outcome~X1+X2+X3, data=dd, family=binomial)
predict(mymodel, type="response")
newmodel <- makeglm(Outcome~X1+X2+X3, family=binomial, data=dd,
-.5, X1=1, X2=c(b=1.5, c=1, d=1.5), X3=-.15)
predict(newmodel, newdata=dd, type="response")
@hlin117
Copy link
Copy Markdown

hlin117 commented Jun 26, 2014

Hi. In this script, I'm confused about your construction of the input dd (data) object. It seems that in test.make.glm.R, dd stores a sample size of n = 50 with 3 predictors, but isn't this suggesting we are trying to fit the glm model?

I would imagine that if the model has been already fit, it would suffice to only pass in the four coefficients, beta0, beta1, beta2, and beta3 (represented by -0.5, 1, c(1.5, 1, 1.5), and -0.15). What is the purpose of the data parameter?

@hlin117
Copy link
Copy Markdown

hlin117 commented Jun 26, 2014

Okay, after some trial and error, I understand what the input data is representing. Sorry for the inconvenience.

@dionisio-acosta-ucl
Copy link
Copy Markdown

Excellent work. If you were to use the function for a logistic regression with boolean variables, how would you specify the values in the call? For instance, for a boolean variable Cough, le logistic regression specifies a value of 2 for TRUE (output of summary CoughTRUE 2). Thanks ever so much.

@andreasmeid
Copy link
Copy Markdown

I don't know what went wrong, but I get the following error:

newmodel1 <- makeglm(y~x+z, binomial, data=ds.ipd[,c(2:4)], -.5, x=betas[1,1], z=betas[1,2])
Error in coef["(Intercept)"] <- dots[[i]] :
incompatible types (from closure to double) in subassignment type fix

Can you help me with this issue? Many thanks

Andreas

@blah-crusader
Copy link
Copy Markdown

Hey,

this could be very interesting for something I'm trying to do, namely, calculating the log-likelihood of any GLM on a separate test set. Using this function, I could say: make a glm with the coefficients retrieved from fitting on the training set and then create a new GLM object with these coefficients but data from an independent test set. I believe, however, that this does not work yet in your code.
For instance:

    x1 = rnorm(100, mean = 1000)
    y = rpois(n = 100, lambda = 5)
    logLik(glm(y~x1, family=poisson))
    logLik(makeglm(y~x1, family=poisson, -.5, x1=1))

always returns 1.5 for the logLik, no matter what i choose as coefficients for the intercept and x1. Do you have any tips how to adjust this such that also the logLik() function keeps recalculating properly?

Thanks a lot for any advice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment