-
-
Save markdanese/d1ddfdb2f618373c719bdb444ca34be9 to your computer and use it in GitHub Desktop.
# load libraries -------------------------------------------------------------------- | |
library(survival) | |
library(data.table) | |
library(magrittr) | |
library(ggplot2) | |
options(stringsAsFactors = FALSE, scipen = 10) | |
# load data (from package) ---------------------------------------------------------- | |
cancer2 <- | |
copy(cancer) %>% | |
setDT() # dataset stored in the survival package for testing | |
cancer3 <- cancer2[complete.cases(cancer2)] # taking complete cases to avoid NA observations on fitted model | |
# run cox model --------------------------------------------------------------------- | |
x <- coxph(Surv(time, status) ~ age + factor(sex) + factor(ph.ecog), data = cancer3) # cox model | |
# function to generate mean survival by factor -------------------------------------- | |
adj_curves <- function(dt, obj, fac){ | |
if(!is.factor(dt[[fac]])) { | |
dt[[fac]] <- factor(dt[[fac]]) | |
} | |
groups <- levels(dt[[fac]]) | |
pred <- survfit(obj, dt) | |
timepoints <- c(0, pred$time) | |
timenames <- paste0("time", timepoints) | |
adj_surv <- as.data.frame(t(pred$surv)) | |
adj_surv <- cbind(time0 = 1, adj_surv) | |
names(adj_surv) <- timenames | |
dt <- cbind(dt, adj_surv) | |
dt2 <- dt[, lapply(.SD, mean), .SDcols = timenames, keyby = fac] | |
graph <- as.data.table(cbind(timepoints, t(dt2[, -1]))) | |
names(graph) <- c("Time", groups) | |
graph <- melt.data.table(data = graph, id.vars = "Time", variable.name = fac, value.name = "Survival") | |
return(graph) | |
} | |
# run function and plot data -------------------------------------------------------- | |
gr <- adj_curves(cancer3, x, "sex") | |
ggplot(gr, aes(x = Time, y = Survival, color = sex)) + | |
geom_step() + | |
scale_y_continuous(limits = c(0, 1)) + | |
theme_bw() |
Hi Mark,
First of great work on implementing this function. I am a little confused as to what type of adjusted survival curve is being generated by your code (and the survminer version which I believe is similar to yours). From my understanding, the above code predicts survival probabilities based on a cox model and then averages the probabilities for groups of interest. So we'd have two curves, say for sex, representing males and females and their respective survival probabilities. However, using this method would still create different covariate distributions for each sex and not truly "adjusted" for each covariate in the model. I think what Terry Therneau's code on this matter in his "Adjusted Survival Curves" vignette was trying to accomplish was different than what you present. I found a good article that outlined the model-based estimation of adjusted survival curves as:
- A Multivariable Cox (or fully parametric) regression is used for the treatment and the covariates.
- For each subject, the predicted survival probabilities, at the times of interest, are estimated using the multivariable model, assuming each subject is in the experimental treatment group; then the predictions are averaged;
- the same predictions are obtained and averaged assuming each subject is in the control group.
the difference between the averaged predicted probabilities between experimental and control group is an estimate of the adjusted R D(t) for the experimental treatment at the specified times.
The key difference in the above method compared to your code is ensuring that every subject is the same for each covariate distribution except for the "exposure" of interest (i.e. sex in our case). I am not an expert on this issue, so I would appreciate your response and clarifications.
Thanks
fixed problem with ordering of results. changed
by = fac
in line 39 tokeyby = fac