Created
December 31, 2015 18:45
survival notes
---
title: Notes on survival analysis
author: Gibran Hemani
date: "`r Sys.Date()`"
output: pdf_document
bibliography: survival.bib
---

These notes are based on [http://data.princeton.edu/wws509/notes](http://data.princeton.edu/wws509/notes).

## The survival function

$T$ is a continuous random variable with pdf $f(t)$ and cdf $F(t) = P(T < t)$, which gives the probability that the event has occurred by duration $t$. An example is the Gompertz-Makeham law, which has pdf

$$
f(x) = (\alpha e^{\beta x} + \lambda)\cdot \exp\left(-\lambda x-\frac{\alpha}{\beta}(e^{\beta x}-1)\right)
$$

and cdf

$$
F(x) = 1-\exp\left(-\lambda x-\frac{\alpha}{\beta}(e^{\beta x}-1)\right)
$$

These distributions look like:

```{r }
# Gompertz-Makeham density: hazard (a * exp(b*x) + lambda) times survival
gm_pdf <- function(x, a, b, lambda)
{
  (a * exp(b*x) + lambda) * exp(-lambda * x - a / b * (exp(b*x) - 1))
}

# Gompertz-Makeham cumulative distribution function
gm_cdf <- function(x, a, b, lambda)
{
  1 - exp( -lambda * x - a / b * (exp(b*x) - 1) )
}

age <- 1:100
a <- 7.478359e-05
b <- 8.604875e-02
# Note: lambda is negative as given, which makes a * exp(b*x) + lambda
# (and hence the pdf) negative below about age 37
lambda <- -1.846973e-03

# Stack the two plots vertically
par(mfrow=c(2,1))
plot(
  x=age, y=gm_pdf(age, a, b, lambda),
  main="PDF of Gompertz-Makeham"
)
plot(
  x=age, y=gm_cdf(age, a, b, lambda),
  main="CDF of Gompertz-Makeham"
)
```

The survival function gives the probability of being alive at duration $t$, i.e. that the event has not occurred by duration $t$:

$$
S(t) = 1 - F(t) = \int_{t}^{\infty} f(x)dx
$$

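As a rough numerical check of this identity, the sketch below compares the closed-form $1 - F(t)$ with the integral of the density from $t$ upwards. The parameter values `a_ex`, `b_ex` and `lambda_ex` are illustrative assumptions (all positive), not the fitted values used in the plots, and the finite upper limit of 200 stands in for $\infty$ on the assumption that survival beyond that age is negligible.

```{r }
# Illustrative Gompertz-Makeham parameters (assumed, all positive); these are
# not the fitted values used elsewhere in these notes.
a_ex <- 1e-4
b_ex <- 0.085
lambda_ex <- 5e-4

# Density and survival function, following the pdf and cdf given earlier
pdf_ex <- function(x) {
  (a_ex * exp(b_ex * x) + lambda_ex) *
    exp(-lambda_ex * x - a_ex / b_ex * (exp(b_ex * x) - 1))
}
surv_ex <- function(x) {
  exp(-lambda_ex * x - a_ex / b_ex * (exp(b_ex * x) - 1))
}

# Integrate the density from t0 up to a large finite age (assumed cut-off)
t0 <- 50
tail_area <- integrate(pdf_ex, lower = t0, upper = 200)$value

# The two values should agree up to numerical integration error
c(closed_form = surv_ex(t0), integral = tail_area)
```
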
So $S(x) \leq S(t)$ for all $x > t$. The hazard function is

$$
\lambda(t) = \lim_{dt \to 0} \frac{P(t \leq T < t+dt \mid T \geq t)}{dt}
$$

which is the instantaneous rate of occurrence of the event per unit time, given survival to $t$. The Gompertz-Makeham hazard is plotted in the next section.

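The limit in this definition can be approximated with a small but finite $dt$. The sketch below does this for the Gompertz-Makeham cdf, again with illustrative parameter values (assumptions, not the fitted ones), and compares the result with the closed-form hazard $\alpha e^{\beta t} + \lambda$. The helper name `hazard_fd` is hypothetical.

```{r }
# Illustrative parameters (assumed, not the fitted values in these notes)
a_ex <- 1e-4
b_ex <- 0.085
lambda_ex <- 5e-4

cdf_ex <- function(x) {
  1 - exp(-lambda_ex * x - a_ex / b_ex * (exp(b_ex * x) - 1))
}

# P(t <= T < t + dt | T >= t) / dt for a finite dt
hazard_fd <- function(t, dt) {
  (cdf_ex(t + dt) - cdf_ex(t)) / ((1 - cdf_ex(t)) * dt)
}

t0 <- 60
dts <- c(1, 0.1, 0.01, 0.001)
approx_hz <- sapply(dts, function(dt) hazard_fd(t0, dt))
exact_hz <- a_ex * exp(b_ex * t0) + lambda_ex

# The approximation should approach the closed form as dt shrinks
data.frame(dt = dts, approx = approx_hz, exact = exact_hz)
```
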
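## The hazard function

The hazard function can also be written as

$$
\lambda(t) = \frac{f(t)}{S(t)}
$$

For example, dividing the Gompertz-Makeham pdf by its survival function leaves $\alpha e^{\beta x} + \lambda$, where $\lambda$ on the right is the Makeham constant rather than the hazard function:
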
```{r }
# Gompertz-Makeham hazard: the age-dependent Gompertz term plus the constant lambda
gm_hz <- function(x, a, b, lambda)
{
  a * exp(b*x) + lambda
}
plot(
  x=age, y=gm_hz(age, a, b, lambda),
  main="Hazard function of Gompertz-Makeham"
)
```

This relates to the pdf and cdf as $f(t) / (1 - F(t))$, so plotting one against the other should give points on the line $y = x$:

```{r }
# Closed-form hazard (x axis) against pdf / (1 - cdf) (y axis)
plot(
  x = gm_hz(age, a, b, lambda),
  y = gm_pdf(age, a, b, lambda) / (1 - gm_cdf(age, a, b, lambda)),
  xlab = "Closed-form hazard",
  ylab = "pdf / (1 - cdf)"
)
```

So the rate of occurrence of the event at duration $t$ equals the density of events at $t$ divided by the probability of surviving to that duration without the event happening.

Because we know that $S(t) = 1 - F(t) = \int_{t}^{\infty} f(x)dx$, we know that the derivative of $S(t)$ is $-f(t)$. So $\lambda(t) = -S'(t)/S(t)$, and by the chain rule we can rewrite the hazard function as

$$
\lambda(t) = - \frac{d}{dt} \log S(t)
$$

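A quick way to see this relationship is to differentiate $\log S(t)$ numerically and compare with the closed-form hazard. The sketch below uses a central finite difference with an arbitrary step `h` and illustrative Gompertz-Makeham parameters (assumptions, not the fitted values).

```{r }
# Illustrative parameters (assumed, not the fitted values in these notes)
a_ex <- 1e-4
b_ex <- 0.085
lambda_ex <- 5e-4

# log S(t) for the Gompertz-Makeham law
log_surv_ex <- function(x) {
  -lambda_ex * x - a_ex / b_ex * (exp(b_ex * x) - 1)
}

# Central finite difference for d/dt log S(t); h is an arbitrary small step
h <- 1e-4
ages_ex <- seq(10, 90, by = 20)
hz_from_logS <- -(log_surv_ex(ages_ex + h) - log_surv_ex(ages_ex - h)) / (2 * h)

# Closed-form hazard for comparison
hz_closed <- a_ex * exp(b_ex * ages_ex) + lambda_ex
max(abs(hz_from_logS - hz_closed))
```
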
Integrating from $0$ to $t$ with the boundary condition $S(0) = 1$ allows us to rewrite $S(t)$ as the probability of surviving all hazards up to $t$

$$
S(t) = \exp\left( - \int_{0}^{t} \lambda(x)dx \right)
$$

The integral in this expression is known as the cumulative hazard ($\Lambda(t)$).

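The cumulative hazard can also be computed by numerical integration, which gives a simple check of the expression for $S(t)$. This is a minimal sketch using base R's `integrate()` and illustrative parameter values (assumptions, not the fitted ones); `cum_hz_ex` is a hypothetical helper name.

```{r }
# Illustrative parameters (assumed, not the fitted values in these notes)
a_ex <- 1e-4
b_ex <- 0.085
lambda_ex <- 5e-4

# Gompertz-Makeham hazard
hz_ex <- function(x) a_ex * exp(b_ex * x) + lambda_ex

# Cumulative hazard: integral of the hazard from 0 to t
cum_hz_ex <- function(t) integrate(hz_ex, lower = 0, upper = t)$value

t0 <- 70
surv_from_hazard <- exp(-cum_hz_ex(t0))
surv_closed <- exp(-lambda_ex * t0 - a_ex / b_ex * (exp(b_ex * t0) - 1))

# Both routes to S(t0) should agree up to integration error
c(from_hazard = surv_from_hazard, closed_form = surv_closed)
```
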
## Explanatory variables

Explanatory variables are introduced by letting $T_i$ be the time to event for individual $i$, modelled as

$$
\log(T_i) = \mathbf{{x_i}'\beta} + \epsilon_i
$$

where $\epsilon_i$ is a suitable error term, which is also the baseline value of $\log(T_i)$ when there are no explanatory variables, i.e. the explanatory variables shift the baseline value. This can be exponentiated

$$
T_i = \exp(\mathbf{x_i'\beta})T_{0i}
$$

where $T_{0i}$ is the exponentiated error term. We can use $\gamma$ as shorthand for the multiplicative effect $\exp(\mathbf{x_i'\beta})$ of the covariates.

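To make the log-linear form concrete, the sketch below simulates event times from this model and recovers $\beta$ by ordinary least squares on $\log(T_i)$. Everything here is an assumption for illustration: a single binary covariate, an effect of $-0.7$, an exponential baseline with mean 70, and no censoring (with censored data a plain `lm()` fit would not be appropriate).

```{r }
set.seed(1)
n_sim <- 10000

# Hypothetical binary covariate and assumed true effect on log(T)
x_sim <- rbinom(n_sim, size = 1, prob = 0.5)
beta_true <- -0.7

# Assumed baseline times T_0i: exponential with mean 70
T0_sim <- rexp(n_sim, rate = 1 / 70)

# T_i = exp(x_i * beta) * T_0i
T_sim <- exp(x_sim * beta_true) * T0_sim

# Regress log(T_i) on x_i; the intercept absorbs the mean of log(T_0i)
fit_sim <- lm(log(T_sim) ~ x_sim)
beta_hat <- unname(coef(fit_sim)["x_sim"])

c(beta_true = beta_true, beta_hat = beta_hat, gamma_hat = exp(beta_hat))
```
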
Interpretation of this is then straightforward: if an explanatory variable $x$ is binary (smokers vs non-smokers) and smokers live half as long as non-smokers, then $\gamma = 0.5$, or $\beta = \log(0.5) =$ `r log(0.5)`. In terms of the survivor function, the model is interpreted as *life acceleration* (an accelerated life model). If $S_1(t)$ is the survivor function of smokers and $S_0(t)$ that of non-smokers then

$$
S_1(t) = S_0(t/\gamma)
$$

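This relationship can be checked directly for a baseline with a known distribution. The sketch below assumes, purely for illustration, a Weibull baseline for non-smokers (shape 3, scale 80). Multiplying a Weibull variable by $\gamma$ gives another Weibull with the same shape and the scale multiplied by $\gamma$, so the smokers' survivor function is available in closed form.

```{r }
gamma_ex <- 0.5   # smokers live half as long
shape_ex <- 3     # assumed Weibull shape for the baseline
scale_ex <- 80    # assumed Weibull scale for the baseline

# Baseline (non-smoker) survivor function S_0
S0_ex <- function(t) {
  pweibull(t, shape = shape_ex, scale = scale_ex, lower.tail = FALSE)
}

# Smokers: T_1 = gamma * T_0 is Weibull with the scale multiplied by gamma
t_grid <- seq(1, 100, by = 1)
S1_ex <- pweibull(t_grid, shape = shape_ex, scale = gamma_ex * scale_ex,
                  lower.tail = FALSE)

# S_1(t) should match S_0(t / gamma) at every t
max(abs(S1_ex - S0_ex(t_grid / gamma_ex)))

# The median survival time is scaled by gamma as well
median_ratio <- qweibull(0.5, shape = shape_ex, scale = gamma_ex * scale_ex) /
  qweibull(0.5, shape = shape_ex, scale = scale_ex)
median_ratio
```
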
An alternative approach is proportional hazards [@Cox1972] which focuses on the hazard function directly.

## References