---
title: Notes on survival analysis
author: Gibran Hemani
date: "`r Sys.Date()`"
output: pdf_document
bibliography: survival.bib
---
These notes are based on [http://data.princeton.edu/wws509/notes](http://data.princeton.edu/wws509/notes).
## The survival function
$T$ is a continuous random variable with pdf $f(t)$ and cdf $F(t) = P(T < t)$, giving the probability that the event has occurred by duration $t$. An example is the Gompertz-Makeham law, which has pdf
$$
f(x) = (\alpha e^{\beta x} + \lambda) \cdot \exp\left(-\lambda x - \frac{\alpha}{\beta}(e^{\beta x} - 1)\right)
$$
and cdf
$$
F(x) = 1 - \exp\left(-\lambda x - \frac{\alpha}{\beta}(e^{\beta x} - 1)\right)
$$
These distributions look like:
```{r }
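# Gompertz-Makeham density: hazard (a * exp(b*x) + lambda) times survival
gm_pdf <- function(x, a, b, lambda)
{
	(a * exp(b*x) + lambda) * exp(-lambda * x - a / b * (exp(b*x) - 1))
}
# Gompertz-Makeham cumulative distribution function
gm_cdf <- function(x, a, b, lambda)
{
	1 - exp( -lambda * x - a / b * (exp(b*x) - 1) )
}
age <- 1:100
# Parameter values used throughout these notes.
# Note that lambda is negative here, so a * exp(b*x) + lambda
# is below zero at the youngest ages.
a <- 7.478359e-05
b <- 8.604875e-02
lambda <- -1.846973e-03
# Stack the two plots vertically
par(mfrow=c(2,1))
plot(
	x=age, y=gm_pdf(age, a, b, lambda),
	main="PDF of Gompertz-Makeham"
)
plot(
	x=age, y=gm_cdf(age, a, b, lambda),
	main="CDF of Gompertz-Makeham"
)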
```
The survival function gives the probability of being alive at duration $t$, i.e. the probability that the event has not occurred by duration $t$:
$$
S(t) = 1 - F(t) = \int_{t}^{\infty} f(x)dx
$$
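The two forms of $S(t)$ can be compared numerically. The following is a minimal sketch rather than part of the original notes: `gm_surv`, the parameter values `a_ex`, `b_ex`, `lambda_ex` (chosen with a positive Makeham term) and the finite upper limit of 200 standing in for $\infty$ are all assumptions made for illustration.

```r
# Gompertz-Makeham pdf and cdf, redefined so the sketch is self-contained
gm_pdf <- function(x, a, b, lambda)
{
	(a * exp(b*x) + lambda) * exp(-lambda * x - a / b * (exp(b*x) - 1))
}
gm_cdf <- function(x, a, b, lambda)
{
	1 - exp(-lambda * x - a / b * (exp(b*x) - 1))
}
# Hypothetical helper: survival function S(t) = 1 - F(t)
gm_surv <- function(x, a, b, lambda)
{
	1 - gm_cdf(x, a, b, lambda)
}

# Illustrative parameter values (assumed, not fitted to any data)
a_ex <- 7e-05
b_ex <- 0.086
lambda_ex <- 5e-04

t0 <- 50
# Closed form: 1 - F(t0)
s_closed <- gm_surv(t0, a_ex, b_ex, lambda_ex)
# Numerical form: integral of the pdf from t0 upwards.
# 200 stands in for infinity; the density is effectively zero beyond it.
s_numeric <- integrate(
	function(x) gm_pdf(x, a_ex, b_ex, lambda_ex),
	lower = t0, upper = 200
)$value
print(c(closed = s_closed, numeric = s_numeric))

# With a positive hazard the survival curve never increases
s_curve <- gm_surv(0:100, a_ex, b_ex, lambda_ex)
print(all(diff(s_curve) <= 0))
```

The two values should agree up to the tolerance of `integrate`, and the final check reflects that survival cannot increase with duration when the hazard is positive.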
So $S(x) \leq S(t)$ for all $x > t$. The hazard function is
$$
\lambda(t) = \lim_{dt \to 0} \frac{P(t \leq T < t+dt \mid T \geq t)}{dt}
$$
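which is the instantaneous rate of occurrence of the event per unit time, given survival to $t$.
## The hazard function
The hazard function can also be written as
$$
\lambda(t) = \frac{f(t)}{S(t)}
$$
For the Gompertz-Makeham law this is $\alpha e^{\beta x} + \lambda$, where $\lambda$ in that expression is the Makeham constant rather than the hazard function: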
```{r }
# Gompertz-Makeham hazard: age-dependent term plus constant term
gm_hz <- function(x, a, b, lambda)
{
	a * exp(b*x) + lambda
}
plot(
	x=age, y=gm_hz(age, a, b, lambda),
	main="Hazard function of Gompertz-Makeham"
)
```
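The hazard relates to the pdf and cdf as $\lambda(t) = f(t) / (1 - F(t))$, so plotting one against the other gives a straight line through the origin with slope 1: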
```{r }
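# Closed-form hazard against the ratio f(t) / (1 - F(t))
plot(
	x = gm_hz(age, a, b, lambda),
	y = gm_pdf(age, a, b, lambda) / (1 - gm_cdf(age, a, b, lambda)),
	xlab = "Hazard", ylab = "pdf / (1 - cdf)"
)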
```
So the rate of occurrence of the event at duration $t$ equals the density of events at $t$ divided by the probability of surviving to that duration without the event happening.
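The limit definition of the hazard can be approximated with a small but finite $dt$ and compared with the closed form. This is a sketch under stated assumptions: `hz_fd` is a hypothetical helper, and the parameter values `a_ex`, `b_ex`, `lambda_ex` and the step `dt = 1e-4` are illustrative choices, not values from these notes.

```r
# Gompertz-Makeham cdf and hazard, redefined so the sketch stands alone
gm_cdf <- function(x, a, b, lambda)
{
	1 - exp(-lambda * x - a / b * (exp(b*x) - 1))
}
gm_hz <- function(x, a, b, lambda)
{
	a * exp(b*x) + lambda
}

# Illustrative parameter values (assumed, with a positive Makeham term)
a_ex <- 7e-05
b_ex <- 0.086
lambda_ex <- 5e-04

# Hypothetical helper: P(t <= T < t + dt | T >= t) / dt for a finite dt
hz_fd <- function(t, dt, a, b, lambda)
{
	p_event <- gm_cdf(t + dt, a, b, lambda) - gm_cdf(t, a, b, lambda)
	p_alive <- 1 - gm_cdf(t, a, b, lambda)
	p_event / p_alive / dt
}

ages <- c(20, 40, 60, 80)
hz_exact <- gm_hz(ages, a = a_ex, b = b_ex, lambda = lambda_ex)
hz_approx <- hz_fd(ages, dt = 1e-4, a = a_ex, b = b_ex, lambda = lambda_ex)
print(cbind(age = ages, exact = hz_exact, finite_diff = hz_approx))
```

Shrinking `dt` further should bring the two columns closer, until floating point cancellation in the cdf difference starts to dominate.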
Because $S(t) = 1 - F(t) = \int_{t}^{\infty} f(x)dx$, the derivative of $S(t)$ is $-f(t)$. Since $\frac{d}{dt} \log S(t) = S'(t) / S(t) = -f(t) / S(t)$, we can rewrite the hazard function as
$$
\lambda(t) = - \frac{d}{dt} \log S(t)
$$
Integrating both sides from $0$ to $t$, with $S(0) = 1$, allows us to rewrite $S(t)$ as a function of surviving all hazards up to $t$
$$
S(t) = \exp\left( - \int_{0}^{t} \lambda(x)dx \right)
$$
The integral part of this is known as the cumulative hazard ($\Lambda(t)$).
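The link between the cumulative hazard and the survival function can be checked by integrating the hazard numerically. The sketch below is illustrative only: `cum_hz` is a hypothetical helper and the parameter values `a_ex`, `b_ex`, `lambda_ex` are assumptions with a positive Makeham term.

```r
# Gompertz-Makeham cdf and hazard, redefined so the sketch stands alone
gm_cdf <- function(x, a, b, lambda)
{
	1 - exp(-lambda * x - a / b * (exp(b*x) - 1))
}
gm_hz <- function(x, a, b, lambda)
{
	a * exp(b*x) + lambda
}

# Illustrative parameter values (assumed, not fitted to any data)
a_ex <- 7e-05
b_ex <- 0.086
lambda_ex <- 5e-04

# Hypothetical helper: cumulative hazard, the integral of the hazard from 0 to t
cum_hz <- function(t, a, b, lambda)
{
	integrate(function(x) gm_hz(x, a, b, lambda), lower = 0, upper = t)$value
}

ages <- c(10, 50, 90)
# S(t) from the hazard: exp(-cumulative hazard)
s_from_hazard <- exp(-sapply(ages, cum_hz, a = a_ex, b = b_ex, lambda = lambda_ex))
# S(t) from the cdf: 1 - F(t)
s_from_cdf <- 1 - gm_cdf(ages, a_ex, b_ex, lambda_ex)
print(cbind(age = ages, from_hazard = s_from_hazard, from_cdf = s_from_cdf))
```

For this distribution the cumulative hazard also has the closed form $\lambda t + \frac{\alpha}{\beta}(e^{\beta t} - 1)$, which is the negative of the exponent in the cdf given earlier.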
## Explanatory variables
Introducing explanatory variables is understood in the context of $T_i$ being the time to event for individual $i$, and modelled as
$$
\log(T_i) = \mathbf{{x_i}'\beta} + \epsilon_i
$$
where $\epsilon_i$ is a suitable error term, or equivalently the baseline value of $\log(T_i)$ when there are no explanatory variables, i.e. the explanatory variables shift the baseline value. This can be exponentiated
$$
T_i = \exp(\mathbf{x_i'\beta})T_{0i}
$$
where $T_{0i}$ is the exponentiated error term. We can use $\gamma$ as shorthand for the multiplicative effect $\exp(\mathbf{x_i'\beta})$ of the covariates.
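A small simulation can make the multiplicative effect concrete. This is a sketch under several assumptions that are not in the notes: a single binary covariate `x`, an effect of $\beta = \log(0.5)$, exponentially distributed baseline times, and no censoring. With fully observed times, ordinary least squares on $\log(T_i)$ is one simple way to recover $\beta$; censored data would need a proper survival model instead.

```r
set.seed(1)
n <- 5000

# Hypothetical binary covariate and assumed effect (halves the time to event)
x <- rbinom(n, size = 1, prob = 0.5)
beta <- log(0.5)

# Baseline times T_0i (assumed exponential with mean 40)
t0 <- rexp(n, rate = 1 / 40)
# T_i = exp(x_i * beta) * T_0i
t_obs <- exp(x * beta) * t0

# Log-linear model: log(T_i) = intercept + x_i * beta + error
fit <- lm(log(t_obs) ~ x)
beta_hat <- coef(fit)[["x"]]
gamma_hat <- exp(beta_hat)
print(c(beta_hat = beta_hat, gamma_hat = gamma_hat))
```

The estimate `gamma_hat` should land near 0.5, the multiplicative effect used to generate the data.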
Interpretation of this is then straightforward. If an explanatory variable $x$ is binary (smokers vs non-smokers) and smokers live half as long as non-smokers, then $\gamma = 0.5$, or $\beta = \log(0.5) =$ `r log(0.5)`. Relating this to the survivor function, the model is interpreted as *life acceleration*. If $S_1(t)$ is the survivor function of smokers and $S_0(t)$ that of non-smokers then
$$
S_1(t) = S_0(t/\gamma)
$$
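The relation $S_1(t) = S_0(t/\gamma)$ can be illustrated with a parametric baseline. The sketch below assumes a Weibull baseline with hypothetical `shape0` and `scale0` values. Multiplying a Weibull time by $\gamma$ gives another Weibull with the scale multiplied by $\gamma$, so both sides can be computed with `pweibull`.

```r
gam <- 0.5      # multiplicative effect: smokers live half as long
shape0 <- 2     # assumed Weibull shape for the baseline group
scale0 <- 60    # assumed Weibull scale for the baseline group

# Baseline survivor function S_0(t)
s0 <- function(t)
{
	pweibull(t, shape = shape0, scale = scale0, lower.tail = FALSE)
}
# Survivor function of T_1 = gam * T_0, which has scale gam * scale0
s1 <- function(t)
{
	pweibull(t, shape = shape0, scale = gam * scale0, lower.tail = FALSE)
}

t_grid <- 1:100
print(all.equal(s1(t_grid), s0(t_grid / gam)))

# The same relation in simulated data
set.seed(1)
t0_sim <- rweibull(10000, shape = shape0, scale = scale0)
t1_sim <- gam * t0_sim
# Proportion of group 1 alive at 30 vs proportion of group 0 alive at 30 / gam
print(c(mean(t1_sim > 30), mean(t0_sim > 30 / gam)))
```

Reading the result the other way round, the proportion of smokers alive at duration $t$ equals the proportion of non-smokers alive at $t/\gamma$, which is twice the duration when $\gamma = 0.5$.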
An alternative approach is proportional hazards [@Cox1972] which focuses on the hazard function directly.
## References