Last active
October 26, 2015 18:01
-
-
Save explodecomputer/bdb923c6c267b0a40552 to your computer and use it in GitHub Desktop.
Estimate allele frequency of reference allele from GWAS summary stats
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
## ---- functions ---- | |
getAlleleFreq <- function(n, ahat, Vp, pval) | |
{ | |
stopifnot(ahat != 0) | |
Fval <- qf(pval, 1, n-1, low=F) | |
p <- 0.5 + c(1,-1) * -2 * sqrt(ahat^2 - 2 * Fval * Vp / (n - 1 - Fval)) / (4 * ahat) | |
return(p[ifelse(sign(ahat) == 1, 1, 2)]) | |
} | |
## ---- simulation ---- | |
n <- 10000 | |
g <- rbinom(n, 2, 0.2) | |
a <- -0.05 | |
y <- rnorm(n) + a*g | |
## ---- stats ---- | |
mod <- coefficients(summary(lm(y ~ g))) | |
ahat <- mod[2,1] | |
pval <- mod[2,4] | |
Vp <- var(y) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "Estimating allele frequency from GWAS summary stats" | |
author: "Gibran Hemani" | |
date: "5 February 2015" | |
output: pdf_document | |
--- | |
```{r echo=FALSE} | |
read_chunk("analysis.R") | |
``` | |
Assume we know: | |
- Effect size | |
- Sample size | |
- Trait variance | |
- p-value | |
The additive variance in terms of effect size and allele frequency is | |
$$V_A = 2p(1-p)a^2$$ | |
Rearrange in terms of $p$ requires solving the quadratic equation | |
$$0 = -2a^2p^2 + 2a^2p - V_A$$ | |
giving | |
$$p = \frac{-2a^2 \pm \sqrt{4a^4 - 8a^2V_A}} {-4a^2}$$ | |
which simplifies to | |
$$p = 0.5 \pm -2 \frac{\sqrt{a^2 - 2V_A}} {4a}$$ | |
Next we need to convert Substitute $V_A$ in to scale free terms of $F_{val}$ | |
$$F_{val} = \frac{V_A} {\frac{V_P - V_A}{n - 1}}$$ | |
which rearranges to give | |
$$V_A = \frac{F_{val}V_P}{n - 1 + F_{val}}$$ | |
Substitute this into the quadratic equation and you have the frequency of the effect allele in terms of effect size, sample size, trait variance and p-value | |
$$p = \frac{1}{2} \pm \frac{-2 \sqrt{a^2 - \frac{2F_{val}V_P}{n - 1 + F_{val}}}} {4a}$$ | |
Finally, the two solutions are the allele frequencies of the reference and non-reference allele if the effect size is positive, or reversed if the effect size is negative. | |
Here is the R function to perform this calculation: | |
```{r allele_freq, eval=TRUE} | |
``` | |
and here is an example simulation. Simulate some parameters: | |
```{r simulation, eval=TRUE} | |
``` | |
Acquire the necessary stats: | |
```{r stats, eval=TRUE} | |
``` | |
Run the function: | |
```{r, eval=TRUE} | |
getAlleleFreq(n, ahat, Vp, pval) | |
``` | |
Test against true allele frequency | |
```{r, eval=TRUE} | |
mean(g) / 2 | |
``` | |
...close enough. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment