Skip to content

Instantly share code, notes, and snippets.

@explodecomputer
Last active October 26, 2015 18:01
Show Gist options
  • Save explodecomputer/bdb923c6c267b0a40552 to your computer and use it in GitHub Desktop.
Save explodecomputer/bdb923c6c267b0a40552 to your computer and use it in GitHub Desktop.
Estimate allele frequency of reference allele from GWAS summary stats
## ---- functions ----
getAlleleFreq <- function(n, ahat, Vp, pval)
{
stopifnot(ahat != 0)
Fval <- qf(pval, 1, n-1, low=F)
p <- 0.5 + c(1,-1) * -2 * sqrt(ahat^2 - 2 * Fval * Vp / (n - 1 - Fval)) / (4 * ahat)
return(p[ifelse(sign(ahat) == 1, 1, 2)])
}
## ---- simulation ----
n <- 10000
g <- rbinom(n, 2, 0.2)
a <- -0.05
y <- rnorm(n) + a*g
## ---- stats ----
mod <- coefficients(summary(lm(y ~ g)))
ahat <- mod[2,1]
pval <- mod[2,4]
Vp <- var(y)
---
title: "Estimating allele frequency from GWAS summary stats"
author: "Gibran Hemani"
date: "5 February 2015"
output: pdf_document
---
```{r echo=FALSE}
read_chunk("analysis.R")
```
Assume we know:
- Effect size
- Sample size
- Trait variance
- p-value
The additive variance in terms of effect size and allele frequency is
$$V_A = 2p(1-p)a^2$$
Rearrange in terms of $p$ requires solving the quadratic equation
$$0 = -2a^2p^2 + 2a^2p - V_A$$
giving
$$p = \frac{-2a^2 \pm \sqrt{4a^4 - 8a^2V_A}} {-4a^2}$$
which simplifies to
$$p = 0.5 \pm -2 \frac{\sqrt{a^2 - 2V_A}} {4a}$$
Next we need to convert Substitute $V_A$ in to scale free terms of $F_{val}$
$$F_{val} = \frac{V_A} {\frac{V_P - V_A}{n - 1}}$$
which rearranges to give
$$V_A = \frac{F_{val}V_P}{n - 1 + F_{val}}$$
Substitute this into the quadratic equation and you have the frequency of the effect allele in terms of effect size, sample size, trait variance and p-value
$$p = \frac{1}{2} \pm \frac{-2 \sqrt{a^2 - \frac{2F_{val}V_P}{n - 1 + F_{val}}}} {4a}$$
Finally, the two solutions are the allele frequencies of the reference and non-reference allele if the effect size is positive, or reversed if the effect size is negative.
Here is the R function to perform this calculation:
```{r allele_freq, eval=TRUE}
```
and here is an example simulation. Simulate some parameters:
```{r simulation, eval=TRUE}
```
Acquire the necessary stats:
```{r stats, eval=TRUE}
```
Run the function:
```{r, eval=TRUE}
getAlleleFreq(n, ahat, Vp, pval)
```
Test against true allele frequency
```{r, eval=TRUE}
mean(g) / 2
```
...close enough.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment