Last active
January 11, 2017 12:07
-
-
Save explodecomputer/617a64134f74b3ee8eaba704b8949284 to your computer and use it in GitHub Desktop.
Genetic correlations of phenotypic principal components
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: Simulations of estimating genetic correlations on phenotypic principal components | |
author: Gibran Hemani | |
date: "`r Sys.Date()`" | |
output: | |
pdf_document: | |
keep_tex: true | |
--- | |
**To reproduce this analysis, code is available here:** [ https://gist.github.com/617a64134f74b3ee8eaba704b8949284 ]() | |
To build, run: | |
```r | |
rmarkdown::render("rg_pcs.Rmd") | |
``` | |
```{r, echo=FALSE, cache=FALSE} | |
library(knitr) | |
opts_chunk$set(warning=FALSE, message=FALSE, cache=TRUE, echo=TRUE) | |
``` | |
## Background | |
1. Calculating principal components on phenotypes will return orthogonal axes of variation, i.e. each component will be uncorrelated with each other component. | |
2. The PCs are generated agnostic of sources of correlations, so any shared genetic effects that existed within the original phenotypes should be lost between the components. | |
3. Non-zero genetic correlations could feasibly arise if the shared genetic effects between two components were perfectly counter balanced by shared environmental effects operating in the opposite direction. | |
These simulations attempt to just check to evaluate that (2) is sane... | |
## Simulations | |
The parameters are as follows | |
```{r } | |
nsnp <- 1000 | |
nid <- 3000 | |
nphen <- 100 | |
neff <- 100 | |
mean_env_correlation <- 0.1 | |
mean_gen_correlation <- 0.5 | |
``` | |
Simulate SNPs for `r nsnp` SNPs and `r nid` individuals | |
```{r } | |
g <- matrix(rbinom(nsnp * nid, 2, 0.5), nid, nsnp) | |
``` | |
Simulate the environmental correlations between the `r nphen` phenotypes, such that every phenotype has approximately the same environmental correlation `r mean_env_correlation` | |
```{r } | |
library(mvtnorm) | |
rcov <- matrix(mean_env_correlation, nphen, nphen) | |
diag(rcov) <- 1 | |
residual <- rmvnorm(nid, mean=rep(0, nphen), sigma=rcov) | |
``` | |
Now simulate the SNP effects for each phenotype. We want the phenotypes to have a mean genetic correlation of `r mean_gen_correlation`. I will make a slightly more complicated simulation - half of the genetic correlations are `r mean_gen_correlation`, and the other half are `r -mean_gen_correlation`. | |
```{r } | |
b1 <- matrix(mean_gen_correlation, nphen/2, nphen/2) | |
b2 <- matrix(-mean_gen_correlation, nphen/2, nphen/2) | |
reff <- rbind(cbind(b1, b2), cbind(b2, b1)) | |
diag(reff) <- 1 | |
geneff <- rmvnorm(nsnp, mean=rep(0, nphen), sigma=reff) | |
``` | |
Construct the genetic effects for each phenotype | |
```{r } | |
gen <- matrix(0, nid, nphen) | |
for(i in 1:nphen) | |
{ | |
gen[,i] <- scale(g %*% geneff[,i]) | |
} | |
``` | |
Now create the phenotypes from the genetic and environmental effects | |
```{r } | |
phen <- gen + residual | |
``` | |
The average heritability will be 0.5 because the variance of the genetic and environmental factors are the same. The average phenotypic correlation for the phenotype pairs with positive genetic correlations will be $(r_g + r_e) / 2 = `r (mean_env_correlation + mean_gen_correlation) / 2`$. And for the negative: $(-r_g + r_e) / 2 = `r (mean_env_correlation + -mean_gen_correlation) / 2`$. Check that we get this: | |
```{r } | |
phen_cor <- cor(phen) | |
diag(phen_cor) <- NA | |
hist(phen_cor) | |
``` | |
## Analyse simulated data | |
### Basic phenotypes | |
Perform GWAS of each phenotype | |
```{r } | |
gwas <- function(x, g) | |
{ | |
eff <- matrix(0, ncol(x), ncol(g)) | |
for(i in 1:ncol(x)) | |
{ | |
for(j in 1:ncol(g)) | |
{ | |
eff[i,j] <- cov(x[,i], g[,j]) / var(g[,j]) | |
} | |
} | |
return(eff) | |
} | |
# Matrix of nphen x nsnp | |
res_phen <- gwas(phen, g) | |
``` | |
The empirical genetic correlation can then be obtained. In the graph below there are `r nphen` $\times$ `r nphen` points - one for each pair of phenotypes. | |
```{r } | |
gcor_phen <- cor(t(res_phen)) | |
diag(gcor_phen) <- NA | |
gcor_simulated <- cor(gen) | |
diag(gcor_simulated) <- NA | |
plot(abs(gcor_phen) ~ abs(gcor_simulated), | |
xlab="| Simulated genetic correlations |", | |
ylab="| Empirical genetic correlations |") | |
``` | |
### Phenotypic PCs | |
Now do the same for the principal components of the phenotypes | |
```{r } | |
library(ggplot2) | |
pcs <- princomp(phen)$scores | |
res_pcs <- gwas(pcs, g) | |
gcor_pcs <- cor(t(res_pcs)) | |
diag(gcor_pcs) <- NA | |
hist(gcor_pcs) | |
``` | |
The genetic correlations of the PCs are basically all 0. | |
### Phenotypic independent components | |
What happens if we decompose using ICA instead of PCA? ICA works by decomposing the phenotype matrix $X$ into $S$ is the matrix of independent components and $A$ is a 'mixing matrix', such that $X = SA$. In this way, $S$ can be though of latent variables and $A$ is the linear transformation to the latent variables to recreate the original phenotypes. | |
You need to choose the number of latent variables for ICA. I will just use 40, but there are methods to estimate the model complexity i.e. Automatic Relevance Determination, see section 5.2 of [ http://www.robots.ox.ac.uk/~sjrob/Pubs/ica_eeg_pos.pdf ](http://www.robots.ox.ac.uk/~sjrob/Pubs/ica_eeg_pos.pdf). | |
```{r } | |
library(fastICA) | |
ica_result <- fastICA(phen, n.comp=40) | |
ics <- ica_result$S | |
res_ics <- gwas(ics, g) | |
gcor_ics <- cor(t(res_ics)) | |
diag(gcor_ics) <- NA | |
hist(gcor_ics) | |
``` | |
Heatmaps of the genetic correlations - | |
```{r } | |
library(ggplot2) | |
library(reshape2) | |
a <- melt(gcor_phen) | |
a$id <- "Phenotypes" | |
b <- melt(gcor_pcs) | |
b$id <- "PCs" | |
c <- melt(gcor_simulated) | |
c$id <- "Simulated" | |
d <- melt(gcor_ics) | |
d$id <- "ICs" | |
dat <- rbind(a, b, c, d) | |
ggplot(dat, aes(x=Var1, y=Var2)) + | |
geom_tile(aes(fill=value)) + | |
facet_grid(. ~ id) + | |
labs(x="Index of phenotypes", | |
y="Index of phenotypes", | |
fill="Genetic correlation") + | |
scale_fill_distiller(palette = "Spectral") | |
``` | |
## Summary | |
Neither the ICs nor the PCs of the phenotypes have any genetic correlations. It might be worth tweaking the parameters here to see if they can elucidate something that looks a little bit less expected - i.e. under what conditions could the kind of results from the hip shape analysis be obtained. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment