Skip to content

Instantly share code, notes, and snippets.

@explodecomputer
Last active January 11, 2017 12:07
Show Gist options
  • Save explodecomputer/617a64134f74b3ee8eaba704b8949284 to your computer and use it in GitHub Desktop.
Save explodecomputer/617a64134f74b3ee8eaba704b8949284 to your computer and use it in GitHub Desktop.
Genetic correlations of phenotypic principal components
---
title: Simulations of estimating genetic correlations on phenotypic principal components
author: Gibran Hemani
date: "`r Sys.Date()`"
output:
pdf_document:
keep_tex: true
---
**To reproduce this analysis, code is available here:** [ https://gist.github.com/617a64134f74b3ee8eaba704b8949284 ]()
To build, run:
```r
rmarkdown::render("rg_pcs.Rmd")
```
```{r, echo=FALSE, cache=FALSE}
library(knitr)
opts_chunk$set(warning=FALSE, message=FALSE, cache=TRUE, echo=TRUE)
```
## Background
1. Calculating principal components on phenotypes will return orthogonal axes of variation, i.e. each component will be uncorrelated with each other component.
2. The PCs are generated agnostic of sources of correlations, so any shared genetic effects that existed within the original phenotypes should be lost between the components.
3. Non-zero genetic correlations could feasibly arise if the shared genetic effects between two components were perfectly counter balanced by shared environmental effects operating in the opposite direction.
These simulations attempt to just check to evaluate that (2) is sane...
## Simulations
The parameters are as follows
```{r }
nsnp <- 1000
nid <- 3000
nphen <- 100
neff <- 100
mean_env_correlation <- 0.1
mean_gen_correlation <- 0.5
```
Simulate SNPs for `r nsnp` SNPs and `r nid` individuals
```{r }
g <- matrix(rbinom(nsnp * nid, 2, 0.5), nid, nsnp)
```
Simulate the environmental correlations between the `r nphen` phenotypes, such that every phenotype has approximately the same environmental correlation `r mean_env_correlation`
```{r }
library(mvtnorm)
rcov <- matrix(mean_env_correlation, nphen, nphen)
diag(rcov) <- 1
residual <- rmvnorm(nid, mean=rep(0, nphen), sigma=rcov)
```
Now simulate the SNP effects for each phenotype. We want the phenotypes to have a mean genetic correlation of `r mean_gen_correlation`. I will make a slightly more complicated simulation - half of the genetic correlations are `r mean_gen_correlation`, and the other half are `r -mean_gen_correlation`.
```{r }
b1 <- matrix(mean_gen_correlation, nphen/2, nphen/2)
b2 <- matrix(-mean_gen_correlation, nphen/2, nphen/2)
reff <- rbind(cbind(b1, b2), cbind(b2, b1))
diag(reff) <- 1
geneff <- rmvnorm(nsnp, mean=rep(0, nphen), sigma=reff)
```
Construct the genetic effects for each phenotype
```{r }
gen <- matrix(0, nid, nphen)
for(i in 1:nphen)
{
gen[,i] <- scale(g %*% geneff[,i])
}
```
Now create the phenotypes from the genetic and environmental effects
```{r }
phen <- gen + residual
```
The average heritability will be 0.5 because the variance of the genetic and environmental factors are the same. The average phenotypic correlation for the phenotype pairs with positive genetic correlations will be $(r_g + r_e) / 2 = `r (mean_env_correlation + mean_gen_correlation) / 2`$. And for the negative: $(-r_g + r_e) / 2 = `r (mean_env_correlation + -mean_gen_correlation) / 2`$. Check that we get this:
```{r }
phen_cor <- cor(phen)
diag(phen_cor) <- NA
hist(phen_cor)
```
## Analyse simulated data
### Basic phenotypes
Perform GWAS of each phenotype
```{r }
gwas <- function(x, g)
{
eff <- matrix(0, ncol(x), ncol(g))
for(i in 1:ncol(x))
{
for(j in 1:ncol(g))
{
eff[i,j] <- cov(x[,i], g[,j]) / var(g[,j])
}
}
return(eff)
}
# Matrix of nphen x nsnp
res_phen <- gwas(phen, g)
```
The empirical genetic correlation can then be obtained. In the graph below there are `r nphen` $\times$ `r nphen` points - one for each pair of phenotypes.
```{r }
gcor_phen <- cor(t(res_phen))
diag(gcor_phen) <- NA
gcor_simulated <- cor(gen)
diag(gcor_simulated) <- NA
plot(abs(gcor_phen) ~ abs(gcor_simulated),
xlab="| Simulated genetic correlations |",
ylab="| Empirical genetic correlations |")
```
### Phenotypic PCs
Now do the same for the principal components of the phenotypes
```{r }
library(ggplot2)
pcs <- princomp(phen)$scores
res_pcs <- gwas(pcs, g)
gcor_pcs <- cor(t(res_pcs))
diag(gcor_pcs) <- NA
hist(gcor_pcs)
```
The genetic correlations of the PCs are basically all 0.
### Phenotypic independent components
What happens if we decompose using ICA instead of PCA? ICA works by decomposing the phenotype matrix $X$ into $S$ is the matrix of independent components and $A$ is a 'mixing matrix', such that $X = SA$. In this way, $S$ can be though of latent variables and $A$ is the linear transformation to the latent variables to recreate the original phenotypes.
You need to choose the number of latent variables for ICA. I will just use 40, but there are methods to estimate the model complexity i.e. Automatic Relevance Determination, see section 5.2 of [ http://www.robots.ox.ac.uk/~sjrob/Pubs/ica_eeg_pos.pdf ](http://www.robots.ox.ac.uk/~sjrob/Pubs/ica_eeg_pos.pdf).
```{r }
library(fastICA)
ica_result <- fastICA(phen, n.comp=40)
ics <- ica_result$S
res_ics <- gwas(ics, g)
gcor_ics <- cor(t(res_ics))
diag(gcor_ics) <- NA
hist(gcor_ics)
```
Heatmaps of the genetic correlations -
```{r }
library(ggplot2)
library(reshape2)
a <- melt(gcor_phen)
a$id <- "Phenotypes"
b <- melt(gcor_pcs)
b$id <- "PCs"
c <- melt(gcor_simulated)
c$id <- "Simulated"
d <- melt(gcor_ics)
d$id <- "ICs"
dat <- rbind(a, b, c, d)
ggplot(dat, aes(x=Var1, y=Var2)) +
geom_tile(aes(fill=value)) +
facet_grid(. ~ id) +
labs(x="Index of phenotypes",
y="Index of phenotypes",
fill="Genetic correlation") +
scale_fill_distiller(palette = "Spectral")
```
## Summary
Neither the ICs nor the PCs of the phenotypes have any genetic correlations. It might be worth tweaking the parameters here to see if they can elucidate something that looks a little bit less expected - i.e. under what conditions could the kind of results from the hip shape analysis be obtained.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment