
@MichelNivard
Last active June 25, 2020 06:15

Minor update: Genetic correlations and Genomic Control (GC) in GenomicSEM

This document describes a minor update to GenomicSEM. The update does two things. First, it gives the user control over how the LD score intercept is used to apply genomic control to GenomicSEM GWAS. Second, it adds code to the ldsc() function that returns quick initial genetic correlations and the standard errors of those genetic correlations.

Better documentation and options for Genomic Control.

Behind the scenes, and poorly documented (there were only some comments in the code), GenomicSEM was applying Genomic Control. The LD score regression intercept gives an expectation for the mean chi-square statistic under the null. Because a chi-square distribution with 1 df has a mean of 1.0, an LDSC intercept greater than 1.0 can serve as an index of inflation of the test statistic attributable to uncontrolled confounding (Bulik-Sullivan et al. 2015).

Specifically, we estimated the univariate LD score intercept and inflated the SE of the estimated SNP-trait covariance by multiplying the SE by the intercept. The chi-square statistic is the squared ratio of the estimate to its SE. Multiplying the SE by the intercept therefore divides the chi-square by the square of the intercept, whereas dividing it by the intercept itself is enough to remove the estimated inflation. This procedure is conservative. Since the LD score intercept typically grows with GWAS sample size, it became increasingly conservative as we all analyze larger and larger GWASs.

From now on, we inflate the SE by sqrt(intercept), which divides the chi-square by the intercept. Users can switch back to the previous (conservative) behavior, or turn off Genomic Control entirely. We do not recommend turning it off unless you have a solid understanding of the nature of the LD score intercept and good reason to ignore it.

As an illustration, we reran the single-factor p-factor model first presented in Grotzinger et al. (2019). We inspected the effect of each GC option on the following: the QQ plot, the effective N, the LD score intercept (and its s.e.), the mean chi2, the mean chi2 of the heterogeneity statistic (Q), and the significance of the SNP h2.

| Genomic Control | Effective N | Mean chi2 | Mean chi2 (Q) | Univariate LDSC intercept | SNP h2 Z-statistic |
|---|---|---|---|---|---|
| No Genomic Control | 84,765 | 2.071 | 1.206 | 1.059 | 26.07 |
| Standard Genomic Control | 81,434 | 1.986 | 1.196 | 1.018 | 26.01 |
| Conservative Genomic Control | 78,173 | 1.906 | 1.186 | 0.980 | 25.89 |

The QQ plots from the re-analysis of the p-factor model also show clear differences between no, standard, and conservative genomic control:

[Figure: QQ plots of the p-factor GWAS under no, standard, and conservative genomic control]

In userGWAS() and commonfactorGWAS(), the user can set the argument GC to one of three values:
"none"
"standard" (the new default)
"conserv" (the previous default)

Note that GC = "conserv" may not exactly reproduce results from earlier versions of GenomicSEM, because we continually push minor updates.
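The following base-R sketch makes the three options concrete. It shows how each GC setting rescales the standard error of a single SNP-trait covariance, and what that does to the corresponding chi-square. The helper gc_adjust_se() and all numbers are hypothetical. The sketch mirrors the rule described above; it is not GenomicSEM's internal code.

```r
# Hypothetical helper mirroring the documented GC rule (not GenomicSEM code):
# rescale the SE of a SNP-trait covariance using the univariate LDSC intercept.
gc_adjust_se <- function(se, intercept, GC = c("standard", "conserv", "none")) {
  GC <- match.arg(GC)
  switch(GC,
    standard = se * sqrt(intercept), # chi-square is divided by the intercept
    conserv  = se * intercept,       # chi-square is divided by intercept^2
    none     = se                    # no genomic control
  )
}

est <- 0.02        # hypothetical SNP-trait covariance estimate
se <- 0.004        # hypothetical unadjusted standard error
intercept <- 1.10  # hypothetical univariate LDSC intercept

# Chi-square (squared Wald Z) under each GC option
chi2 <- sapply(c("none", "standard", "conserv"), function(g)
  (est / gc_adjust_se(se, intercept, g))^2)
print(round(chi2, 3))
```

With these numbers, the unadjusted chi-square of 25 drops to 25/1.1 under "standard" and to 25/1.21 under "conserv". This illustrates why the old default becomes more conservative as the intercept grows.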

ldsc() now also estimates genetic correlations (and their standard errors).

The preferred approach, and an easy one, is to estimate genetic correlations by supplying a model to the usermodel() function. In addition, ldsc() now outputs a genetic correlation matrix and a matrix of the standard errors associated with it.

Set the argument stand = TRUE and ldsc() will produce two things. The first is the standardized genetic covariance matrix (i.e., the genetic correlation matrix). The second is the sampling (co)variances of the elements of that correlation matrix.

We emphasize, however, that fitting a structural model in GenomicSEM gives the most accurate estimates of genetic correlations, and especially of their standard errors. One such model is shown below:
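The following minimal sketch shows what standardization involves. It assumes a hypothetical two-trait genetic covariance matrix S and a hypothetical sampling covariance matrix V of S's unique elements. Base R's cov2cor() turns S into the genetic correlation matrix. The delta method is one standard way to propagate V into a standard error for the genetic correlation; we do not claim that ldsc() computes its standardized sampling (co)variances exactly this way.

```r
# Hypothetical 2-trait genetic covariance matrix (heritabilities on the diagonal)
S <- matrix(c(0.20, 0.05,
              0.05, 0.30), nrow = 2)
# Hypothetical sampling covariance of the unique elements, ordered
# (s11, s21, s22), i.e. the lower triangle column by column; off-diagonals 0
V <- diag(c(0.0004, 0.0002, 0.0005))

# Standardize S into the genetic correlation matrix
R <- cov2cor(S)
r12 <- R[2, 1]

# Gradient of r12 = s21 / sqrt(s11 * s22) with respect to (s11, s21, s22)
s11 <- S[1, 1]; s21 <- S[2, 1]; s22 <- S[2, 2]
grad <- c(-r12 / (2 * s11),
          1 / sqrt(s11 * s22),
          -r12 / (2 * s22))

# Delta-method sampling variance and standard error of r12
var_r12 <- as.numeric(t(grad) %*% V %*% grad)
se_r12 <- sqrt(var_r12)
cat(sprintf("r_g = %.3f (SE = %.3f)\n", r12, se_r12))
```

The delta method is a first-order approximation. This is one reason to prefer fitting the structural model, as recommended above, for the most accurate standard errors.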

cor.model <- ' 
lat1 =~ NA*trait1
lat2 =~ NA*trait2

trait1 ~~ 0*trait1 + 0*trait2
trait2 ~~ 0*trait2

lat1 ~~ 1*lat1
lat2 ~~ 1*lat2

lat1 ~~ cor*lat2
'

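This model can be fit with usermodel(). It estimates the genetic correlation between trait1 and trait2 as the labeled parameter cor. The model makes the latent covariance a correlation through three constraints. Each trait loads freely on its own latent factor. The latent variances are fixed to 1. The residual (co)variances of the observed traits are fixed to 0.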
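For more than two traits, the same pattern can be generated programmatically. The sketch below uses a hypothetical helper, make_cor_model(), that reproduces the two-trait model above, labeling the correlation cor_1_2 instead of cor. It then extends the model to k traits. The trait names are placeholders and must match the trait names used in your ldsc() run.

```r
# Hypothetical helper: writes the correlation model above for any number of
# traits. Each trait loads on its own standardized latent factor, so the
# labeled latent covariances are the genetic correlations.
make_cor_model <- function(traits) {
  k <- length(traits)
  lat <- paste0("lat", seq_len(k))
  # Free loading of each trait on its own latent factor
  lines <- paste0(lat, " =~ NA*", traits)
  # Fix residual variances and residual covariances of the traits to 0
  for (i in seq_len(k)) {
    rhs <- paste0("0*", traits[i:k], collapse = " + ")
    lines <- c(lines, paste0(traits[i], " ~~ ", rhs))
  }
  # Fix latent variances to 1
  lines <- c(lines, paste0(lat, " ~~ 1*", lat))
  # Free, labeled latent covariances (the genetic correlations)
  if (k > 1) {
    for (i in 1:(k - 1)) {
      for (j in (i + 1):k) {
        lines <- c(lines, sprintf("%s ~~ cor_%d_%d*%s", lat[i], i, j, lat[j]))
      }
    }
  }
  paste(lines, collapse = "\n")
}

cat(make_cor_model(c("trait1", "trait2", "trait3")), "\n")
```

The resulting string can be passed as the model argument of usermodel(). For example: usermodel(covstruc = LDSCoutput, estimation = "DWLS", model = make_cor_model(c("trait1", "trait2"))). Here LDSCoutput stands for your ldsc() result.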
