Skip to content

Instantly share code, notes, and snippets.

@explodecomputer
explodecomputer / covid_collider_prediction.rmd
Last active May 28, 2020 22:53
Predictions when training and testing subsets are selected based on a collider
---
title: Prediction when testing and training are stratified by a collider
---
## Background
In Menni et al 2020 they look for risk factors that associate with testing positive. They then create a model to predict test status in untested individuals using those risk factors.
The risk factors, and testing positive, both influence whether individuals are tested. Therefore, the associations in the tested sample are likely biased due to colliders, and not transportable to those in the untested sample.
@explodecomputer
explodecomputer / mr_dgp.rmd
Last active March 18, 2020 18:45
Data generating process underlying causal inference using Mendelian randomization
---
title: Data generating process underlying causal inference using Mendelian randomization
author: Gibran Hemani
date: '`r format(Sys.Date())`'
---
## Background
Causal inference between two traits, the exposure's ($x$) effect on the outcome ($y$) can be made using associations of genetic variants $g$ on $x$ and $y$. This method is known as Mendelian randomization (MR), a special case of instrumental variable (IV) analysis where the instrument is a genetic variant. Assume the following causal structure:
@explodecomputer
explodecomputer / google_drive_image.md
Last active November 7, 2019 10:59
Embed image from google drive

name

name

@explodecomputer
explodecomputer / tetrachoric.r
Created September 19, 2019 15:08
tetrachoric under different scenarios
library(mvtnorm)
library(psych)
n <- 100000
dn <- rmvnorm(n, c(0,0), matrix(c(1,0.5, 0.5,1), 2))
# 50% prevalence
d <- dn
d[,1] <- rbinom(n, 1, pnorm(d[,1]))
@explodecomputer
explodecomputer / hwe.r
Created August 20, 2019 16:34
Merged SNP HWE
maf <- function(x) { sum(x) / (length(x) * 2)}
hwe <- function(x) {
observed <- table(x)
m <- maf(x)
expected <- c(
(1-m)^2, 2 * m * (1-m), m^2
) * length(x)
chisq.test(rbind(observed, round(expected)))
print(rbind(observed, round(expected)))
@explodecomputer
explodecomputer / infant_weight_height.r
Created July 18, 2019 13:53
Infant weight and later height
library(ggplot2)
library(tidyr)
library(alspac)
data(current)
vars <- c("cf040", "cf041", "cf042", "cf043", "fh3000")
b <- extractVars(subset(current, name %in% vars))
b1 <- subset(b, select=c("aln", "qlet", vars)) %>% filter(!apply(., 1, function(x) any(is.na(x))))
for(i in vars)
@explodecomputer
explodecomputer / raiss_setup.sh
Last active February 4, 2019 08:28
raiss_setup
for i in {1..22}
do
plink --bfile /mnt/storage/private/mrcieu/research/mr-eve/vcf-reference-datasets/1000g_filtered/data_maf0.01_rs --chr $i --make-bed --out ld/ref/data_$i
done
for i in {1..22}
do
bcftools query -r $i -f '%ID\t%POS\t%REF\t%ALT\t%B\t%SE\n' ../data.bcf | awk 'BEGIN {print "rsID\tpos\tA0\tA1\tZ"; OFS="\t"}; { print $1, $2, $3, $4, $5/$6}' > ld/ref/z_2_$i.txt
done
@explodecomputer
explodecomputer / comparing_packages.r
Last active December 5, 2018 17:54
comparing_packages
library(TwoSampleMR)
library(RadialMR)
library(MendelianRandomization)
# Read in example data
a <- extract_instruments(2)
b <- extract_outcome_data(a$SNP, 7)
ab <- harmonise_data(a,b)
ab$SNP <- as.character(ab$SNP)
# IVW in TwoSampleMR package
#!/bin/bash
# Download 1000 genomes
wget ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
wget ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
wget ALL.chr10.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
wget ALL.chr10.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
wget ALL.chr11.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
wget ALL.chr11.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
#===================================#
# GWAS Practical #
# iBSC Genomic Medicine #
# Lavinia Paternoster #
# 08/11/2018 #
#===================================#
# Run the GWAS without PCs but using the cleaned genetic data