explodecomputer’s gists

explodecomputer / covid_collider_prediction.rmd

Last active May 28, 2020 22:53

Predictions when training and testing subsets are selected based on a collider

	---
	title: Prediction when testing and training are stratified by a collider
	---

	## Background

	In Menni et al 2020 they look for risk factors that associate with testing positive. They then create a model to predict test status in untested individuals using those risk factors.

	The risk factors, and testing positive, both influence whether individuals are tested. Therefore, the associations in the tested sample are likely biased due to colliders, and not transportable to those in the untested sample.

explodecomputer / mr_dgp.rmd

Last active March 18, 2020 18:45

Data generating process underlying causal inference using Mendelian randomization

	---
	title: Data generating process underlying causal inference using Mendelian randomization
	author: Gibran Hemani
	date: '`r format(Sys.Date())`'
	---


	## Background

	Causal inference between two traits, the exposure's ($x$) effect on the outcome ($y$) can be made using associations of genetic variants $g$ on $x$ and $y$. This method is known as Mendelian randomization (MR), a special case of instrumental variable (IV) analysis where the instrument is a genetic variant. Assume the following causal structure:

explodecomputer / google_drive_image.md

Last active November 7, 2019 10:59

Embed image from google drive

explodecomputer / tetrachoric.r

Created September 19, 2019 15:08

tetrachoric under different scenarios

explodecomputer / hwe.r

Created August 20, 2019 16:34

Merged SNP HWE

	maf <- function(x) { sum(x) / (length(x) * 2)}

	hwe <- function(x) {
	observed <- table(x)
	m <- maf(x)
	expected <- c(
	(1-m)^2, 2 * m * (1-m), m^2
	) * length(x)
	chisq.test(rbind(observed, round(expected)))
	print(rbind(observed, round(expected)))

explodecomputer / infant_weight_height.r

Created July 18, 2019 13:53

Infant weight and later height

	library(ggplot2)
	library(tidyr)
	library(alspac)
	data(current)
	vars <- c("cf040", "cf041", "cf042", "cf043", "fh3000")
	b <- extractVars(subset(current, name %in% vars))

	b1 <- subset(b, select=c("aln", "qlet", vars)) %>% filter(!apply(., 1, function(x) any(is.na(x))))

	for(i in vars)

explodecomputer / raiss_setup.sh

Last active February 4, 2019 08:28

raiss_setup

	for i in {1..22}
	do
	plink --bfile /mnt/storage/private/mrcieu/research/mr-eve/vcf-reference-datasets/1000g_filtered/data_maf0.01_rs --chr $i --make-bed --out ld/ref/data_$i
	done

	for i in {1..22}
	do
	bcftools query -r $i -f '%ID\t%POS\t%REF\t%ALT\t%B\t%SE\n' ../data.bcf \| awk 'BEGIN {print "rsID\tpos\tA0\tA1\tZ"; OFS="\t"}; { print $1, $2, $3, $4, $5/$6}' > ld/ref/z_2_$i.txt
	done

explodecomputer / comparing_packages.r

Last active December 5, 2018 17:54

comparing_packages

	library(TwoSampleMR)
	library(RadialMR)
	library(MendelianRandomization)

	# Read in example data
	a <- extract_instruments(2)
	b <- extract_outcome_data(a$SNP, 7)
	ab <- harmonise_data(a,b)
	ab$SNP <- as.character(ab$SNP)
	# IVW in TwoSampleMR package

explodecomputer / 1kg_ref.sh

Created November 25, 2018 14:36

	#!/bin/bash

	# Download 1000 genomes

	wget ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
	wget ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
	wget ALL.chr10.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
	wget ALL.chr10.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
	wget ALL.chr11.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
	wget ALL.chr11.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi

explodecomputer / gwas.sh

Last active December 18, 2020 12:35

	#===================================#
	# GWAS Practical #
	# iBSC Genomic Medicine #
	# Lavinia Paternoster #
	# 08/11/2018 #
	#===================================#


	# Run the GWAS without PCs but using the cleaned genetic data

gibran hemani explodecomputer