Skip to content

Instantly share code, notes, and snippets.

@explodecomputer
Last active August 15, 2016 11:07
Show Gist options
  • Save explodecomputer/093379f1f92ed092aad0 to your computer and use it in GitHub Desktop.
Save explodecomputer/093379f1f92ed092aad0 to your computer and use it in GitHub Desktop.
Zaitlen method
#!/bin/bash
# name of binary plink filename (excluding .bed/.bim/.fam suffix)
plinkfile=""
# name of phenotype file in plink format (i.e. col1=fid, col2=iid, col3=phenotype
phenfile=""
# make up a name for the grm for all individuals
allfile=""
# make up a name for the grm for related individuals
relatedsfile=""
# maximum relatedness threshold
threshold=0.05
# make up a name for output file (it will have .hsq appended to the end by gcta)
outfile=""
gcta64 \
--bfile ${plinkfile} \
--make-grm \
--maf 0.01 \
--out ${allfile}
gcta64 \
--grm ${allfile} \
--pca 10 \
--out ${allfile}
R --no-save --args ${allfile} ${relatedsfile} ${threshold} < remove_unrelateds.R
echo -e "${allfile}\n${relatedsfile}" > mgrm.txt
gcta64 \
--mgrm mgrm.txt \
--pheno ${phenfile} \
--qcovar ${allfile}.eigenvec \
--reml \
--reml-no-lrt \
--out ${outfile}
## Alternative way to have multiple phenotypes
# phenotype_col="2"
# gcta64 \
# --mgrm mgrm.txt \
# --pheno ${phenfile} \
# --mpheno ${phenotype_col} \
# --qcovar ${allfile}.eigenvec \
# --reml \
# --reml-no-lrt \
# --out ${outfile}
#' Read binary GRM files into R
#'
#' @param rootname
#' @export
#' @return List of GRM and id data frames
readGRM <- function(rootname)
{
bin.file.name <- paste(rootname, ".grm.bin", sep="")
n.file.name <- paste(rootname, ".grm.N.bin", sep="")
id.file.name <- paste(rootname, ".grm.id", sep="")
cat("Reading IDs\n")
id <- read.table(id.file.name)
n <- dim(id)[1]
cat("Reading GRM\n")
bin.file <- file(bin.file.name, "rb")
grm <- readBin(bin.file, n=n*(n+1)/2, what=numeric(0), size=4)
close(bin.file)
cat("Reading N\n")
n.file <- file(n.file.name, "rb")
N <- readBin(n.file, n=n*(n+1)/2, what=numeric(0), size=4)
close(n.file)
cat("Creating data frame\n")
l <- list()
for(i in 1:n)
{
l[[i]] <- 1:i
}
col1 <- rep(1:n, 1:n)
col2 <- unlist(l)
grm <- data.frame(id1=col1, id2=col2, N=N, grm=grm)
ret <- list()
ret$grm <- grm
ret$id <- id
return(ret)
}
#' Write readGRM style output back to binary GRM for use with GCTA
#'
#' @param grm Output from \link{readGRM}
#' @param rootname
#' @export
writeGRM <- function(grm, rootname)
{
bin.file.name <- paste(rootname, ".grm.bin", sep="")
n.file.name <- paste(rootname, ".grm.N.bin", sep="")
id.file.name <- paste(rootname, ".grm.id", sep="")
write.table(grm$id, id.file.name, row=F, col=F, qu=F)
n <- dim(grm$id)[1]
bin.file <- file(bin.file.name, "wb")
writeBin(grm$grm$grm, bin.file, size=4)
close(bin.file)
n.file <- file(n.file.name, "wb")
writeBin(grm$grm$N, n.file, size=4)
close(n.file)
}
setUnrelsZero <- function(grm, threshold)
{
index <- grm$grm$grm < threshold
grm$grm$grm[index] <- 0
return(grm)
}
arguments <- commandArgs(T)
infile <- arguments[1]
outfile <- arguments[2]
threshold <- as.numeric(arguments[3])
grm <- readGRM(infile)
grm <- setUnrelsZero(grm, threshold)
writeGRM(grm, outfile)
@explodecomputer
Copy link
Author

The script will do the following

  1. Construct a genetic relationship matrix using all autosomal SNPs with maf > 0.01 and all individuals.
  2. Calculate the first 10 principal components
  3. Take the GRM from 1 and make a new GRM that is very similar to the original GRM except it sets all unrelated (relationship < threshold) to 0. Have a look at the remove_unrelateds.R file to see how this is done
  4. Estimates genetic variance due to all SNPs (component 1) and remaining variance that can be explained by relatedness (this is a mixture of genetic variance that wasn't captured by SNPs and common environment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment