Ryan Schubert RyanSchu

About

This gist details how to convert Illumina final report genotype data into an input format usable by PLINK. From there, the genotypes can be quality controlled and the results exported to a .vcf file or otherwise parsed.

Lgen/PLINK format

From the PLINK documentation, an lgen file is "A text file with no header line, and one line per genotype call (or just not-homozygous-major calls if 'lgen-ref' was invoked) usually with the following five fields:

Family ID
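As a minimal sketch of what one such line looks like, the snippet below formats a genotype call into the five space-delimited lgen fields (family ID, within-family ID, variant ID, allele call 1, allele call 2). The function name `lgen_line` and the sample values are hypothetical, and it assumes no-calls arrive as "-" and should be written with PLINK's default missing code "0". It is not the conversion script from this gist.

```python
def lgen_line(family_id, sample_id, variant_id, allele1, allele2, missing="0"):
    """Format one genotype call as a space-delimited lgen line."""
    # Assumption: a no-call is encoded as "-" in the input and is
    # rewritten to PLINK's default missing genotype code "0".
    a1 = missing if allele1 == "-" else allele1
    a2 = missing if allele2 == "-" else allele2
    return " ".join([family_id, sample_id, variant_id, a1, a2])


# Hypothetical calls: (family ID, sample ID, variant ID, allele 1, allele 2)
calls = [
    ("FAM1", "S1", "rs123", "A", "G"),
    ("FAM1", "S1", "rs456", "-", "-"),
]
lines = [lgen_line(*call) for call in calls]
for line in lines:
    print(line)
```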

These scripts are meant to be run from the command line. Save the attached eigenplot.R script. You can then run it from the command line as follows:

Rscript eigenplot.R --val Path/to/pca.eigenval --vec Path/to/pca.eigenvec --o /directory/to/output

This creates the PDF file pca_plots in the specified output directory.
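One quantity often reported alongside PCA plots is the share of variance attributed to each component. The sketch below computes it from a list of eigenvalues, assuming the .eigenval file holds one eigenvalue per line. The helper names are hypothetical and this is not necessarily what eigenplot.R does. Because PLINK writes only the top components, the proportions are relative to the reported eigenvalues rather than to the total variance.

```python
def read_eigenvalues(path):
    """Read one eigenvalue per line, skipping blank lines."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def variance_explained(eigenvalues):
    """Return each eigenvalue as a fraction of the sum of all given eigenvalues."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


# Made-up eigenvalues for illustration only
example = [4.0, 3.0, 2.0, 1.0]
proportions = variance_explained(example)
print(proportions)
```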

GWAS QC

Every GWAS is different, so there is no single standard for how to run one. However, many of the steps are predictable. Most commonly, we filter by missingness, remove related individuals, and remove SNPs that are in linkage disequilibrium with each other. Finally, we usually run PCA on the population to validate that it has been adequately filtered. This pipeline combines these general steps and more into three coherent scripts that make GWAS QC easy to perform.
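To make the missingness step concrete, here is a small sketch that computes the per-SNP missing call rate and drops SNPs above a threshold. It is an illustration of the idea only, not the pipeline's code. The function names, the in-memory data layout, the "0" missing code, and the threshold values are all assumptions.

```python
def snp_missingness(genotypes, missing="0"):
    """Return the fraction of samples with a missing call for each SNP.

    genotypes maps a SNP ID to a list of (allele1, allele2) tuples,
    one tuple per sample.
    """
    rates = {}
    for snp, calls in genotypes.items():
        n_missing = sum(1 for a1, a2 in calls if a1 == missing or a2 == missing)
        rates[snp] = n_missing / len(calls)
    return rates


def filter_snps(genotypes, max_missing=0.02):
    """Keep only SNPs whose missing call rate is at most max_missing."""
    rates = snp_missingness(genotypes)
    return {snp: calls for snp, calls in genotypes.items() if rates[snp] <= max_missing}


# Made-up genotypes: rs1 has one missing call out of four samples
genotypes = {
    "rs1": [("A", "G"), ("A", "A"), ("0", "0"), ("G", "G")],
    "rs2": [("C", "C"), ("C", "T"), ("T", "T"), ("C", "C")],
}
rates = snp_missingness(genotypes)
kept = filter_snps(genotypes, max_missing=0.1)
print(rates, sorted(kept))
```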

Running the Pipeline

The pipeline is broken into three main scripts, each of which performs one of the steps mentioned above. Each has a handful of options and produces multiple output files for the user. I would encourage you to look at the project wiki to understand what each option does and what files are produced. While the pipeline produces predictable results, it is largely up to the user to give direction and decide on wh

Install the Aspera CLI client required by dbGaP

aspera installation manual download page

wget https://download.asperasoft.com/download/sw/cli/3.7.7/aspera-cli-3.7.7.608.927cce8-linux-64-release.sh

Update the permissions of the installation executable, then run it

chmod a+x ./aspera-cli-3.7.7.608.927cce8-linux-64-release.sh
./aspera-cli-3.7.7.608.927cce8-linux-64-release.sh

	#originally written by Brianne Coffey Github@breecoffey
	#modified by Ryan Schubert

	import argparse
	import gzip
	import re
	from os.path import expanduser
	current = expanduser("~")

	parser = argparse.ArgumentParser(description='Input & Output Files') #create the argument parser