
Ryan Schubert RyanSchu

#originally written by Brianne Coffey Github@breecoffey
#modified by Ryan Schubert
import argparse
import gzip
import re
from os.path import expanduser
current = expanduser("~")
parser = argparse.ArgumentParser(description='Input & Output Files') #create the argument parser
RyanSchu / Illumina to Lgen.md
Last active February 8, 2024 01:38
Convert FinalReport.txt files to plink lgen format

About

This gist details how to convert Illumina FinalReport genotype data into an input format usable by PLINK. From there the genotypes can be quality controlled and the results exported to a .vcf file or otherwise parsed.

Lgen/PLINK format

The PLINK documentation describes the .lgen format as "A text file with no header line, and one line per genotype call (or just not-homozygous-major calls if 'lgen-ref' was invoked) usually with the following five fields:

Family ID
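
As a rough illustration of the one-line-per-genotype layout, here is a minimal Python sketch. It is not the gist's conversion script. It assumes the five fields are, in order, family ID, within-family ID, variant identifier, and two allele calls, that fields are separated by a single space, that '0' marks a missing call, and that the input uses '-' for a no-call. The helper name `lgen_line` and the sample values are hypothetical.

```python
def lgen_line(family_id, sample_id, snp_id, allele1, allele2, missing="0"):
    """Format one genotype call as a single space-delimited .lgen row."""
    # Assumption: empty or '-' allele calls are no-calls and become the missing code.
    a1 = allele1 if allele1 not in ("", "-") else missing
    a2 = allele2 if allele2 not in ("", "-") else missing
    return " ".join([family_id, sample_id, snp_id, a1, a2])


# Hypothetical genotype calls: (family ID, sample ID, SNP ID, allele 1, allele 2)
rows = [
    ("FAM1", "S1", "rs123", "A", "G"),
    ("FAM1", "S1", "rs456", "-", "-"),
]
lines = [lgen_line(*row) for row in rows]
```

Each element of `lines` is one row of the output file; no header line is written.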

RyanSchu / PCA.md
Last active September 24, 2018 17:30
PCA w/out hapmap

These scripts are meant to be run from the command line. Save the attached eigenplot.R script, then run it as follows:

Rscript eigenplot.R --val Path/to/pca.eigenval --vec Path/to/pca.eigenvec --o /directory/to/output

This creates the PDF file pca_plots in the specified output directory.
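
To sanity-check the eigenvalue input before plotting, a short Python sketch like the one below may help. It is not part of eigenplot.R, whose contents are not shown here. It assumes the eigenvalue file holds one eigenvalue per line, as PLINK writes it, and both function names are hypothetical. The proportions are relative to the components listed in the file, not to the total genetic variance.

```python
def read_eigenvalues(path):
    """Read one eigenvalue per line, skipping blank lines (assumed file layout)."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def variance_explained(eigenvalues):
    """Return each eigenvalue as a fraction of the sum of the listed eigenvalues."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


# Hypothetical eigenvalues standing in for the contents of an eigenvalue file.
example = [4.0, 3.0, 2.0, 1.0]
proportions = variance_explained(example)
```

With a real file, `variance_explained(read_eigenvalues(path))` gives the per-component fractions that are often used as axis labels in PCA plots.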

RyanSchu / GwasQCExample.md
Last active September 21, 2018 20:57
GWAS QC PIPELINE EXAMPLE

GWAS QC

Every GWAS is different, so there is no single standard for how to run one. However, many of the steps we must take are predictable. Most commonly, we find ourselves filtering by missingness, removing related individuals, and removing SNPs that are in linkage disequilibrium with each other. Finally, we usually want to run PCA on our population to validate that it has been adequately filtered. This pipeline combines these general steps and more into three coherent scripts to make GWAS QC easy to perform.
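
To make the missingness step concrete, here is a minimal Python sketch of filtering SNPs by their fraction of missing calls. It is an illustration only, not the pipeline's implementation: the function names, the '0' missing code, and the 0.05 default threshold are all assumptions, and in practice this filtering is normally done by a dedicated tool such as PLINK.

```python
def call_rate(genotypes, missing="0"):
    """Fraction of genotype calls that are not the missing code."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g != missing)
    return called / len(genotypes)


def filter_by_missingness(snp_calls, max_missing=0.05):
    """Keep SNPs whose missing-call fraction is at most max_missing (hypothetical default)."""
    return {
        snp: calls
        for snp, calls in snp_calls.items()
        if 1.0 - call_rate(calls) <= max_missing
    }


# Hypothetical calls for two SNPs across four samples; "0" marks a missing call.
snp_calls = {
    "rs1": ["AA", "AG", "GG", "AA"],
    "rs2": ["AA", "0", "0", "AG"],
}
kept = filter_by_missingness(snp_calls, max_missing=0.25)
```

Here rs2 is missing in half of the samples and is dropped, while rs1 has no missing calls and is kept.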

Running the Pipeline

The pipeline is broken into three main scripts, each of which performs one of the steps mentioned above. Each has a handful of options and produces multiple output files for the user. I would encourage you to look at the project wiki and get an understanding of what each option does and what files are produced. While the pipeline produces predictable results, it is largely up to the user to give direction and decide on wh

RyanSchu / dbgap.md
Last active August 21, 2024 04:40
Downloading Data from dbGaP

Install the Aspera CLI client required by dbGaP

aspera installation manual download page

wget https://download.asperasoft.com/download/sw/cli/3.7.7/aspera-cli-3.7.7.608.927cce8-linux-64-release.sh

Update the permissions of the installer so it is executable, then run it

chmod a+x ./aspera-cli-3.7.7.608.927cce8-linux-64-release.sh
./aspera-cli-3.7.7.608.927cce8-linux-64-release.sh