Aaron Quinlan arq5x

@arq5x
arq5x / comparo.txt
Created September 11, 2013 03:20
bedtools 2.18 timing comparisons with 2.17 and bedops
######################################################################################################
# Version 2.18.0
######################################################################################################
# BAM, CHROMSWEEP
time ./bedtools intersect -a random.20M.sorted.bam -b gencode.v17.exons.sorted.bed -sorted > /dev/null
real 0m20.153s
user 0m19.953s
sys 0m0.192s
arq5x / rest-example.md
Last active July 23, 2019 00:16
Example use of a RESTful API to GEMINI databases.

Load a GEMINI database from a VCF

$ gemini load -v nobel.vcf -t VEP --cores 23 -p samples.ped nobel.db

Launch the GEMINI web server

(this will run on your local machine on port 8088)

$ gemini browser nobel.db

arq5x / acknowledgements.txt
Last active December 21, 2015 09:59
A collection of both popular and lesser-known biographies, autobiographies, and histories of scientific figures and their discoveries, focused primarily on biology and genetics. This is largely a collection of responses to a query on Twitter.
Thanks to @Adrian_H, @RoryKirchner, @genetics_blog,
@leonidkrugliak, @Graham_Coop, @yokofakun,
@pathogenomenick, @caseybergman, @Jalfoldi,
@Paul_R_Johnston, @gknoProject, @David_Dobbs,
@vsbuffalo, @robinhenig, @nccomfort,
@GholsonLyon, @notSoJunkDNA, @neilfws,
@rovingpencil, @rtraborn
In addition, please refer to Casey Bergman's collection of other books:
http://www.librarything.com/catalog/xylem&tag=genetics&collection=-1
arq5x / workflow.sh
Last active December 21, 2015 01:58
GBM
export SAMPLES="BLV4 NCH411GBM_CD133high NCH411GBM_CD133low NCH537P54_CD133neg NCH537P54_CD133pos NCH620P55_CD133neg NCH620P55_CD133pos NCH644GBM_CD133high NCH644GBM_CD133low NCH7Md_P43_CD133neg NCH7Md_P43_CD133pos NPC-v"
######################################
# Sort the original BAM files by name:
######################################
export GBMHOME=/net/midtier18/vol79/cphg-quinlan2/projects/gbm-seq-abounader
export STEPNAME=gbm-nmsrt
for sample in $SAMPLES
do
export QSUB="qsub -q cphg -W group_list=CPHG -V -l select=1:mem=6000m:ncpus=1 -N $STEPNAME -m bea -M [email protected]"
arq5x / shuffle-non-overlapping.sh
Created June 18, 2013 13:20
"Strategy" for generating shuffled BED files while preventing overlapping records in the shuffled output.
tries=0
while true;
do
tries=$((tries+1))
echo "attempt number $tries"
# try a shuffle
bedtools shuffle -i foo.bed -g human.hg19.genome > foo.shuffled.bed
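The retry loop above needs some way to decide whether a shuffled file still contains overlapping records. A minimal Python sketch of such a check is given here; the function name `has_overlaps` and the in-memory list of `(chrom, start, end)` tuples are illustrative assumptions and are not taken from the original script.

```python
def has_overlaps(intervals):
    """Return True if any two intervals on the same chromosome overlap.

    `intervals` is an iterable of (chrom, start, end) tuples in BED-style
    half-open coordinates (start inclusive, end exclusive).
    """
    last_end = {}  # chrom -> largest end coordinate seen so far
    # Sorting orders records by chromosome, then start, then end.
    for chrom, start, end in sorted(intervals):
        if chrom in last_end and start < last_end[chrom]:
            return True
        last_end[chrom] = max(end, last_end.get(chrom, end))
    return False


# Hypothetical records, for illustration only.
shuffled = [("chr1", 100, 200), ("chr1", 150, 250), ("chr2", 10, 20)]
print(has_overlaps(shuffled))
```

Under this sketch, a shuffle attempt would be accepted only when the check returns False; abutting records such as `[1, 5)` and `[5, 10)` are not counted as overlapping.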
arq5x / gemini-somatic.sh
Last active December 18, 2015 10:29
Example of using GEMINI genotype columns to identify confident somatic mutations in a cancer experiment
# DOCS:
# https://gemini.readthedocs.org/en/latest/content/database_schema.html#genotype-information
# GEMINI SOURCE:
# https://github.com/arq5x/gemini
#########################################################################
# load a VCF for a tumor / normal pair into gemini.
# - use 4 cores
# - assume VCF has been annotated with snpEff
arq5x / test.vcf
Last active December 17, 2015 19:09
CBW 2013 - Structural Variation Practical Session.
##fileformat=VCFv4.1
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
##ALT=<ID=DEL,Description="Deletion">
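The `##INFO` header lines above all share one layout (ID, Number, Type, Description). As a rough illustration of how such lines could be read programmatically, here is a small Python sketch; the regular expression and the helper name `parse_info_header` are assumptions made for this example and only cover the simple four-field form shown in this header.

```python
import re

# Matches only the four-field INFO layout used in the header above.
INFO_RE = re.compile(
    r'^##INFO=<ID=(?P<id>[^,]+),Number=(?P<number>[^,]+),'
    r'Type=(?P<type>[^,]+),Description="(?P<description>.*)">$'
)


def parse_info_header(line):
    """Return a dict of INFO fields, or None if the line is not an INFO line."""
    match = INFO_RE.match(line.strip())
    return match.groupdict() if match else None


line = ('##INFO=<ID=SVTYPE,Number=1,Type=String,'
        'Description="Type of structural variant">')
print(parse_info_header(line))
```

For real analyses, a dedicated VCF parser is likely a safer choice than a hand-written pattern.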
arq5x / grantham-dict.py
Last active January 30, 2025 06:31
Convert Grantham Amino Acid matrix into Python dict.
#!/usr/bin/env python
import sys
import pprint
def make_grantham_dict(grantham_mat_file):
"""
Citation: http://www.ncbi.nlm.nih.gov/pubmed/4843792
Provenance: http://www.genome.jp/dbget-bin/www_bget?aaindex:GRAR740104
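The preview above is cut off before the function body. As a hedged sketch of the general idea (turning a delimited amino acid matrix into a dict keyed by residue pairs), the following Python assumes a hypothetical layout in which the first row lists the column residues and each later row starts with its own residue. The name `matrix_to_dict` and the two-residue excerpt are illustrative and are not the original implementation.

```python
def matrix_to_dict(lines):
    """Build {(row_aa, col_aa): score} from a whitespace-delimited matrix.

    Assumed layout (hypothetical): the first row holds a placeholder cell
    followed by the column amino acids; every later row holds its own
    amino acid followed by one integer score per column.
    """
    rows = [line.split() for line in lines if line.strip()]
    header = rows[0][1:]  # skip the placeholder cell in the corner
    scores = {}
    for fields in rows[1:]:
        row_aa = fields[0]
        for col_aa, value in zip(header, fields[1:]):
            scores[(row_aa, col_aa)] = int(value)
    return scores


# Illustrative two-residue excerpt in the assumed layout.
example = [
    ".  A    R",
    "A  0    112",
    "R  112  0",
]
print(matrix_to_dict(example)[("A", "R")])
```

Keying on `(row_aa, col_aa)` tuples keeps lookups symmetric only if the input matrix itself is symmetric; a triangular input would need both orientations filled in.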
arq5x / test.sh
Last active February 26, 2025 13:17
Compress and then Decompress a string with zlib.
# compile
$ g++ zlib-example.cpp -lz -o zlib-example
# run
$ ./zlib-example
Uncompressed size is: 36
Uncompressed string is: Hello Hello Hello Hello Hello Hello!
----------
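For comparison, the same compress-then-decompress round trip can be sketched with Python's standard `zlib` module. This is a stand-in written for illustration under the assumption that the input is the string shown above; it is not the source of `zlib-example.cpp`.

```python
import zlib

# Same input as the run above, as bytes (zlib operates on bytes in Python 3).
text = b"Hello Hello Hello Hello Hello Hello!"

compressed = zlib.compress(text)
restored = zlib.decompress(compressed)

print("Uncompressed size is:", len(text))
print("Compressed size is:", len(compressed))
print("Round trip matches:", restored == text)
```

The key property to check is that decompressing the compressed buffer returns exactly the original bytes.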
arq5x / test_gemini_query.py
Last active December 15, 2015 09:09
Example script using the Gemini query API for custom analysis.
#!/usr/bin/env python
import sys
from gemini import GeminiQuery
db = sys.argv[1]
# create a GeminiQuery instance for the requested database
gq = GeminiQuery(db)