Alaina Hardie trianglegrrl

@ShujiaHuang
ShujiaHuang / gatk_bundle_and_WGS_test_data.sh
Last active October 22, 2024 17:22
Common datasets for GATK
#Known datasets: GATK bundle for human b37 reference
#
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.indels.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.indels.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz.md5
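Each VCF above is downloaded together with a `.md5` sidecar, so a natural follow-up is to check the downloads against those digests. The following is a minimal sketch, not part of the original gist. It assumes that the first whitespace-separated field of each `.md5` file is the hex digest and that `md5sum` and `awk` are available. The path recorded after the digest may not match the local file name, which is why the sketch compares digests directly rather than relying on `md5sum -c`. The function name `verify_md5` is hypothetical.

```shell
# Sketch: compare a file's MD5 digest with the first field of its .md5 sidecar.
# Assumption: the sidecar's first whitespace-separated field is the hex digest.
verify_md5() {
    file=$1
    sidecar="${file}.md5"

    # Both the data file and its sidecar must exist.
    if [ ! -f "$file" ] || [ ! -f "$sidecar" ]; then
        echo "MISSING: $file or $sidecar" >&2
        return 2
    fi

    # Expected digest: first field of the sidecar's first line, lower-cased.
    expected=$(awk 'NR == 1 { print tolower($1) }' "$sidecar")
    # Actual digest: first field of md5sum's output.
    actual=$(md5sum "$file" | awk '{ print $1 }')

    if [ "$expected" = "$actual" ]; then
        echo "OK: $file"
    else
        echo "FAILED: $file" >&2
        return 1
    fi
}
```

To check every download in the current directory, something like `for f in *.vcf.gz; do verify_md5 "$f"; done` should work once the function is defined.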
@allenday
allenday / sra_download.yaml
Created April 9, 2017 19:42
SRA download to Google Cloud Storage
name: sra_download
description: "use Google Pipeline API to download an SRA run, reformat it as unaligned BAM, and upload it to Google Cloud Storage. Run it like this: gcloud alpha genomics pipelines run --inputs SAMPLE=XXXXX --inputs RUN=XXXXX --outputs OUTPUT_FILE=gs://XXXXX --pipeline-file=sra_download.yaml"
resources:
  #increase boot disk from 10GB to 50GB to accommodate intermediate files
  bootDiskSizeGb: 50
  #specify multiple zones so this pipeline can run in any of them
  zones:
  - us-west1-a
  - us-west1-b
  - us-east1-b
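
The pipeline description gives a single invocation with placeholder values. When several SRA runs need to be fetched, it can help to generate one command per run and review them before submitting anything. The sketch below only prints commands and does not call `gcloud` itself. The flags are copied from the description above; the function name `build_sra_cmd`, the bucket argument, and the `gs://BUCKET/RUN.bam` output layout are assumptions made for illustration.

```shell
# Sketch: print the gcloud command for one SAMPLE/RUN pair (dry run only).
# Assumption: output goes to gs://<bucket>/<run>.bam; adjust to your own layout.
build_sra_cmd() {
    sample=$1
    run=$2
    bucket=$3
    printf 'gcloud alpha genomics pipelines run --inputs SAMPLE=%s --inputs RUN=%s --outputs OUTPUT_FILE=gs://%s/%s.bam --pipeline-file=sra_download.yaml\n' \
        "$sample" "$run" "$bucket" "$run"
}
```

For example, `build_sra_cmd XXXXX YYYYY my-bucket` prints a single command line that can be inspected and then pasted into a shell; the identifiers here are placeholders, not real accessions.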