Alaina Hardie trianglegrrl

Broad spectrum nerd. AI/ML, bioinformatics, robotics, etc.

ShujiaHuang / gatk_bundle_and_WGS_test_data.sh

Last active June 26, 2025 09:24

Common datasets for GATK

	#Known datasets: GATK bundle for human b37 reference
	#
	wget -c ftp://[email protected]/bundle/b37/dbsnp_138.b37.vcf.gz.md5
	wget -c ftp://[email protected]/bundle/b37/dbsnp_138.b37.vcf.gz
	wget -c ftp://[email protected]/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
	wget -c ftp://[email protected]/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz.md5
	wget -c ftp://[email protected]/bundle/b37/1000G_phase1.indels.b37.vcf.gz
	wget -c ftp://[email protected]/bundle/b37/1000G_phase1.indels.b37.vcf.gz.md5
	wget -c ftp://[email protected]/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz
	wget -c ftp://[email protected]/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz.md5
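
The script fetches a `.md5` file next to each VCF but never compares them. Below is a minimal sketch of that check, not part of the original gist. It assumes GNU `md5sum` is on the PATH and that each `.md5` file begins with the hex digest; anything after the digest, such as a server-side path, is ignored. The function names `verify_md5` and `verify_bundle` are hypothetical.

```shell
#!/usr/bin/env bash

# verify_md5 FILE
# Compare FILE's MD5 digest with the first field of FILE.md5.
# Assumption: the first whitespace-separated field of the .md5 file is the digest.
# Returns 0 on match, 1 on mismatch, 2 if the .md5 file cannot be read.
verify_md5() {
    local file="$1"
    local expected actual
    expected=$(awk 'NR == 1 { print tolower($1) }' "${file}.md5") || return 2
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK      $file"
    else
        echo "FAILED  $file" >&2
        return 1
    fi
}

# verify_bundle [DIR]
# Run verify_md5 for every *.md5 file in DIR (default: current directory).
# Returns non-zero if any file fails the check.
verify_bundle() {
    local dir="${1:-.}" md5 status=0
    for md5 in "$dir"/*.md5; do
        [ -e "$md5" ] || continue   # the glob matched nothing
        verify_md5 "${md5%.md5}" || status=1
    done
    return "$status"
}
```

After the downloads finish, running `verify_bundle .` in the download directory reports each file as `OK` or `FAILED`. On systems without `md5sum` (for example macOS, which ships `md5` instead), the digest command would need to be swapped.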

allenday / sra_download.yaml

Created April 9, 2017 19:42

SRA download to Google Cloud Storage

	name: sra_download
	description: "Use the Google Pipeline API to download an SRA run, reformat it as unaligned BAM, and upload it to Google Cloud Storage. Run it like this: gcloud alpha genomics pipelines run --inputs SAMPLE=XXXXX --inputs RUN=XXXXX --outputs OUTPUT_FILE=gs://XXXXX --pipeline-file=sra_download.yaml"
	resources:
	  # increase boot disk from 10GB to 50GB to accommodate intermediate files
	  bootDiskSizeGb: 50
	  # specify multiple zones so this pipeline will run in parallel
	  zones:
	    - us-west1-a
	    - us-west1-b
	    - us-east1-b