@ShujiaHuang
Last active June 26, 2025 09:24
Common datasets for GATK
# Known datasets: GATK bundle for the human b37 reference
#
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.indels.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.indels.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/hapmap_3.3.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/hapmap_3.3.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_omni2.5.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_omni2.5.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase3_v4_20130502.sites.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase3_v4_20130502.sites.vcf.gz.md5
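Each bundle file above is fetched together with its .md5 checksum. The following is a minimal sketch for checking them once the downloads finish. It assumes GNU md5sum (coreutils) is available and that the first field of each .md5 file is the hex digest; verify_md5_files is a hypothetical helper name, not part of GATK.

```shell
#!/usr/bin/env bash
# Sketch: check every *.md5 file in a directory against the file it describes.
# Assumes GNU md5sum and that the first field of each .md5 file is the digest
# (this holds whether the file contains only the hash or "hash  filename").
verify_md5_files() {
    local dir="${1:-.}" status=0 md5file target expected actual
    for md5file in "$dir"/*.md5; do
        [ -e "$md5file" ] || continue            # glob matched nothing
        target="${md5file%.md5}"                 # foo.vcf.gz.md5 -> foo.vcf.gz
        if [ ! -f "$target" ]; then
            echo "MISSING  $target"
            status=1
            continue
        fi
        expected=$(awk '{ print tolower($1); exit }' "$md5file")
        actual=$(md5sum "$target" | awk '{ print $1 }')
        if [ "$expected" = "$actual" ]; then
            echo "OK       $target"
        else
            echo "MISMATCH $target"
            status=1
        fi
    done
    return "$status"
}
```

Run it in the download directory (verify_md5_files .). It prints OK, MISMATCH or MISSING per file and returns non-zero if anything failed, so the affected file can be deleted and downloaded again.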
# human b37 reference: decompress it and build the (bwa) alignment index before use
wget -c -O human_g1k_v37.fasta.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz
gunzip human_g1k_v37.fasta.gz
## Build the bwa index
bwa index human_g1k_v37.fasta
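bwa only needs its own index, but GATK tools also expect a FASTA index (.fai) and a sequence dictionary (.dict) next to the reference. A minimal sketch, assuming samtools and the GATK4 gatk launcher are on PATH; prepare_gatk_reference is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: create the .fai index and .dict dictionary that GATK expects
# alongside the FASTA. Assumes samtools and the GATK4 `gatk` launcher are on PATH.
prepare_gatk_reference() {
    local fasta="$1"
    # human_g1k_v37.fasta -> human_g1k_v37.dict (GATK looks for <name>.dict)
    local dict="${fasta%.fasta}.dict"
    if [ ! -f "$fasta" ]; then
        echo "reference not found: $fasta" >&2
        return 1
    fi
    samtools faidx "$fasta" || return 1        # writes ${fasta}.fai
    if [ ! -f "$dict" ]; then
        gatk CreateSequenceDictionary -R "$fasta" -O "$dict" || return 1
    fi
}
```

Calling prepare_gatk_reference human_g1k_v37.fasta should leave human_g1k_v37.fasta.fai and human_g1k_v37.dict beside the FASTA. The dictionary step is skipped when the .dict already exists, because CreateSequenceDictionary typically refuses to overwrite an existing output file.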
# Get test fastq data: NA12878
wget -c -O NA12878_1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194147/ERR194147_1.fastq.gz
wget -c -O NA12878_2.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194147/ERR194147_2.fastq.gz
## Exome data
wget -c -O NA12878-NGv3-LAB1360-A_1.fastq.gz https://s3.amazonaws.com/bcbio_nextgen/NA12878-NGv3-LAB1360-A_1.fastq.gz
wget -c -O NA12878-NGv3-LAB1360-A_2.fastq.gz https://s3.amazonaws.com/bcbio_nextgen/NA12878-NGv3-LAB1360-A_2.fastq.gz
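The _1 and _2 files of each paired-end download should contain the same number of reads. The sketch below compares the two mates of a pair, assuming gzip-compressed, 4-line FASTQ records; count_fastq_reads and check_fastq_pair are hypothetical helper names. It decompresses every file in full, so on whole-genome files such as ERR194147 it can take a long time.

```shell
#!/usr/bin/env bash
# Sketch: compare read counts of the two mates of a gzipped paired-end FASTQ.
# Assumes standard 4-line FASTQ records (header, sequence, '+', qualities).
count_fastq_reads() {
    local lines
    # Exit the command substitution with gzip's status so corrupt or truncated
    # archives are reported instead of silently yielding a short count.
    lines=$(gzip -dc "$1" | wc -l; exit "${PIPESTATUS[0]}") || return 1
    echo $(( lines / 4 ))
}

check_fastq_pair() {
    local r1 r2
    r1=$(count_fastq_reads "$1") || return 1
    r2=$(count_fastq_reads "$2") || return 1
    echo "$1: $r1 reads, $2: $r2 reads"
    [ "$r1" -eq "$r2" ]                        # non-zero status if counts differ
}
```

For example, check_fastq_pair NA12878_1.fastq.gz NA12878_2.fastq.gz and check_fastq_pair NA12878-NGv3-LAB1360-A_1.fastq.gz NA12878-NGv3-LAB1360-A_2.fastq.gz return 0 only when both mates have matching counts.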
# Exome target-region file: in the GATK variant-calling step, pass the regions in the file below as the -L argument; this saves a lot of time
wget -c -O NGv3.bed.gz https://s3.amazonaws.com/bcbio_nextgen/NGv3.bed.gz
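Before using NGv3.bed with -L, it is worth checking that its contig names match the reference: b37 names chromosomes 1, 2, X, MT and so on, whereas hg19-style files use chr1, chr2 and so on. A minimal sketch, assuming the BED has been decompressed and the reference .fai exists (for example from samtools faidx); missing_bed_contigs is a hypothetical helper that prints BED contigs absent from the reference.

```shell
#!/usr/bin/env bash
# Sketch: list contigs named in a BED file that do not exist in the reference.
# Reads reference contig names from column 1 of the FASTA's .fai index.
missing_bed_contigs() {
    local bed="$1" fai="$2"
    # First file (.fai): remember contig names. Second file (BED): skip blank,
    # comment, track and browser lines; print each unknown contig once.
    awk 'NR == FNR { ref[$1] = 1; next }
         NF == 0 || /^(#|track|browser)/ { next }
         !($1 in ref) && !seen[$1]++ { print $1 }' "$fai" "$bed"
}

# Example usage (file names from this gist; sample.bam and sample.vcf.gz are hypothetical):
#   gzip -dc NGv3.bed.gz > NGv3.bed
#   missing_bed_contigs NGv3.bed human_g1k_v37.fasta.fai
#   gatk HaplotypeCaller -R human_g1k_v37.fasta -I sample.bam -L NGv3.bed -O sample.vcf.gz
```

If the check prints nothing, every BED contig is present in the reference; if it prints names like chr1, the BED uses a different naming convention and would need renaming before GATK can use it with this reference.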
@Zer0day-0
Many, many thanks for listing them! I was searching for them the whole day.
