Last active
October 22, 2024 17:22
Common datasets for GATK
# Known datasets: GATK bundle for human b37 reference
#
wget -c ftp://[email protected]/bundle/b37/dbsnp_138.b37.vcf.gz.md5
wget -c ftp://[email protected]/bundle/b37/dbsnp_138.b37.vcf.gz
wget -c ftp://[email protected]/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
wget -c ftp://[email protected]/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz.md5
wget -c ftp://[email protected]/bundle/b37/1000G_phase1.indels.b37.vcf.gz
wget -c ftp://[email protected]/bundle/b37/1000G_phase1.indels.b37.vcf.gz.md5
wget -c ftp://[email protected]/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz
wget -c ftp://[email protected]/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz.md5
wget -c ftp://[email protected]/bundle/b37/hapmap_3.3.b37.vcf.gz
wget -c ftp://[email protected]/bundle/b37/hapmap_3.3.b37.vcf.gz.md5
wget -c ftp://[email protected]/bundle/b37/1000G_omni2.5.b37.vcf.gz
wget -c ftp://[email protected]/bundle/b37/1000G_omni2.5.b37.vcf.gz.md5
wget -c ftp://[email protected]/bundle/b37/1000G_phase3_v4_20130502.sites.vcf.gz
wget -c ftp://[email protected]/bundle/b37/1000G_phase3_v4_20130502.sites.vcf.gz.md5
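Each VCF above is downloaded together with a `.md5` file, so the downloads can be checked before use. The following is a minimal sketch, not part of the original gist. It assumes GNU `md5sum` is available and that the hash is the first whitespace-separated field of each `.md5` file; the file name recorded there may be a remote path, which is why `md5sum -c` is not used. `verify_md5` is a hypothetical helper name.

```shell
# Hypothetical helper: compare a file's MD5 with the first field of its .md5 sidecar.
verify_md5() {
    local file="$1"
    local expected actual
    # Assumption: the hash is the first field; any path after it is ignored.
    expected=$(awk '{print $1; exit}' "${file}.md5")
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$expected" = "$actual" ]; then
        echo "OK   $file"
    else
        echo "FAIL $file" >&2
        return 1
    fi
}

# Check every downloaded VCF that has a matching .md5 file in this directory.
for f in *.vcf.gz; do
    if [ -f "${f}.md5" ]; then
        verify_md5 "$f" || echo "Re-download: $f"
    fi
done
```

Because `wget -c` resumes partial downloads, re-running the corresponding `wget` line for a failed file is usually enough.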
# Human b37 reference: decompress it and build the bwa alignment index before use
wget -c -O human_g1k_v37.fasta.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz
gunzip human_g1k_v37.fasta.gz
## Build the bwa index
bwa index human_g1k_v37.fasta
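Indexing a whole human genome takes a while, so it can help to make this step safe to re-run. The sketch below is an illustration rather than part of the original gist: it checks for the five files that `bwa index` writes next to the FASTA (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) and only builds the index when one is missing or empty. `bwa_index_complete` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if all five bwa index files sit next to the FASTA.
bwa_index_complete() {
    local ref="$1" ext
    for ext in amb ann bwt pac sa; do
        [ -s "${ref}.${ext}" ] || return 1
    done
}

# Build the index only when the reference exists, the index is incomplete,
# and bwa is installed; otherwise do nothing.
ref=human_g1k_v37.fasta
if [ -f "$ref" ] && ! bwa_index_complete "$ref" && command -v bwa >/dev/null; then
    bwa index "$ref"
fi
```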
# Get test fastq data: NA12878
wget -c -O NA12878_1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194147/ERR194147_1.fastq.gz
wget -c -O NA12878_2.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194147/ERR194147_2.fastq.gz
## Exome data
wget -c -O NA12878-NGv3-LAB1360-A_1.fastq.gz https://s3.amazonaws.com/bcbio_nextgen/NA12878-NGv3-LAB1360-A_1.fastq.gz
wget -c -O NA12878-NGv3-LAB1360-A_2.fastq.gz https://s3.amazonaws.com/bcbio_nextgen/NA12878-NGv3-LAB1360-A_2.fastq.gz
# This is the exome target-region file. In the GATK variant-calling step, pass the regions in this file to -L to restrict calling to them, which saves a lot of time.
wget -c -O NGv3.bed.gz https://s3.amazonaws.com/bcbio_nextgen/NGv3.bed.gz
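To illustrate how the target file might be used with `-L`, here is a hedged sketch that is not part of the original gist. It assumes GATK4 (`gatk HaplotypeCaller` with `-R`, `-I`, `-L`, `-O`), a decompressed copy of the BED file, and contig names in the BED that match the reference (for example `1` rather than `chr1`); check that last point before running. `bed_total_bases` and `call_exome` are hypothetical helper names, and the BAM and output names in the usage comment are placeholders.

```shell
# Hypothetical helper: total bases covered by a gzipped BED file
# (columns: chrom, start, end; header lines are skipped, overlaps are not merged).
bed_total_bases() {
    gzip -dc "$1" | awk -F'\t' '
        !/^(#|track|browser)/ && NF >= 3 { total += $3 - $2 }
        END { print total + 0 }'
}

# Hypothetical wrapper: call variants only inside the target regions.
# Arguments: reference FASTA, input BAM, gzipped BED, output VCF.
call_exome() {
    local ref="$1" bam="$2" bed_gz="$3" out="$4"
    local bed="${bed_gz%.gz}"
    # Keep the .gz and write a plain .bed next to it for -L.
    [ -f "$bed" ] || gzip -dc "$bed_gz" > "$bed"
    gatk HaplotypeCaller -R "$ref" -I "$bam" -L "$bed" -O "$out"
}

# Usage (placeholder file names):
#   bed_total_bases NGv3.bed.gz
#   call_exome human_g1k_v37.fasta NA12878.exome.bam NGv3.bed.gz NA12878.exome.vcf.gz
```

Comparing the number printed by `bed_total_bases` with the genome size gives a rough idea of how much of the reference is skipped when `-L` is set.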