Here's the deal.
- We have sequenced the exomes of an affected child, and their unaffected parents. The child has a rare skin condition.
- We aligned their exome data to the human reference genome.
- We called variants using GATK.
- The resulting VCF file is called trio.trim.vep.vcf.gz.
- Make a new directory called raredisease, cd into it, and download the VCF into the directory with:
wget https://home.chpc.utah.edu/~u1007787/trio.trim.vep.vcf.gz
- Sample 1805 (genotype offset 0 in the file) is the father
- Sample 1847 (genotype offset 1 in the file) is the mother
- Sample 4805 (genotype offset 2 in the file) is the affected child
- Given the phenotypes of the members of the family, we hypothesize that the child's phenotype is caused by a de novo mutation (DNM).
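The genotype offsets listed above are the 0-based positions of the sample columns in the VCF header, so it may be worth confirming them before writing any GT[n] or FORMAT/DP[n] filter. The following is a minimal sketch, assuming a gzip-compressed VCF; `sample_offsets` is a hypothetical helper name, not part of bcftools. Running `bcftools query -l` on the file prints the same sample names in the same order.

```shell
# Hypothetical helper: print each sample's 0-based genotype offset and name.
sample_offsets() {
    # Sample names start at column 10 of the tab-separated #CHROM header line.
    gzip -dc < "$1" | awk -F'\t' '
        /^#CHROM/ { for (i = 10; i <= NF; i++) printf "%d\t%s\n", i - 10, $i }'
}
```

Calling `sample_offsets trio.trim.vep.vcf.gz` should list 1805, 1847, and 4805 at offsets 0, 1, and 2 if the file matches the description above.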
- Your challenge is to use the concepts we discussed in this lecture, combined with bcftools, to nominate a causal variant for this child's phenotype.
- Hints:
  - bcftools cheat sheet - useful!
  - You need to use the bcftools view command with the -i option to filter variants.
  - You can filter the variants down to a reasonable set by:
    - requiring each member of the family to have a specific genotype (e.g., GT[0]="0/0" && GT[1]="0/0" requires dad and mom to be homozygous for the reference allele)
    - requiring each member of the family to have a minimum depth of 10 aligned sequence reads (e.g., FORMAT/DP[1]>=10 && FORMAT/DP[2]>=10 requires mom and kid to have at least 10 aligned reads)
    - requiring the variant to be a missense change
  - You might care to exclude variants that were labelled as potentially "dodgy" by the variant caller (i.e., keep only variants whose FILTER value is "PASS").
  - Your VCF coordinates are with respect to GRCh37 (build 37 of the human reference genome).
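To make the genotype and depth hints concrete, here is a rough sketch of what such a filter selects, written in plain awk so the logic is visible line by line. It is not a replacement for bcftools. It assumes the trio occupies sample columns 10, 11, and 12 in father, mother, child order, that genotypes are unphased and written literally as 0/0 and 0/1, and that the FORMAT column names GT and DP subfields. `denovo_candidates` is a hypothetical helper name.

```shell
# Hypothetical helper: print CHROM, POS, REF, ALT for records that fit a
# de novo pattern (both parents 0/0, child 0/1) with a minimum read depth.
# Usage: denovo_candidates FILE.vcf.gz [MIN_DEPTH]   (MIN_DEPTH defaults to 10)
denovo_candidates() {
    gzip -dc < "$1" | awk -F'\t' -v min_dp="${2:-10}" '
        /^#/ { next }                      # skip header lines
        {
            # Find where GT and DP sit within the colon-separated FORMAT column.
            n = split($9, fmt, ":"); gt = 0; dp = 0
            for (i = 1; i <= n; i++) {
                if (fmt[i] == "GT") gt = i
                if (fmt[i] == "DP") dp = i
            }
            if (!gt || !dp) next           # record lacks GT or DP

            split($10, dad, ":"); split($11, mom, ":"); split($12, kid, ":")

            # A missing depth (".") or absent subfield evaluates to 0 and fails the check.
            if (dad[gt] == "0/0" && mom[gt] == "0/0" && kid[gt] == "0/1" &&
                dad[dp] + 0 >= min_dp && mom[dp] + 0 >= min_dp && kid[dp] + 0 >= min_dp)
                print $1 "\t" $2 "\t" $4 "\t" $5
        }'
}
```

For example, `denovo_candidates trio.trim.vep.vcf.gz 10` would list the positions passing the genotype and depth hints. The missense and PASS hints are deliberately left out of this sketch.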
- Applying these hints using bcftools should filter the variants down to fewer than fifteen candidate variants.
- Using bcftools, your powerful brain, and possibly external resources (e.g., gnomad.broadinstitute.org), make your best guess as to the variant and gene that you think causes the kid's phenotype.
- Report the header and every variant in the file:
bcftools view trio.trim.vep.vcf.gz
- Report the header and every variant where mom and dad are homozygous for the reference allele:
bcftools view trio.trim.vep.vcf.gz -i 'GT[0]="0/0" && GT[1]="0/0"'
- Report every PASS missense variant where mom and dad are homozygous for the reference allele, the kid is heterozygous, and all three have at least 10 aligned reads (piping through grep drops the header):
bcftools view trio.trim.vep.vcf.gz -i 'GT[0]="0/0" && GT[1]="0/0" && GT[2]="0/1" && FORMAT/DP[0]>=10 && FORMAT/DP[1]>=10 && FORMAT/DP[2]>=10' | grep missense | grep PASS
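The grep steps match the words "missense" and "PASS" anywhere on a line and discard the VCF header. One possible alternative keeps everything inside bcftools: the -f PASS option restricts output to records whose FILTER is PASS, and a regex match on the annotation field stands in for grep. This is a sketch under the assumption that the VEP annotations are stored in an INFO tag named CSQ (the VEP default, not confirmed for this file). `denovo_missense_pass` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: de novo pattern + depth >= 10 + PASS + missense,
# with the VCF header retained in the output.
# Assumes VEP consequences live in INFO/CSQ; check the header if unsure.
denovo_missense_pass() {
    bcftools view -f PASS \
        -i 'GT[0]="0/0" && GT[1]="0/0" && GT[2]="0/1" && FORMAT/DP[0]>=10 && FORMAT/DP[1]>=10 && FORMAT/DP[2]>=10 && INFO/CSQ~"missense"' \
        "$1"
}
```

To count the candidates, the output can be piped through `grep -vc '^#'`, for example `denovo_missense_pass trio.trim.vep.vcf.gz | grep -vc '^#'`.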