Here's the deal.
- We have sequenced the exomes of an affected child, and their unaffected parents. The child has a rare skin condition.
- We aligned their exome data to the human reference genome.
- We called variants using GATK.
- The resulting VCF file is called `trio.trim.vep.vcf.gz`.
- Make a new directory, change into it, and download the VCF into that directory.
- Sample 1805 (genotype offset 0 in the file) is the father
- Sample 1847 (genotype offset 1 in the file) is the mother
- Sample 4805 (genotype offset 2 in the file) is the affected child
- Given the phenotypes of the members of the family, we hypothesize that the child's phenotype is caused by a de novo mutation (DNM).
Your challenge is to use the concepts we discussed in this lecture, combined with bcftools, to nominate a causal variant for this child's phenotype.
- bcftools cheat sheet - useful!
- You need to use the `bcftools view` command with the `-i` option to filter variants
- You can filter the variants down to a reasonable set by:
  - requiring each member of the family to have a specific genotype (e.g., `GT[0]="0/0" && GT[1]="0/0"` requires mom and dad to be homozygous for the reference allele)
  - requiring each member of the family to have a minimum depth of 10 aligned sequence reads (e.g., `FORMAT/DP[1]>=10 && FORMAT/DP[2]>=10` requires mom and kid to have at least 10 aligned reads)
  - requiring the variant to be a missense change
- You might care to exclude variants that were labelled as potentially "dodgy" by the variant caller (i.e., keep only variants whose FILTER field is "PASS")
- Your VCF coordinates are with respect to GRCh37 (build 37 of the human reference genome)
Applying these hints using bcftools should filter the variants down to fewer than fifteen candidate variants.
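The genotype and depth hints above can be combined into a single `-i` expression. Here is a minimal sketch, assuming the sample offsets listed earlier (father 0, mother 1, child 2). The variable names are purely illustrative, and the bcftools call only runs if bcftools and the VCF are actually present in the current directory.

```shell
# Sample offsets as listed above (father, mother, child).
DAD=0; MOM=1; KID=2
VCF=trio.trim.vep.vcf.gz

# De novo pattern: both parents homozygous reference, child heterozygous.
GENO="GT[$DAD]=\"0/0\" && GT[$MOM]=\"0/0\" && GT[$KID]=\"0/1\""

# At least 10 aligned reads in every family member.
DEPTH="FORMAT/DP[$DAD]>=10 && FORMAT/DP[$MOM]>=10 && FORMAT/DP[$KID]>=10"

EXPR="$GENO && $DEPTH"
echo "$EXPR"

# Count the surviving records (-H drops the header) when the inputs are available.
if command -v bcftools >/dev/null 2>&1 && [ -f "$VCF" ]; then
  bcftools view -H -i "$EXPR" "$VCF" | wc -l
fi
```

Building the expression from named pieces makes it easier to relax one requirement at a time (for example, the depth threshold) and see how the candidate count changes.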
Using bcftools, your powerful brain, and possibly external resources, make your best guess as to the variant and gene that you think cause the kid's phenotype.
Report the header and every variant in the file:
bcftools view trio.trim.vep.vcf.gz
Report the header and every variant where mom and dad are homozygous for the reference allele:
bcftools view trio.trim.vep.vcf.gz -i 'GT[0]="0/0" && GT[1]="0/0"'
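Report every variant where mom and dad are homozygous for the reference allele, the kid is heterozygous, each family member has at least 10 aligned reads, and the record contains both "missense" and "PASS":
bcftools view trio.trim.vep.vcf.gz -i 'GT[0]="0/0" && GT[1]="0/0" && GT[2]="0/1" && FORMAT/DP[0]>=10 && FORMAT/DP[1]>=10 && FORMAT/DP[2]>=10' | grep missense | grep PASS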
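One caveat about the `grep PASS` step: it matches the text anywhere on the line, not only in the FILTER column. The sketch below illustrates the difference with two made-up records. The positions, alleles, and annotations are invented for illustration and do not come from the trio VCF.

```shell
# Two invented VCF records (tab-separated); column 7 is FILTER.
# The second record fails the filter but mentions PASS inside its INFO field.
records=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 1000 . A G 50 PASS 'CSQ=missense_variant' \
  1 2000 . C T 50 LowQual 'CSQ=missense_variant;NOTE=BYPASS')

# grep matches the text anywhere on the line, so both records survive.
loose=$(printf '%s\n' "$records" | grep missense | grep -c PASS)

# awk tests only column 7 (FILTER), so only the first record survives.
strict=$(printf '%s\n' "$records" |
  awk -F'\t' '$7 == "PASS" && /missense/ {n++} END {print n+0}')

echo "grep: $loose  awk: $strict"
```

With bcftools itself, `bcftools view -f PASS` applies the same column-level test on FILTER, and adding `-H` suppresses the header, so the `grep PASS` stage can be replaced if it ever lets through a record it should not.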