Grab 5 bams and their indexes from 1000G to represent our alignments.
mkdir data && cd data
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00097/alignment/HG00097.chrom20.SOLID.bfast.GBR.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00097/alignment/HG00097.chrom20.SOLID.bfast.GBR.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00182/alignment/HG00182.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00182/alignment/HG00182.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00100/alignment/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00100/alignment/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00183/alignment/HG00183.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00183/alignment/HG00183.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam.bai
echo done
We see from the header that 1000G uses GRCh37. Using ggd search
we can find our reference:
ggd search -g GRCh37 reference genome
Among the listings, we see the reference we need with install instructions:
----------------------------------------------------------------------------------------------------
grch37-reference-genome-1000g-v1
================================
Summary: GRCh37 reference genome from 1000 genomes
Species: Homo_sapiens
Genome Build: GRCh37
Keywords: ref, reference, fasta-file
Data Version: phase2_reference
Prefix Install WARNING: This package has not been set up to use the --prefix flag when running ggd install. Once installed, this package will work with other ggd tools that use the --prefix flag.
To install run:
ggd install grch37-reference-genome-1000g-v1
----------------------------------------------------------------------------------------------------
We install our reference:
ggd install grch37-reference-genome-1000g-v1
# activate environmental variables
source activate base
Now we have everything to QC our reads using a Nextflow workflow for seqcover
.
GENES="MYL9,TLDC2,NNAT,ADIG,FAM83D,PTPRT,SGK2,HNF4A"
nextflow run brwnj/seqcover-nf -revision main -profile docker \
--reference $ggd_grch37_reference_genome_1000g_v1_file \
--crams 'data/*.bam' \
--genes $GENES --hg19
The Nextflow output gives:
N E X T F L O W ~ version 20.10.0
Launching `brwnj/seqcover-nf` [pedantic_jennings] - revision: 8bba84f42f [main]
executor > local (7)
[bb/2fddf4] process > mosdepth (3) [100%] 5 of 5 ✔
[72/2ea7b3] process > seqcover_background [100%] 1 of 1 ✔
[e5/8d2432] process > seqcover_report [100%] 1 of 1 ✔
Completed at: 25-Nov-2020 16:30:28
Duration : 12m 50s
CPU hours : 0.6
Succeeded : 7
And we have our results in ./results/seqcover_report.html
.