Last active
March 22, 2017 15:51
-
-
Save bsmith89/02a6ee811b191829f87556b21b2b5482 to your computer and use it in GitHub Desktop.
A boiled-down version of the commands I use to remove mouse sequence from my Illumina HiSeq reads.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env bash | |
# Remove pairs which hit the mouse reference. | |
# | |
# I haven't tested this script, but you should be able to use the commands in it | |
# to do your contaminant filtering. | |
# It would probably be best to feed in fully trimmed reads. Otherwise you | |
# might want to use "local" mapping. | |
# See: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#end-to-end-alignment-versus-local-alignment | |
# Adapter trim and quality trim your HiSeq reads | |
# Download your reference genome (in this case it's compressed using gzip) | |
wget -P mouse.fn.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/vertebrate_mammalian/Mus_musculus/latest_assembly_versions/GCA_001632575.1_C3H_HeJ_v1/GCA_001632575.1_C3H_HeJ_v1_genomic.fna.gz | |
# Unzip the reference. My reference is in FASTA format | |
gunzip -c mouse.fn.gz > mouse.fn | |
# I use the ".fn" extension to mean nucleotide FASTA | |
bowtie2-build mouse.fn mouse | |
# This produces mouse.1.bt2 mouse.2.bt2 mouse.3.bt2 mouse.4.bt2 mouse.rev.1.bt2 mouse.rev.2.bt2 | |
# Assuming your adapter-trimmed, quality-trimmed sequence data is in a compressed FASTQ format | |
# (one per end): reads.r1.trim.fastq.gz reads.r2.trim.fastq.gz | |
# Map each of the paired-end reads (in fastq format) to the mouse reference | |
bowtie2 -x mouse -q -1 reads.r1.trim.fastq.gz -2 reads.r2.trim.fastq.gz > reads.trim.sam | |
# Use the samtools tool called "fastq" which converts SAM files to FASTQ format. | |
# Only output reads if they match the flag "12" which means that | |
# neither the focal read, nor its paired end mapped to the reference. | |
# See "FLAGS" on http://www.htslib.org/doc/samtools.html | |
samtools fastq reads.trim.sam -f12 -n -1 reads.trim.r1.fastq -2 reads.trim.r2.fastq | |
# To avoid producing what is probably a VERY large intermediate SAM file (and, conveniently, | |
# compressing the output files), you could combine the previous two commands into: | |
# | |
# bowtie2 -x mouse -q -1 reads.r1.trim.fastq.gz -2 reads.r2.trim.fastq.gz \ | |
# | samtools fastq - -f12 -n \ | |
# -1 >(gzip -c > seq/split/$*.r1.trim.filt.fastq.gz) \ | |
# -2 >(gzip -c > seq/split/$*.r2.trim.filt.fastq.gz) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment