@iracooke
Last active January 23, 2017 21:57
Quickly Check Sequencing Data

Perform a very quick check of sequencing data across a large number of files

Generate a range of standard quality metrics with fastqc

module load FastQC
mkdir fastqc
fastqc *.fastq.gz -o fastqc

This produces a full HTML report for every input file, but inspecting each one by hand is tedious. To check quickly for poor quality sequences, print the number of sequences FastQC flagged as poor quality in each report.

for f in fastqc/*.html; do pc=$(grep 'Sequences flagged as poor quality' "$f" | sed -E 's/.*Sequences flagged as poor quality[^0-9]*([0-9]+).*/\1/'); printf "%s %d\n" "$f" "$pc"; done
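For a large batch it can help to print only the reports with a non-zero count. The following is a minimal sketch of that idea, not part of the original workflow: the function names `poor_quality_count` and `list_flagged_reports` are hypothetical, and it assumes the count is the first number after the "Sequences flagged as poor quality" label in the report.

```shell
# Print the "Sequences flagged as poor quality" count from one FastQC HTML report.
# Assumption: the count is the first run of digits after the label.
poor_quality_count() {
  sed -n -E 's/.*Sequences flagged as poor quality[^0-9]*([0-9]+).*/\1/p' "$1" | head -n 1
}

# List only the reports in a directory (default: fastqc) with a non-zero count.
list_flagged_reports() {
  local dir=${1:-fastqc} f n
  for f in "$dir"/*.html; do
    [ -e "$f" ] || continue          # skip if the glob matched nothing
    n=$(poor_quality_count "$f")
    if [ "${n:-0}" -gt 0 ]; then
      printf "%s %d\n" "$f" "$n"
    fi
  done
}
```

Running `list_flagged_reports fastqc` prints nothing when no report has flagged sequences, so any output at all points at files worth opening.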

Check reads for species composition using Kraken.

# Assumes KRAKEN_DEFAULT_DB is set; otherwise add --db to kraken and kraken-report.
make_kraken_report(){
  f=$1
  outfile=${f%.fastq.gz}_krakenrep.txt
  printf "Processing %s into %s\n" "$f" "$outfile"
  kraken --fastq-input --gzip-compressed "$f" | kraken-report > "$outfile"
}
export -f make_kraken_report
# Pass the glob directly instead of parsing ls output
parallel -j 24 make_kraken_report {} ::: *.fastq.gz
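
With many report files, a quick way to see the species composition is to pull the most abundant species-level rows from each one. The sketch below assumes the standard tab-separated kraken-report layout (percentage, clade reads, direct reads, rank code, NCBI taxid, indented name); the function name `top_species` is hypothetical.

```shell
# Print the n (default 5) most abundant species-level rows of one Kraken report
# as "percentage<TAB>name". Assumes rank code "S" in column 4 marks species.
top_species() {
  local report=$1 n=${2:-5}
  awk -F'\t' '$4 == "S" {
    pct = $1;  gsub(/ /, "", pct)      # strip padding around the percentage
    name = $6; sub(/^ +/, "", name)    # strip the taxonomy indentation
    printf "%s\t%s\n", pct, name
  }' "$report" | sort -t$'\t' -k1,1nr | head -n "$n"
}
```

For example, `for r in *_krakenrep.txt; do echo "== $r"; top_species "$r" 3; done` prints the top three species for every report in the current directory.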