Skip to content

Instantly share code, notes, and snippets.

@mtw
Last active February 7, 2017 22:56
Show Gist options
  • Save mtw/7989028 to your computer and use it in GitHub Desktop.
Save mtw/7989028 to your computer and use it in GitHub Desktop.
Raw NGS data is often shipped as unaligned BAM files. This shell script converts raw reads to FASTQ format and trims Illumina adapters with cutadapt. Quality Control is performed with FastQC.
#!/bin/bash
cutadapt=`which cutadapt`
fastqc=`which fastqc`
bam2fastq=`which bam2fastq`
gzip=`which gzip`
origdir="."
results="cutadapt"
fastqcdir="${results}/FastQC"
adapter5="CTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
adapter3="GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
inprefix="C"
outprefix="D"
samples=6
froms=5
replicates=3
threads=4
if ! [ -d "$results" ];
then
mkdir -p $results
fi
if ! [ -d "$fastqcdir" ];
then
mkdir -p $fastqcdir
fi
for s in $(seq $froms $samples)
do
for r in $(seq 1 $replicates)
do
sample=${inprefix}${s}_R${r}
outsample=${outprefix}${s}_R${r}
echo "processing $sample"
# process original BAM file; convert to FASTQ and do QC
$bam2fastq -q ${sample}.bam -o ${origdir}/${sample}\#.fastq
for i in 1 2
do
$fastqc --quiet --noextract -o $fastqcdir -t $threads ${origdir}/${sample}_${i}.fastq && $gzip -c ${origdir}/${sample}_${i}.fastq > ${origdir}/${sample}_${i}.fq.gz &
done
# end procesing original files
rd1="${origdir}/${sample}_1.fastq"
rd2="${origdir}/${sample}_2.fastq"
outrd1="${results}/${outsample}_1.fastq"
outrd2="${results}/${outsample}_2.fastq"
echo "infiles $rd1 && $rd2"
echo "outfiles $outrd1 && $outrd2"
if ! [ -f "$rd1" ];
then
echo "ERROR: $rd1 not available"
fi
if ! [ -f "$rd2" ];
then
echo "ERROR: $rd2 not available"
fi
set -x
# run cutadapt in paired-end mode
$cutadapt -q 30 -m 25 -g $adapter5 -a $adapter3 --paired-output tmp.2.fastq -o tmp.1.fastq $rd1 $rd2 1> cutadapt_${sample}.run1.out 2> cutadapt_${sample}.run1.err
$cutadapt -q 30 -m 25 -g $adapter5 -a $adapter3 --paired-output $outrd1 -o $outrd2 tmp.2.fastq tmp.1.fastq 1> cutadapt_${sample}.run2.out 2> cutadapt_${sample}.run2.err
# remove temp files
rm tmp.1.fastq tmp.2.fastq
# remove original (non-adapter-clipped) FASTQ files
rm $rd1 $rd2
# run FastQC and gzip output
$fastqc --quiet --noextract -o $fastqcdir -t $threads $outrd1 && $gzip $outrd1 &
$fastqc --quiet --noextract -o $fastqcdir -t $threads $outrd2 && $gzip $outrd2 &
# gzip output
#$gzip $outrd1 &
#$gzip $outrd2 &
set +x
done
done
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment