@LeeBergstrand
Created August 21, 2016 20:43
Shell script for selecting FASTQ files by number of sequences.
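#!/usr/bin/env bash
# Copy every *.fastq file under the current directory that holds at least a
# minimum number of sequences to "<file>.out", and list the files that do not.
if [ $# -eq 0 ]
then
    echo "No arguments supplied..." >&2
    echo "Please provide a minimum number of seqs per file." >&2
    exit 1
fi

# Minimum number of sequences a file needs to pass QC.
MIN_LENGTH=$1
declare -a BAD_FILE_LIST

echo -e "\nThe following files have passed QC (They will be copied with '.out'):"
# Read the sorted file list line by line so paths containing spaces stay intact.
while IFS= read -r FILE
do
    # Count records as lines / 4, assuming unwrapped four-line FASTQ records.
    # Counting lines that start with '+' overcounts, because a quality string
    # can also begin with '+'.
    SEQ_COUNT=$(awk 'END { print int(NR / 4) }' "$FILE")
    SEQ_DATA="$FILE $SEQ_COUNT"
    if (( SEQ_COUNT < MIN_LENGTH ))
    then
        BAD_FILE_LIST+=("$SEQ_DATA")
    else
        echo "$SEQ_DATA"
        cp "$FILE" "$FILE.out"
    fi
done < <(find . -type f -name "*.fastq" | sort)

echo -e "\nThe following files did not pass QC:"
for FILE in "${BAD_FILE_LIST[@]}"
do
    echo "$FILE"
done
exit 0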
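
A note on counting sequences: matching lines that begin with '+' can report more sequences than a file contains, because a FASTQ quality string may itself start with '+'. The sketch below is one possible alternative, not part of the original gist. It assumes unwrapped, four-line FASTQ records, and the helper name `count_fastq_records` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original gist): count FASTQ records by
# dividing the number of lines by four. Assumes each record occupies exactly
# four lines (header, sequence, '+' separator, quality).
count_fastq_records() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Demo input: two records; the quality string of the second begins with '+'.
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n+III\n' > "$demo"

echo "lines starting with '+': $(grep -c '^+' "$demo")"
echo "records (lines / 4): $(count_fastq_records "$demo")"
rm -f "$demo"
```

On the demo input the grep-based count is 3 while the file holds 2 records. If the files might contain wrapped (multi-line) sequences, the four-line assumption no longer holds and a proper FASTQ parser would be the safer choice.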