@radaniba
Created November 29, 2012 17:14
Extract all entries from a FASTQ file with names not present in a FASTA file
from Bio.SeqIO.QualityIO import FastqGeneralIterator

corrected_fn = "my_input_fasta.fas"
uncorrected_fn = "my_input_fastq.ftq"
output_fn = "differences_fastq.ftq"

# Collect the read names from the FASTA headers.
# A set is unordered and optimized for membership tests, so no sorting is needed.
corrected_names = set()
with open(corrected_fn) as fasta:
    for line in fasta:
        if line.startswith(">"):
            # The read name is the first token of the header, without the ">".
            read_name = line.split()[0][1:]
            corrected_names.add(read_name)
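# Write every FASTQ record whose name is absent from the FASTA file.
with open(uncorrected_fn) as fastq, open(output_fn, "w") as handle:
    for title, seq, qual in FastqGeneralIterator(fastq):
        # Compare on the first token of the title, mirroring how the FASTA
        # names were extracted, so a trailing description cannot block a match.
        fields = title.split()
        read_name = fields[0] if fields else title
        if read_name not in corrected_names:
            handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))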
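
To make the name matching concrete, here is a minimal sketch of the same set-difference logic on in-memory data, using only the standard library. The helper names `fasta_names` and `records_not_in` are hypothetical (they are not part of the gist or of Biopython), the records are made-up sample data, and the sketch assumes a FASTQ title may carry a description after the read name, so it compares on the first token only.

```python
def fasta_names(lines):
    """Collect read names (first header token, minus '>') from FASTA lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            names.add(line.split()[0][1:])
    return names


def records_not_in(records, names):
    """Yield (title, seq, qual) tuples whose read name is not in `names`."""
    for title, seq, qual in records:
        fields = title.split()
        read_name = fields[0] if fields else title
        if read_name not in names:
            yield title, seq, qual


# Made-up sample data, for illustration only.
fasta_lines = [">read1 corrected\n", "ACGT\n", ">read3\n", "GGCC\n"]
fastq_records = [
    ("read1 extra", "ACGT", "IIII"),
    ("read2", "TTAA", "IIII"),
    ("read3", "GGCC", "IIII"),
]

names = fasta_names(fasta_lines)
kept = list(records_not_in(fastq_records, names))
for title, seq, qual in kept:
    print("@%s\n%s\n+\n%s" % (title, seq, qual))
```

With this sample data only the `read2` record is printed, because `read1` and `read3` both appear as FASTA names.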