@indapa
Created April 23, 2013 03:36
Use itertools.groupby to read a fastq file
import itertools
import sys


def yieldFastqRecord(fh):
    """Generator that yields a tuple of (fastq_readname, sequence, qualstring).

    Adapted from http://www.biostars.org/p/67246/#67556
    See http://freshfoo.com/blog/itertools_groupby
    """
    # groupby splits the file into alternating runs of lines: those that
    # start with '@' (headers) and those that do not (sequence, '+', quals)
    fqiter = (x[1] for x in itertools.groupby(fh, lambda line: line[0] == '@'))
    # fqiter is made of sub-iterators; the first sub-iterator is the header
    for header in fqiter:
        readname = next(header).strip()
        # the next sub-iterator holds the sequence, '+', and quality lines;
        # concatenate them, then split on the first '+' only, since '+' is
        # also a valid quality character
        (sequence, quals) = "".join(s.strip() for s in next(fqiter)).split("+", 1)
        # finally we yield
        yield readname, sequence, quals


# fh is assumed to be an open FASTQ file handle; stdin is used here
fh = sys.stdin
for header, seq, qual in yieldFastqRecord(fh):
    print(header)
    print(seq)
    print(qual)
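
To make the grouping step concrete, here is a small sketch of what itertools.groupby produces for a well-formed file. The sample lines and the name `runs` are made up for illustration; the key function is the same one used in the gist.

```python
import itertools

# Hypothetical sample: two well-formed FASTQ records as a list of lines
lines = ["@read1\n", "ACGT\n", "+\n", "IIII\n",
         "@read2\n", "GGCC\n", "+\n", "HHHH\n"]

# Materialise each run so it can be inspected; the key is True for lines
# that start with '@' and False for everything else
runs = [(is_header, [s.strip() for s in group])
        for is_header, group in itertools.groupby(lines, lambda line: line[0] == '@')]

for is_header, group in runs:
    print(is_header, group)
```

Each header forms its own run, and the sequence, '+', and quality lines that follow it form the next run, which is why the generator consumes the runs in pairs.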
@zhangzhen

A qual string may start with '@', because '@' is a valid quality character. Such a line is grouped with the headers, which breaks the parsing.
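
One way around this, sketched here under the assumption that every record occupies exactly four lines (no wrapped sequence or quality lines), is to read the file four lines at a time instead of grouping on '@'. This swaps the groupby technique for itertools.islice; the function name `iter_fastq_four_line` and the sample data are hypothetical.

```python
import io
import itertools


def iter_fastq_four_line(fh):
    """Yield (readname, sequence, quals), assuming four lines per record."""
    while True:
        # Take the next four lines; an empty list means end of file
        chunk = [line.rstrip("\r\n") for line in itertools.islice(fh, 4)]
        if not chunk:
            return
        # Check the record shape by position rather than by scanning for '@'
        if len(chunk) < 4 or not chunk[0].startswith("@") or not chunk[2].startswith("+"):
            raise ValueError("malformed FASTQ record: %r" % (chunk,))
        yield chunk[0], chunk[1], chunk[3]


# Hypothetical two-record input; the first quality string starts with '@'
example = io.StringIO("@read1\nACGT\n+\n@II+\n@read2\nGGCC\n+\nIIII\n")
records = list(iter_fastq_four_line(example))
for readname, sequence, quals in records:
    print(readname, sequence, quals)
```

Because the quality line is identified by its position in the record, neither '@' nor '+' inside it affects the parse. The trade-off is that multi-line (wrapped) FASTQ records are rejected or misread.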
