Created April 23, 2013 03:36
Use itertools.groupby to read a fastq file
import itertools
import sys


def yieldFastqRecord(fh):
    """A generator that yields a tuple of (fastq_readname, sequence, qualstring).

    Adapted from http://www.biostars.org/p/67246/#67556
    See http://freshfoo.com/blog/itertools_groupby

    Assumes a bare '+' separator line, no '+' in the quality string,
    and no quality line starting with '@'.
    """
    # groupby yields alternating runs of header lines and non-header lines;
    # fqiter keeps only the sub-iterators and drops the True/False keys
    fqiter = (x[1] for x in itertools.groupby(fh, lambda line: line.startswith('@')))
    # the first sub-iterator of each pair holds the header
    for header in fqiter:
        readname = next(header).strip()
        # the next sub-iterator holds the sequence, '+', and quality lines;
        # we concatenate them into a single string, then split on '+'
        (sequence, quals) = "".join(s.strip() for s in next(fqiter)).split("+")
        # finally we yield the record
        yield readname, sequence, quals


# fh must be an open FASTQ file; reading the path from sys.argv[1]
# is an assumption made here for illustration.
if __name__ == "__main__":
    with open(sys.argv[1]) as fh:
        for header, seq, qual in yieldFastqRecord(fh):
            print(header)
            print(seq)
            print(qual)
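To see what itertools.groupby hands back when keyed on "line starts with '@'", here is a minimal sketch on an in-memory sample. The two reads in `sample` are made up for illustration and are not part of the gist.

```python
import io
import itertools

# Hypothetical two-record FASTQ sample, used only for illustration.
sample = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\nHHHH\n"
)

# Each run of consecutive lines that share the same key becomes one group,
# so header runs (True) alternate with sequence/'+'/quality runs (False).
groups = [
    (is_header, [line.strip() for line in lines])
    for is_header, lines in itertools.groupby(
        sample, lambda line: line.startswith("@")
    )
]

for is_header, lines in groups:
    print(is_header, lines)
```

Each True group holds one header line and each False group holds the three lines that follow it, which is why the generator consumes the groups in pairs.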
A quality string may start with '@', which is a valid quality character. groupby then treats that quality line as a header, and the parser misreads the record.
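One possible way around this, sketched here under the assumption that every record occupies exactly four lines (no wrapped sequences, no trailing blank lines), is to read the file four lines at a time instead of keying on '@'. This swaps groupby for itertools.islice; the function name `read_fastq_four_lines` and the sample reads are hypothetical.

```python
import io
import itertools


def read_fastq_four_lines(fh):
    """Yield (readname, sequence, quals), assuming four lines per record."""
    while True:
        # Pull the next four lines; an empty list means end of input.
        record = [line.rstrip("\r\n") for line in itertools.islice(fh, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, sequence, separator, quals = record
        # Only the first and third lines are checked for their markers,
        # so '@' or '+' inside the quality string is harmless.
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record: %r" % header)
        yield header, sequence, quals


# Hypothetical sample whose first quality string starts with '@' and ends with '+'.
sample = io.StringIO("@read1\nACGT\n+\n@II+\n@read2\nTTGA\n+\nHHHH\n")
records = list(read_fastq_four_lines(sample))
for readname, sequence, quals in records:
    print(readname, sequence, quals)
```

The trade-off is that multi-line (wrapped) FASTQ records are not supported by this sketch.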