@peterk87
Created May 1, 2013 19:02
Python: Chunk up a nucleotide or amino acid sequence and return a list of tuples containing the chunk sequence name and the chunk sequence.
def chunk_seq(seq_name, sequence, chunk_size, chunk_increment):
    """
    Chunk up a sequence and return a list of tuples with the chunked up
    sequences and new sequence names with the position of the chunk in the
    original sequence.

    Args:
        seq_name: Sequence name.
        sequence: Nucleotide or amino acid sequence that is to be chunked up.
        chunk_size: Size of chunks (e.g. 30 bp).
        chunk_increment: Increment size for generating sequence chunks.

    Returns:
        A list of tuples with the sequence name plus the start and end indices
        of the chunk delimited by underscores and the chunk sequence.

    Raises:
        ValueError: If chunk_increment is less than 1.
    """
    # A zero increment never advances the index, so the loop below would not
    # terminate; a negative increment is not meaningful either.
    if chunk_increment < 1:
        raise ValueError('chunk_increment must be a positive integer')
    index = 0
    chunks = []
    while index < len(sequence):
        # Local name differs from the function name to avoid shadowing it
        chunk = sequence[index:index + chunk_size]
        # Stop at the first chunk shorter than chunk_size (end of sequence)
        if len(chunk) != chunk_size:
            break
        chunks.append(('_'.join([seq_name, str(index), str(index + chunk_size)]), chunk))
        index += chunk_increment
    return chunks
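
The chunking above can also be sketched with `range`. This is a minimal sketch, not the gist author's code: `chunk_seq_compact` is a hypothetical name, and it assumes `chunk_size` and `chunk_increment` are positive integers. The sample call shows that start indices are 0-based, end indices are exclusive (Python slice style), and a trailing partial chunk is dropped.

```python
def chunk_seq_compact(seq_name, sequence, chunk_size, chunk_increment):
    """Hypothetical range-based variant of chunk_seq (illustration only).

    Assumes chunk_size and chunk_increment are positive integers.
    """
    # The last valid start index is len(sequence) - chunk_size, so every
    # slice taken here is exactly chunk_size long.
    last_start = len(sequence) - chunk_size
    return [
        ('_'.join([seq_name, str(start), str(start + chunk_size)]),
         sequence[start:start + chunk_size])
        for start in range(0, last_start + 1, chunk_increment)
    ]


# 10 bp sequence, 4 bp chunks, stepping 3 bp at a time
chunks = chunk_seq_compact('seq1', 'ACGTACGTAC', 4, 3)
for name, chunk in chunks:
    print(name, chunk)
# seq1_0_4 ACGT
# seq1_3_7 TACG
# seq1_6_10 GTAC
```

Setting `chunk_increment` equal to `chunk_size` gives non-overlapping chunks; a smaller increment gives overlapping (sliding window) chunks.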