@zachcp
Created November 26, 2013 16:44
Quick Histograms of FASTA File Lengths Using Biopython + pandas
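```python
import glob

import matplotlib.pyplot as plt
import pandas as pd
from Bio import SeqIO

# Every file in the current directory whose name ends in "fasta"
files = glob.glob('./*fasta')

def get_size_frequencies(fasta):
    """Plot a histogram of the sequence lengths in one FASTA file."""
    with open(fasta, 'r') as f:
        lengths = pd.Series([len(rec.seq) for rec in SeqIO.parse(f, 'fasta')])
    # Draw on a fresh figure; otherwise every file lands on the same axes
    fig, ax = plt.subplots()
    ax.set_title(fasta)
    return lengths.hist(ax=ax)

graphs = [get_size_frequencies(fasta) for fasta in files]
plt.show()
```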
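If Biopython or matplotlib is not at hand, for example when checking the numbers quickly in a bare environment, the same per-record lengths can be sketched with the standard library alone. This is a swapped-in technique, not Biopython's parser. It assumes a simplified FASTA layout: a `>` line opens a record and the non-blank lines after it are sequence. It also prints a text histogram instead of a matplotlib one. The names `fasta_lengths` and `length_bins` are hypothetical helpers introduced here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Sequence length of each record in FASTA-formatted text, in file order.

    Simplified reader (not Bio.SeqIO): a line starting with '>' opens a new
    record and every non-blank line after it adds its stripped length to that
    record. Lines before the first header are ignored; residues are not validated.
    """
    lengths: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            lengths.append(0)  # new record, no residues counted yet
        elif lengths:
            lengths[-1] += len(line)  # a sequence may wrap over several lines
    return lengths


def length_bins(lengths: Iterable[int], bin_width: int = 10) -> List[Tuple[int, int]]:
    """Count lengths in fixed-width bins; returns sorted (bin_start, count) pairs."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return sorted(counts.items())


if __name__ == '__main__':
    example = """>seq1 first record
ACGTACGTAC
GTAC
>seq2
ACGT
>seq3
ACGTACGTACGTACGTACGTA
""".splitlines()
    lengths = fasta_lengths(example)
    print(lengths)  # [14, 4, 21]
    width = 10
    for start, count in length_bins(lengths, bin_width=width):
        # One text row per bin, e.g. "  10-19   #"
        print(f"{start:>4}-{start + width - 1:<4} {'#' * count}")
```

To read a real file, pass an open file handle, as in `fasta_lengths(open('reads.fasta'))`, where `reads.fasta` is a placeholder name. Fixed-width bins keep the output easy to compare across files. The pandas/matplotlib version above picks its bin edges from each file's own range.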