Terry Jones terrycojones

from collections import defaultdict
def confusion(trueLabels, clusterLabels):
    counts = defaultdict(lambda: defaultdict(int))
    allLabels = sorted(set(trueLabels + clusterLabels))
    for trueLabel, clusterLabel in zip(trueLabels, clusterLabels):
        counts[trueLabel][clusterLabel] += 1
    return allLabels, counts
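A minimal usage sketch for `confusion`, with the function repeated so the block runs on its own. The label values and the printing loop are made up for illustration and are not part of the original gist. Both arguments are assumed to be lists, since the function concatenates them with `+`.

```python
from collections import defaultdict


def confusion(trueLabels, clusterLabels):
    counts = defaultdict(lambda: defaultdict(int))
    allLabels = sorted(set(trueLabels + clusterLabels))
    for trueLabel, clusterLabel in zip(trueLabels, clusterLabels):
        counts[trueLabel][clusterLabel] += 1
    return allLabels, counts


# Hypothetical labels, for illustration only.
true = ['a', 'a', 'b', 'b', 'c']
pred = ['a', 'b', 'b', 'b', 'a']
labels, counts = confusion(true, pred)

# Print the matrix: rows are true labels, columns are cluster labels.
print('  ' + ' '.join(labels))
for t in labels:
    print(t + ' ' + ' '.join(str(counts[t][c]) for c in labels))
```

Because `counts` is a nested `defaultdict`, looking up a pair that never occurred returns 0 rather than raising `KeyError`.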
Subjects (with offsets) by hash:
GB7:GA24:87
gi|285002301|ref|YP_003422365.1| envelope fusion protein [Pseudaletia unipuncta granulovirus] [453]
gi|311977556|ref|YP_003986676.1| hypothetical protein [Acanthamoeba polyphaga mimivirus] [88]
gi|9626461|ref|NP_059434.1| hypothetical protein JEVgp1 [Japanese encephalitis virus] [3118]
gi|20564197|ref|NP_620735.1| replication-associated protein [Tomato pseudo-curly top virus] [228]
gi|448825870|ref|YP_007418801.1| putative ankyrin repeat protein [Megavirus lba] [153]
gi|226377796|ref|YP_002790845.1| hypothetical protein lb338_phage_166 [Lactobacillus phage Lb338-1] [97]
gi|563397262|ref|YP_008857028.1| putative DNA ligase [Pseudomonas phage PAK_P5] [117]
gi|118197773|ref|YP_874166.1| hypothetical protein YS40_153 [Thermus phage phiYS40] [539]
#!/usr/bin/env python
import sys
from collections import defaultdict
seen = defaultdict(list)
for lineNumber, line in enumerate(sys.stdin):
    if lineNumber:
        fields = line.split('|')
#!/usr/bin/env python
import sys
seen = {}
for lineNumber, line in enumerate(sys.stdin):
    if lineNumber:
        fields = line.split('|')
        key = '|'.join(fields[:2] + fields[3:])
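The key built on the last line above joins every `|`-separated field except the third. A small sketch of that one step follows. The helper name `key_without_third_field` is hypothetical, and the sample line is only assumed to resemble the script's input, which the preview does not show.

```python
def key_without_third_field(line):
    # Split on '|' and rejoin, leaving out the field at index 2.
    fields = line.split('|')
    return '|'.join(fields[:2] + fields[3:])


# Assumed sample input, shaped like an NCBI-style sequence title.
line = 'gi|9626461|ref|NP_059434.1| hypothetical protein JEVgp1'
print(key_without_third_field(line))
# → gi|9626461|NP_059434.1| hypothetical protein JEVgp1
```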
#!/usr/bin/env python
import sys
from collections import defaultdict
seen = defaultdict(list)
for lineNumber, line in enumerate(sys.stdin):
    if lineNumber:
        fields = line.split('|')
python -m cProfile -o /tmp/new.prof bin/find-matches.py --database light.db --fastaFile query.fa > query.out
#!/usr/bin/env python
import sys
from pstats import Stats
p = Stats(sys.argv[1])
#p.strip_dirs().sort_stats('cumulative').print_stats()
p.sort_stats('cumulative').print_stats()
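The same profile-then-report workflow can also be run inside a single process, without an intermediate `.prof` file. This is a sketch under stated assumptions: `work` is a hypothetical stand-in for the real program, and the report is limited to five rows to keep the output short.

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical stand-in for the code being profiled.
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Sort by cumulative time, as in the script above, and keep the top 5 rows.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```

Writing the report to an `io.StringIO` stream instead of standard output makes it easy to save or search the text afterwards.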
class ScannedReadDatabase(object):
    """
    Maintain a collection of reads and provide for database (index, search)
    operations on them.
    @param landmarkFinderClasses: A C{list} of landmark classes.
    @param trigPointFinderClasses: A C{list} of trig point classes.
    """
    def __init__(self, landmarkFinderClasses, trigPointFinderClasses):
        self.landmarkFinderClasses = landmarkFinderClasses
Phylogenetic studies indicate that eukaryotic DNA polymerases and some
viral DNA polymerases have a common origin. These studies are not easy to
interpret because only a few of the polymerase domains are conserved and
phylogenetic trees have all been unrooted, so the direction of evolution
cannot be determined. The gene flow between viruses and eukaryotic cells
could have been in either direction. Takemura suggested that
α-polymerases (priming polymerases in eukaryotes) originated from a
pox-like virus. Unlike most other DNA viruses, poxviruses replicate in the
cytoplasm of their hosts, completely independent of the host nucleus.
Vaccinia virus, a well-studied poxvirus, encodes protein kinases and
Request timeout for icmp_seq 56
Request timeout for icmp_seq 57
Request timeout for icmp_seq 58
Request timeout for icmp_seq 59
64 bytes from 173.194.45.240: icmp_seq=43 ttl=42 time=17498.977 ms
64 bytes from 173.194.45.240: icmp_seq=44 ttl=42 time=17338.116 ms
64 bytes from 173.194.45.240: icmp_seq=45 ttl=42 time=16603.952 ms
64 bytes from 173.194.45.240: icmp_seq=47 ttl=42 time=15499.101 ms
64 bytes from 173.194.45.240: icmp_seq=48 ttl=42 time=14698.743 ms
64 bytes from 173.194.45.240: icmp_seq=50 ttl=42 time=13174.529 ms
def walkHSP(self, hsp):
    """
    Provide information about exactly how a read matches a subject, as
    specified by C{hsp}.
    @return: A generator that yields (offset, residue, inMatch) tuples.
        The offset is the offset into the matched subject. The residue is
        the base in the read (which might be '-' to indicate a gap in the
        read was aligned with the subject at this offset). inMatch will be
        C{True} for residues that are part of the HSP match, and C{False}