Skip to content

Instantly share code, notes, and snippets.

View kescobo's full-sized avatar

Kevin Bonham kescobo

View GitHub Profile
@kescobo
kescobo / Rosalind_question.py
Created April 11, 2014 18:33
Problem from rosalind.info
input = open('rosalind_ini6.txt', 'r')
input = input.read()
words = []
list = {}
for word in input.split(' '):
words.append(word)
for i in xrange(0, len(words)):
import re
#open file to read
with open('test.fasta', 'r') as seqs:
seqFile = (open('throwaway.txt', 'a+'))
for line in seqs:
if re.search('^>[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]', line):
seqFile.close()
@kescobo
kescobo / pseudo_code
Last active August 29, 2015 14:17
HGT analysis pipeline
import annotated genomes
for genome in genomes:
write genome to genome_database
database{
species_name:
CDS{
annotation:
aa sequence:
nt sequence:
@kescobo
kescobo / arthrobacter_genome.gb
Last active January 11, 2016 18:24
Files and code for Biopython issue #747
LOCUS BW77_ACAGTG.R1_(paired)_contig_93 233 bp DNA linear UNK
DEFINITION Contig BW77_ACAGTG.R1_(paired)_contig_93 from Arthrobacter sp.
BW77
ACCESSION unknown
FEATURES Location/Qualifiers
source 1..233
/mol_type="genomic DNA"
/db_xref="taxon: 6666666"
/genome_md5=""
/project="bewolfe_6666666"
@kescobo
kescobo / gbk_to_faa.py
Last active January 14, 2016 21:14
Python definitions to take genbank file containing multiple genomes and dividing them into fasta protein files, one file per genome. For Stackoverflow question http://stackoverflow.com/posts/34777323/
from operator import itemgetter
from itertools import groupby
def gbk_to_faa(some_genbank):
source = None
for record in SeqIO.parse(some_genbank, 'gb'):
if source:
if record.annotations['source'] != source:
out_file.close()
source = sub(r'\W+', "_", sub(r'\W$', "", record.annotations['source']))
@kescobo
kescobo / class_projects.py
Last active March 28, 2018 14:47 — forked from Junuxx/class_projects.py
Solution for a StackOverflow scheduling problem
'''
Created on 26 jan. 2016
@author: Jeroen Kools
Based on the following StackOverflow question:
https://stackoverflow.com/questions/35002027/maximizing-a-combination-of-a-series-of-values#comment57732451_35002027
- 19 students
- 12 dates
@kescobo
kescobo / newjd.jl
Last active March 16, 2017 16:36
Better Jaccard Distance
function jaccarddistance(sketch1::MinHashSketch, sketch2::MinHashSketch)
d = length(setdiff(sketch1.sketch, sketch2.sketch))
l = length(sketch1)
return (l-d) / (l+d)
end
function newjd(sketch1::MinHashSketch, sketch2::MinHashSketch)
matches = 0
sketchlen = length(sketch1)
i = 1
@kescobo
kescobo / newminhash.jl
Created March 18, 2017 16:14
Benchmarks for new implementation of kmerminhash
b1 = open(FASTAReader,
"/Users/ksb/computation/science/genomes/kv_input/fasta/Brachybacterium_alimentarium_738_10.fna")
seq1 = dna""
for s in b1
seq1 = seq1 * s.seq
end
length(seq1) # 4160958
_
_ _ _(_)_ | A fresh approach to technical computing
(_) | (_) (_) | Documentation: https://docs.julialang.org
_ _ _| |_ __ _ | Type "?help" for help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 0.6.0-rc1.0 (2017-05-07 00:00 UTC)
_/ |\__'_|_|_|\__'_| |
|__/ | x86_64-apple-darwin16.5.0
julia> using Plots
@kescobo
kescobo / jb418.sam
Created June 8, 2017 20:48
Sam contents
@HD VN:1.0 SO:unsorted
@SQ SN:tig00000001 LN:4311805
@SQ SN:tig00000002 LN:66004
@SQ SN:tig00000004 LN:1036550
@SQ SN:tig00000005 LN:256226
@SQ SN:tig00000007 LN:150530
@SQ SN:tig00000008 LN:138168
@PG ID:bowtie2 PN:bowtie2 VN:2.3.2 CL:"/usr/local/bin/../Cellar/bowtie2/2.3.2/bin/bowtie2-align-s --wrapper basic-0 -x jb418 -S jb418.sam -1 ../../raw_reads/r1.fastq -2 ../../raw_reads/r2.fastq"
HWI-D00742:45:H7W5YBCXX:1:1101:1219:2200 73 tig00000001 3359004 42 100M = 3359004 0 CCTGNGTGACGAAGACCACCTGGGCGACATGGACTTCAAGGTAGCCGGTACCGCCAAAGGTGTTACCGCGCTGCAGATGGACATCAAGATCNAGGGCATC DDDD#<<EHHHIIHIIIIIIIIIHHIIIIIIIIIIIIIIIIIHEHIIIIIIIIIIIIIIIHHIIIHIIIIIIIIHIEHIHIIIIIIIIGGH#<<EHFHIH AS:i:-2 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:4G86A8 YT:Z:UP
HWI-D00742:45:H7W5YBCXX:1:1101:1219:2200 133 tig00000001 3359004 0 * = 3359004 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN ####################################################################################################