Created February 13, 2016 03:32
Solution for https://courses.edx.org/courses/course-v1:ColumbiaX+DS102X+1T2016 - You are given the five 14-nucleotide reads below. Map them to the sequence of the gene responsible for the ABO blood type, keeping in mind that each read might include a single nucleotide error. Report their respective starting positions along the gene (answers…
// not optimized, just a quick way to code it up...
// data from: http://www.ncbi.nlm.nih.gov/nuccore/LC068776.1
val s1 = "ccggcctcgggaag"
val s2 = "ttgcggacgctagc"
val s3 = "tcgggctccccccg"
val s4 = "ggggggaaggcgga"
val s5 = "tctgtccccccccg"
val g = "ggccgcctcccgcgcccctctgtcccctcccgtgttcggcctcgggaagtcggggcggcgggcggcgcgggccgggaggggtcgcctcgggctcaccccgccccagggccgccgggcggaaggcggaggccgagaccagacgcggagccatggccgaggtgttgcggacgctggccg"
// compares 2 strings and counts how many chars differ
// (despite the name, this is a distance: lower means more similar):
def similarity(source: String, dest: String): Int =
  source.zip(dest).foldLeft(0){ case (sum, (x,y)) => if(x == y) sum else sum + 1 }
// returns the closest match (Int, Int): position (0-based) and how many chars were different:
def pos(read: String, gene: String): (Int, Int) = {
  gene.sliding(read.size, 1).zipWithIndex.map{ case (snip, pos) => pos -> similarity(snip, read) }.minBy(_._2)
}
// find position:
scala> pos(s1, g)
res25: (Int, Int) = (35,1)
// verify:
scala> g.substring(35, 35 + s1.size)
res32: String = tcggcctcgggaag
scala> s1
res33: String = ccggcctcgggaag
I am not able to run the Python code; it gives multiple errors.
Can you help, please?
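# In Python:
from functools import lru_cache

s1 = "ccggcctcgggaag"
s2 = "ttgcggacgctagc"
s3 = "tcgggctccccccg"
s4 = "ggggggaaggcgga"
s5 = "tctgtccccccccg"
g = "ggccgcctcccgcgcccctctgtcccctcccgtgttcggcctcgggaagtcggggcggcgggcggcgcgggccgggaggggtcgcctcgggctcaccccgccccagggccgccgggcggaaggcggaggccgagaccagacgcggagccatggccgaggtgttgcggacgctggccg"

# Turn g into a dictionary: 0-based start position -> 14-character window
dicc = {}
i = 0
while i + 14 <= len(g):
    dicc[i] = g[i:i + 14]
    i += 1

# Length of the longest common subsequence of X[:m] and Y[:n].
# Memoized, because the plain recursion is exponential in the worst case.
@lru_cache(maxsize=None)
def lcs(X, Y, m, n):
    if m == 0 or n == 0:
        return 0
    elif X[m - 1] == Y[n - 1]:
        return 1 + lcs(X, Y, m - 1, n - 1)
    else:
        return max(lcs(X, Y, m, n - 1), lcs(X, Y, m - 1, n))

# Returns the 1-based position of the first window whose LCS with the read
# is at least 13 (so an exact match also counts), or None if there is none.
# Note: an LCS also tolerates shifts, so a neighbouring window can qualify too.
def search_pos(str1, dicc):
    for i in range(len(dicc)):
        if lcs(str1, dicc[i], len(str1), len(dicc[i])) >= 13:
            return i + 1
    return None

print(search_pos(s1, dicc))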
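For comparison, here is a minimal Python sketch of the mismatch-counting approach used by the Scala snippet at the top of the gist: slide a window over the gene, count the positions where it differs from the read, and keep the window with the fewest differences. The names `hamming` and `best_position` are hypothetical and are not part of the gist; the reads and the gene fragment are copied from it. This assumes the only errors are substitutions, which is how the problem statement describes them.

```python
# Hypothetical helper names; the reads and gene fragment are copied from the gist.
reads = {
    "s1": "ccggcctcgggaag",
    "s2": "ttgcggacgctagc",
    "s3": "tcgggctccccccg",
    "s4": "ggggggaaggcgga",
    "s5": "tctgtccccccccg",
}
g = "ggccgcctcccgcgcccctctgtcccctcccgtgttcggcctcgggaagtcggggcggcgggcggcgcgggccgggaggggtcgcctcgggctcaccccgccccagggccgccgggcggaaggcggaggccgagaccagacgcggagccatggccgaggtgttgcggacgctggccg"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_position(read, gene):
    """Return (0-based start, mismatches) of the window closest to `read`.

    The earliest window wins ties, because min() keeps the first minimum.
    """
    size = len(read)
    candidates = (
        (start, hamming(gene[start:start + size], read))
        for start in range(len(gene) - size + 1)
    )
    return min(candidates, key=lambda pair: pair[1])


for name, read in reads.items():
    start, mismatches = best_position(read, g)
    # Positions are 0-based here; add 1 if 1-based positions are wanted.
    print(name, start, mismatches)
```

For `s1` this agrees with the REPL result shown in the gist, `(35,1)`. A mismatch count above 1 for any read would suggest that the read does not fit the single-substitution assumption for this fragment.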