
@izmailoff
Created February 13, 2016 03:32
Solution for https://courses.edx.org/courses/course-v1:ColumbiaX+DS102X+1T2016 - You are given the following five 14-long reads below. Map them to the sequence of the gene responsible for the ABO blood type, keeping in mind that each read might include a single nucleotide error. Report their respective starting positions along the gene (answers…
```scala
// not optimized, just a quick way to code it up...
// data from: http://www.ncbi.nlm.nih.gov/nuccore/LC068776.1
val s1 = "ccggcctcgggaag"
val s2 = "ttgcggacgctagc"
val s3 = "tcgggctccccccg"
val s4 = "ggggggaaggcgga"
val s5 = "tctgtccccccccg"
val g = "ggccgcctcccgcgcccctctgtcccctcccgtgttcggcctcgggaagtcggggcggcgggcggcgcgggccgggaggggtcgcctcgggctcaccccgccccagggccgccgggcggaaggcggaggccgagaccagacgcggagccatggccgaggtgttgcggacgctggccg"
// compares 2 strings and counts how many chars differ
// (a Hamming distance: despite the name, lower means more similar):
def similarity(source: String, dest: String): Int =
  source.zip(dest).foldLeft(0) { case (sum, (x, y)) => if (x == y) sum else sum + 1 }
// returns the closest match as (position, mismatches); position is 0-based:
def pos(read: String, gene: String): (Int, Int) = {
  gene.sliding(read.size, 1).zipWithIndex.map { case (snip, pos) => pos -> similarity(snip, read) }.minBy(_._2)
}
// find position:
scala> pos(s1, g)
res25: (Int, Int) = (35,1)
// verify:
scala> g.substring(35, 35 + s1.size)
res32: String = tcggcctcgggaag
scala> s1
res33: String = ccggcctcgggaag
```
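For readers without a Scala REPL, the same idea can be sketched in Python: slide a 14-base window along the gene, count mismatches against the read, and keep the window with the fewest. This is an illustrative port, not part of the gist; the names `G`, `READS`, `mismatches` and `best_match` are made up here. Positions are 0-based like `pos`, so for s1 it should agree with the REPL result (35,1) above.

```python
# Sketch: Python port of the Scala `similarity` / `pos` idea (sliding window + mismatch count).
# G, READS, mismatches and best_match are illustrative names, not from the original gist.

G = "ggccgcctcccgcgcccctctgtcccctcccgtgttcggcctcgggaagtcggggcggcgggcggcgcgggccgggaggggtcgcctcgggctcaccccgccccagggccgccgggcggaaggcggaggccgagaccagacgcggagccatggccgaggtgttgcggacgctggccg"

READS = {
    "s1": "ccggcctcgggaag",
    "s2": "ttgcggacgctagc",
    "s3": "tcgggctccccccg",
    "s4": "ggggggaaggcgga",
    "s5": "tctgtccccccccg",
}


def mismatches(a, b):
    """Count positions where a and b differ (what `similarity` computes in the Scala code)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def best_match(read, gene):
    """Return (0-based start, mismatches) for the first window of `gene` with the fewest mismatches."""
    k = len(read)
    scored = ((i, mismatches(gene[i:i + k], read)) for i in range(len(gene) - k + 1))
    # min() keeps the first of several equally good windows, like Scala's minBy.
    return min(scored, key=lambda pair: pair[1])


for name, read in READS.items():
    start, diff = best_match(read, G)
    print(name, start, diff, G[start:start + len(read)])
```

Like `minBy`, this reports only the leftmost window when several tie, so a read with more than one equally good placement would need all tied windows listed instead.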
@bluedespite

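In Python:

```python
from functools import lru_cache

s1 = "ccggcctcgggaag"
s2 = "ttgcggacgctagc"
s3 = "tcgggctccccccg"
s4 = "ggggggaaggcgga"
s5 = "tctgtccccccccg"
g = "ggccgcctcccgcgcccctctgtcccctcccgtgttcggcctcgggaagtcggggcggcgggcggcgcgggccgggaggggtcgcctcgggctcaccccgccccagggccgccgggcggaaggcggaggccgagaccagacgcggagccatggccgaggtgttgcggacgctggccg"

READ_LEN = 14

# Turn g into a dictionary: start index (0-based) -> 14-long window of g
dicc = {}
i = 0
while i + READ_LEN <= len(g):
    dicc[i] = g[i:i + READ_LEN]
    i += 1

# Length of the longest common subsequence of X[:m] and Y[:n].
# Memoized, because the plain recursion is exponential in m + n.
@lru_cache(maxsize=None)
def lcs(X, Y, m, n):
    if m == 0 or n == 0:
        return 0
    elif X[m-1] == Y[n-1]:
        return 1 + lcs(X, Y, m-1, n-1)
    else:
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n))

# Return the 1-based start of the first window whose LCS with the read is 13
# (one base off), or None if there is none. Heuristic: LCS also tolerates
# insertions/deletions, and an error-free read (LCS 14) is not reported.
def search_pos(str1, dicc):
    for i in range(len(dicc)):
        if lcs(str1, dicc[i], len(str1), len(dicc[i])) == READ_LEN - 1:
            return i + 1
    return None

print(search_pos(s1, dicc))
```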
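One caution about scoring windows by LCS: because LCS allows insertions and deletions, it does not pin a read to a single offset. The sketch below uses an iterative DP in place of the recursive `lcs` (the helper names `lcs_length` and `mismatches` are illustrative) and compares s1 with the window at 0-based position 35 and the window one base to the right. Both have an LCS of 13 with s1, while their substitution counts are 1 and 8, which is why the Scala version counts mismatches instead.

```python
# Sketch: a fixed LCS threshold cannot tell neighbouring windows apart; a mismatch count can.

def lcs_length(x, y):
    """Longest-common-subsequence length via bottom-up DP, keeping one previous row."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            # Extend the diagonal on a match, otherwise take the best of left/up.
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def mismatches(x, y):
    """Count positions where x and y differ (substitutions only)."""
    return sum(1 for a, b in zip(x, y) if a != b)


read = "ccggcctcgggaag"        # s1
window_35 = "tcggcctcgggaag"   # g[35:49]
window_36 = "cggcctcgggaagt"   # g[36:50], the same region shifted by one base

for name, window in [("g[35:49]", window_35), ("g[36:50]", window_36)]:
    print(name, "LCS =", lcs_length(read, window), "mismatches =", mismatches(read, window))
```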

@pratikmachchar

I am not able to run the Python code; it gives multiple errors.
Can you help, please?
