Created April 22, 2016 12:28
Swarchal/44ba436252274d60aa39068cf9f8965c
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Inferring mRNA from Protein\n",
"\n",
"Given a protein string, return the total number of different RNA strings from which the protein could have been translated, modulo 1,000,000."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# codon dictionary from previous problem\n",
"codon_dict = {\n",
"    'UUU': 'F', 'CUU': 'L', 'AUU': 'I', 'GUU': 'V',\n",
"    'UUC': 'F', 'CUC': 'L', 'AUC': 'I', 'GUC': 'V',\n",
"    'UUA': 'L', 'CUA': 'L', 'AUA': 'I', 'GUA': 'V',\n",
"    'UUG': 'L', 'CUG': 'L', 'AUG': 'M', 'GUG': 'V',\n",
"    'UCU': 'S', 'CCU': 'P', 'ACU': 'T', 'GCU': 'A',\n",
"    'UCC': 'S', 'CCC': 'P', 'ACC': 'T', 'GCC': 'A',\n",
"    'UCA': 'S', 'CCA': 'P', 'ACA': 'T', 'GCA': 'A',\n",
"    'UCG': 'S', 'CCG': 'P', 'ACG': 'T', 'GCG': 'A',\n",
"    'UAU': 'Y', 'CAU': 'H', 'AAU': 'N', 'GAU': 'D',\n",
"    'UAC': 'Y', 'CAC': 'H', 'AAC': 'N', 'GAC': 'D',\n",
"    'UAA': 'Stop', 'CAA': 'Q', 'AAA': 'K', 'GAA': 'E',\n",
"    'UAG': 'Stop', 'CAG': 'Q', 'AAG': 'K', 'GAG': 'E',\n",
"    'UGU': 'C', 'CGU': 'R', 'AGU': 'S', 'GGU': 'G',\n",
"    'UGC': 'C', 'CGC': 'R', 'AGC': 'S', 'GGC': 'G',\n",
"    'UGA': 'Stop', 'CGA': 'R', 'AGA': 'R', 'GGA': 'G',\n",
"    'UGG': 'W', 'CGG': 'R', 'AGG': 'R', 'GGG': 'G'\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"915648\n"
]
}
],
"source": [
"def codon_freq():\n",
"    # count how many codons encode each amino acid (and 'Stop')\n",
"    tally = {}\n",
"    for amino_acid in codon_dict.values():\n",
"        tally[amino_acid] = tally.get(amino_acid, 0) + 1\n",
"    return tally\n",
"\n",
"\n",
"def get_all_possible(file_path):\n",
"    # read the protein string, closing the file afterwards\n",
"    with open(file_path) as f:\n",
"        x = f.read().strip()\n",
"    freq = codon_freq()\n",
"    out = freq['Stop']\n",
"    for i in x:\n",
"        out *= freq[i]\n",
"    return out % 1000000\n",
"\n",
"print(get_all_possible(\"/home/scott/Dropbox/rosalind/rosalind_mrna.txt\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"------------------------------------\n",
"\n",
"## Explanation\n",
"\n",
"- Ignore Rosalind's advice on working modulo 1,000,000 throughout; it's 2016:\n",
"    * Python integers have arbitrary precision, so we don't have to worry about storing large integers\n",
"    * It's easier to just take the final answer modulo 1,000,000\n",
"\n",
"`codon_freq()` builds a new dictionary from `codon_dict` that maps each amino acid (and `Stop`) to the number of codons that encode it."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'A': 4,\n",
" 'C': 2,\n",
" 'D': 2,\n",
" 'E': 2,\n",
" 'F': 2,\n",
" 'G': 4,\n",
" 'H': 2,\n",
" 'I': 3,\n",
" 'K': 2,\n",
" 'L': 6,\n",
" 'M': 1,\n",
" 'N': 2,\n",
" 'P': 4,\n",
" 'Q': 2,\n",
" 'R': 6,\n",
" 'S': 6,\n",
" 'Stop': 3,\n",
" 'T': 4,\n",
" 'V': 4,\n",
" 'W': 1,\n",
" 'Y': 2}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"codon_freq()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`get_all_possible()` loops through the amino acids, looks up the number of codons for each one in the dictionary returned by `codon_freq()`, and keeps a running product.\n",
"\n",
"We also have to multiply by the number of stop codons (3), because the protein string in the Rosalind input does not include the stop."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_all_possible(x):\n",
"    freq = codon_freq()\n",
"\n",
"    # the stop codon isn't in our input, so start from its count\n",
"    out = freq['Stop']\n",
"\n",
"    # loop through the amino-acid sequence\n",
"    for i in x:\n",
"        out *= freq[i]  # cumulative product\n",
"\n",
"    return out % 1000000  # return total modulo 1M"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"12"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test = \"MA\"\n",
"get_all_possible(test)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
} |
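A minimal sketch, not part of the original notebook, of the alternative the Explanation section sets aside: reducing the running product modulo 1,000,000 at every step instead of only at the end. It gives the same answer because (a·b) mod m = ((a mod m)·b) mod m, and keeping the intermediate value small is what would matter in a language with fixed-width integers. The codon counts are copied from the `codon_freq()` output in the notebook; the names `CODON_COUNTS`, `MOD`, and `count_rna_strings` are hypothetical.

```python
# Hypothetical sketch: count the RNA strings that could encode a protein,
# reducing modulo MOD at each step instead of only at the end.

# Codons per amino acid, copied from the notebook's codon_freq() output
# (standard genetic code; 'Stop' has three codons: UAA, UAG, UGA).
CODON_COUNTS = {
    'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'G': 4, 'H': 2,
    'I': 3, 'K': 2, 'L': 6, 'M': 1, 'N': 2, 'P': 4, 'Q': 2,
    'R': 6, 'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2, 'Stop': 3,
}

MOD = 1000000  # the problem asks for the answer modulo 1,000,000


def count_rna_strings(protein, mod=MOD):
    """Return the number of RNA strings encoding `protein`, modulo `mod`.

    The protein string does not include the stop codon, so the product
    starts from the number of stop codons.
    """
    total = CODON_COUNTS['Stop'] % mod
    for amino_acid in protein.strip():
        # (a * b) % m == ((a % m) * b) % m, so reducing at every step
        # gives the same answer as reducing once at the end.
        total = (total * CODON_COUNTS[amino_acid]) % mod
    return total


if __name__ == "__main__":
    print(count_rna_strings("MA"))  # 3 * 1 * 4 = 12
```

For any protein string this should agree with the notebook's `get_all_possible(x)`, since both compute the same product and differ only in when the modulo is taken.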