@Swarchal
Created April 22, 2016 12:28
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Inferring mRNA from Protein\n",
"\n",
"Given a protein string, return the number of different RNA strings from which the protein could have been translated, modulo 1,000,000."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# codon dictionary from previous problem\n",
"codon_dict = {\n",
" 'UUU': 'F', 'CUU': 'L', 'AUU': 'I', 'GUU': 'V',\n",
" 'UUC': 'F', 'CUC': 'L', 'AUC': 'I', 'GUC': 'V',\n",
" 'UUA': 'L', 'CUA': 'L', 'AUA': 'I', 'GUA': 'V',\n",
" 'UUG': 'L', 'CUG': 'L', 'AUG': 'M', 'GUG': 'V',\n",
" 'UCU': 'S', 'CCU': 'P', 'ACU': 'T', 'GCU': 'A',\n",
" 'UCC': 'S', 'CCC': 'P', 'ACC': 'T', 'GCC': 'A',\n",
" 'UCA': 'S', 'CCA': 'P', 'ACA': 'T', 'GCA': 'A',\n",
" 'UCG': 'S', 'CCG': 'P', 'ACG': 'T', 'GCG': 'A',\n",
" 'UAU': 'Y', 'CAU': 'H', 'AAU': 'N', 'GAU': 'D',\n",
" 'UAC': 'Y', 'CAC': 'H', 'AAC': 'N', 'GAC': 'D',\n",
" 'UAA': 'Stop', 'CAA': 'Q', 'AAA': 'K', 'GAA': 'E',\n",
" 'UAG': 'Stop', 'CAG': 'Q', 'AAG': 'K', 'GAG': 'E',\n",
" 'UGU': 'C', 'CGU': 'R', 'AGU': 'S', 'GGU': 'G',\n",
" 'UGC': 'C', 'CGC': 'R', 'AGC': 'S', 'GGC': 'G',\n",
" 'UGA': 'Stop', 'CGA': 'R', 'AGA': 'R', 'GGA': 'G',\n",
" 'UGG': 'W', 'CGG': 'R', 'AGG': 'R', 'GGG': 'G'\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"915648\n"
]
}
],
"source": [
"def codon_freq():\n",
"    # count how many codons encode each amino acid (and 'Stop')\n",
"    tally = {}\n",
"    for value in codon_dict.values():\n",
"        tally[value] = tally.get(value, 0) + 1\n",
"    return tally\n",
"\n",
"\n",
"def get_all_possible(file_path):\n",
"    # read the protein string from the Rosalind input file\n",
"    with open(file_path) as f:\n",
"        x = f.read().strip()\n",
"    freq = codon_freq()\n",
"    out = freq['Stop']\n",
"    for i in x:\n",
"        out *= freq[i]\n",
"    return out % 1000000\n",
"\n",
"print(get_all_possible(\"/home/scott/Dropbox/rosalind/rosalind_mrna.txt\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"------------------------------------\n",
"\n",
"\n",
"\n",
"## Explanation\n",
"\n",
"- Ignore Rosalind's advice here, it's 2016:\n",
"  * Python handles arbitrarily large integers, so there's no need to worry about storing the product\n",
"  * It's easier to take the modulo once, on the final answer\n",
"\n",
"`codon_freq()` builds a second dictionary from `codon_dict` that maps each amino acid (and `Stop`) to the number of codons that encode it."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'A': 4,\n",
" 'C': 2,\n",
" 'D': 2,\n",
" 'E': 2,\n",
" 'F': 2,\n",
" 'G': 4,\n",
" 'H': 2,\n",
" 'I': 3,\n",
" 'K': 2,\n",
" 'L': 6,\n",
" 'M': 1,\n",
" 'N': 2,\n",
" 'P': 4,\n",
" 'Q': 2,\n",
" 'R': 6,\n",
" 'S': 6,\n",
" 'Stop': 3,\n",
" 'T': 4,\n",
" 'V': 4,\n",
" 'W': 1,\n",
" 'Y': 2}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"codon_freq()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`get_all_possible()` loops through the amino acids, looks up each one in the dictionary returned by `codon_freq()`, and keeps a cumulative product.\n",
"\n",
"The product starts at the `Stop` codon count, because the input from Rosalind does not include the terminating stop codon."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_all_possible(x):\n",
"\n",
"    freq = codon_freq()\n",
"\n",
"    # start from the Stop count, as Stop isn't in our input\n",
"    out = freq['Stop']\n",
"\n",
"    # loop through aa sequence\n",
"    for i in x:\n",
"        out *= freq[i]  # cumulative product\n",
"\n",
"    return out % 1000000  # return total modulo 1M"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"12"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test = \"MA\"\n",
"get_all_possible(test)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
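
The notebook's explanation takes the modulo only once, on the final product. For comparison, here is a minimal sketch of the other option, reducing after every multiplication so the running value never grows beyond the modulus. The names `CODON_COUNTS`, `MODULUS`, and `count_rna_strings_mod` are hypothetical and not part of the notebook; the counts are copied from the `codon_freq()` output shown in the notebook.

```python
# Hypothetical helper names; the counts mirror the codon_freq() output in the notebook.
CODON_COUNTS = {
    'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'G': 4, 'H': 2,
    'I': 3, 'K': 2, 'L': 6, 'M': 1, 'N': 2, 'P': 4, 'Q': 2,
    'R': 6, 'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2, 'Stop': 3,
}

MODULUS = 1000000


def count_rna_strings_mod(protein, modulus=MODULUS):
    """Count candidate RNA strings, reducing modulo `modulus` at every step."""
    # Start from the Stop count, since the protein string has no stop symbol.
    total = CODON_COUNTS['Stop'] % modulus
    for amino_acid in protein:
        # (a * b) % m == ((a % m) * b) % m, so reducing early is safe.
        total = (total * CODON_COUNTS[amino_acid]) % modulus
    return total


print(count_rna_strings_mod("MA"))  # 12, same as the notebook's test
```

Both approaches give the same answer; reducing at each step only matters in languages with fixed-width integers or for very long inputs.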