Created
July 13, 2016 20:16
Save MikeTrizna/92b5d87a757b24a083c5bff2e1a031fa to your computer and use it in GitHub Desktop.
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We'll start out with a short DNA sequence as our example string." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"dna = 'ACTAGCTACGCTCGATACGCATCG'" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's check out the type of this variable, just to be sure." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<class 'str'>\n" | |
] | |
} | |
], | |
"source": [ | |
"print(type(dna))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now let's try running another **function** on the variable. Here's a good example of something that computers are good at: counting." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"24\n" | |
] | |
} | |
], | |
"source": [ | |
"print(len(dna))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now let's see what kinds of different ways we can change this string.\n", | |
"The first one we'll try is to convert it to lowercase. We'll use the lower() **method**. Notice how this **method** is different than **functions** that we've used previously." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"actagctacgctcgatacgcatcg\n" | |
] | |
} | |
], | |
"source": [ | |
"print(dna.lower())" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Also, it's important to point out that the \"dna\" variable did not change by applying that method." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"ACTAGCTACGCTCGATACGCATCG\n" | |
] | |
} | |
], | |
"source": [ | |
"print(dna)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"To truly change a string variable, you need to re-save those changes back to the same variable name." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"actagctacgctcgatacgcatcg\n" | |
] | |
} | |
], | |
"source": [ | |
"dna = dna.lower()\n", | |
"print(dna)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Notice that the lower() method just used an empty parenthesis. We'll now use the replace() method to show how to pass **parameters** that tell the method what to do. To demonstrate this, we'll tell Python to replace the \"thymine\" bases with \"uracil\" to convert it to an rna sequence." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"acuagcuacgcucgauacgcaucg\n" | |
] | |
} | |
], | |
"source": [ | |
"rna = dna.replace('t', 'u')\n", | |
"print(rna)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Another very useful action that Python can perform is called **slicing**. Say we realized that we only wanted to use the first 10 bases of the dna sequence for some analysis. We use **brackets[]** to do this, and then we tell Python where to start and where to end. It's important to note that **counting in Python starts with 0**." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"actagctacg\n" | |
] | |
} | |
], | |
"source": [ | |
"print(dna[0:10])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In fact, if we're just starting with the beginning of the string, we don't need to write out the 0." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"actagctacg\n" | |
] | |
} | |
], | |
"source": [ | |
"print(dna[:10])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now, what if a colleague tells us about a 5-bp stretch that we were forgetting from our sequence? We can add that to our original sequence, using the **+ operator**. Remember from last class, that the \"+\" is used to add numbers, but it can add strings too?? This is an example of **operator overloading**, where an operator can be programmed to do different operations based on the type of object it's working with." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"actagctacgctcgatacgcatcgagtca\n", | |
"29\n" | |
] | |
} | |
], | |
"source": [ | |
"missing_stretch = 'agtca'\n", | |
"dna = dna + missing_stretch\n", | |
"print(dna)\n", | |
"print(len(dna))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Another example of operator overloading in strings is the *** operator**. The \\* will \"multiply\" a string howevery many times we tell it. Here we make this new sequence into a repeat." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"actagctacgctcgatacgcatcgagtcaactagctacgctcgatacgcatcgagtca\n", | |
"58\n" | |
] | |
} | |
], | |
"source": [ | |
"repeat_dna = dna * 2\n", | |
"print(repeat_dna)\n", | |
"print(len(repeat_dna))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.5.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |