Last active
January 4, 2016 05:39
Introduction to loops for biologists
{ | |
"metadata": { | |
"name": "" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Introduction to loops for biologists\n", | |
"\n", | |
"![Introduction to loops](http://www.kelloggs.ie/content/dam/workarea/assetpushqueue/images/web-raw-approved/eng%20IE/38/56/prod_img-253856.jpg/jcr:content/renditions/cq5dam.thumbnail.319.319.png \"Introduction to loops\")\n", | |
"\n", | |
"## Background\n", | |
"\n", | |
"* **This introduction assumes a very basic knowledge of programming, such as variable usage, assignment, and list indexing.**\n", | |
"* We can use programming languages to solve problems that require a defined, logical approach.\n", | |
"* Programming languages have different structures such as semantics, grammar and syntax, just as spoken languages do.\n", | |
"* In the same way spoken languages share common lexical terms and grammar, so do different programming languages.\n", | |
"* One type of structure shared between programming languages is the control of the flow of a program.\n", | |
"\n", | |
"## Control flow\n", | |
"\n", | |
"* Generally the instructions (code) of a program are executed in a stepwise manner from the beginning to the end.\n", | |
"* It is possible to add statements that change the flow of the program.\n", | |
"* Loops are one such way of controlling the flow of a program.\n", | |
"* Loops, as the name suggests, allow you to repeat a portion of code, generally while or until a certain condition holds.\n", | |
"* There are a number of different ways of implementing loops, most of which are shared across programming languages.\n", | |
"\n", | |
"## Loop types\n", | |
"\n", | |
"* Some languages, such as Perl, have a large repertoire of loop types that is beyond the scope of this introduction.\n", | |
"* I will use Python here, as it has a simple syntax that suits beginners and implements only **for** loops and **while** loops." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### for loop" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# for loop: print the numbers 1 to 10 (range stops before its end value, 11)\n", | |
"for i in range(1, 11):\n", | |
"    print(i)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"1\n", | |
"2\n", | |
"3\n", | |
"4\n", | |
"5\n", | |
"6\n", | |
"7\n", | |
"8\n", | |
"9\n", | |
"10\n" | |
] | |
} | |
], | |
"prompt_number": 30 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### while loop" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# while loop: print the numbers 1 to 10\n", | |
"i = 1\n", | |
"while i <= 10:\n", | |
"    print(i)\n", | |
"    i += 1  # without this increment the loop would never end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"1\n", | |
"2\n", | |
"3\n", | |
"4\n", | |
"5\n", | |
"6\n", | |
"7\n", | |
"8\n", | |
"9\n", | |
"10\n" | |
] | |
} | |
], | |
"prompt_number": 31 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### for loops for biologists\n", | |
"\n", | |
"* The for loop allows you to iterate over a list of items.\n", | |
"* For example, you can iterate over a list of GenBank IDs to fetch the FASTA-formatted sequence corresponding to each ID." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# import modules required (Biopython must be installed)\n", | |
"from Bio import Entrez, SeqIO\n", | |
"\n", | |
"# set email so Entrez can tell us off if we send too many queries\n", | |
"Entrez.email = \"[email protected]\"\n", | |
"\n", | |
"# set a list of GenBank ids\n", | |
"genbank_ids = [\"119395733\", \"568974803\", \"110626132\", \"347446670\", \"442628803\"]\n", | |
"\n", | |
"# iterate over GenBank ids (gb_id avoids shadowing the built-in id function)\n", | |
"for gb_id in genbank_ids:\n", | |
"    print(\"Fetching data for id %s\" % gb_id)\n", | |
"    print(\"\")\n", | |
"    # fetch the record from NCBI as FASTA-formatted text\n", | |
"    handle = Entrez.efetch(db=\"nucleotide\", id=gb_id, rettype=\"fasta\", retmode=\"text\")\n", | |
"    record = SeqIO.read(handle, \"fasta\")\n", | |
"    handle.close()\n", | |
"    print(record)\n", | |
"    print(\"\")" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"Fetching data for id 119395733\n", | |
"\n", | |
"ID: gi|119395733|ref|NM_000059.3|\n", | |
"Name: gi|119395733|ref|NM_000059.3|\n", | |
"Description: gi|119395733|ref|NM_000059.3| Homo sapiens breast cancer 2, early onset (BRCA2), mRNA\n", | |
"Number of features: 0\n", | |
"Seq('GTGGCGCGAGCTTCTGAAACTAGGCGGCAGAGGCGGAGCCGCTGTGGCACTGCT...ATT', SingleLetterAlphabet())" | |
] | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"\n", | |
"\n", | |
"Fetching data for id 568974803\n", | |
"\n", | |
"ID: gi|568974803|ref|XM_006533728.1|\n", | |
"Name: gi|568974803|ref|XM_006533728.1|\n", | |
"Description: gi|568974803|ref|XM_006533728.1| PREDICTED: Mus musculus contactin associated protein-like 1 (Cntnap1), transcript variant X1, mRNA\n", | |
"Number of features: 0\n", | |
"Seq('TCATCGTACCCGGAGTAAAGTCCCCAGGAGACCTGGTGGCATCAAAATGAGAAG...CCC', SingleLetterAlphabet())" | |
] | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"\n", | |
"\n", | |
"Fetching data for id 110626132\n", | |
"\n", | |
"ID: gi|110626132|ref|NM_001030280.1|\n", | |
"Name: gi|110626132|ref|NM_001030280.1|\n", | |
"Description: gi|110626132|ref|NM_001030280.1| Danio rerio solute carrier family 24, member 5 (slc24a5), mRNA\n", | |
"Number of features: 0\n", | |
"Seq('GTAAGCCGCGGCGGTGTGTGTGTGTGTGTGTGTTCTCCGTCATCTGTGTTCTGC...CCT', SingleLetterAlphabet())" | |
] | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"\n", | |
"\n", | |
"Fetching data for id 347446670\n", | |
"\n", | |
"ID: gi|347446670|ref|NM_001244612.1|\n", | |
"Name: gi|347446670|ref|NM_001244612.1|\n", | |
"Description: gi|347446670|ref|NM_001244612.1| Bos taurus insulin-like growth factor 1 receptor (IGF1R), mRNA\n", | |
"Number of features: 0\n", | |
"Seq('GAGAAAGGGGAATTTGGTCCCAAATAAAAGGAATGAAGTCTAGCTCCGGAGGAG...AAC', SingleLetterAlphabet())" | |
] | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"\n", | |
"\n", | |
"Fetching data for id 442628803\n", | |
"\n", | |
"ID: gi|442628803|ref|NM_165390.2|\n", | |
"Name: gi|442628803|ref|NM_165390.2|\n", | |
"Description: gi|442628803|ref|NM_165390.2| Drosophila melanogaster Cullin-2 (Cul-2), transcript variant A, mRNA\n", | |
"Number of features: 0\n", | |
"Seq('CGATAGATTATATCGATATCGTCTTCGTCTGACAAACCATTTCACGATACCAAA...ATT', SingleLetterAlphabet())" | |
] | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"\n", | |
"\n" | |
] | |
} | |
], | |
"prompt_number": 32 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### while loops for biologists\n", | |
"\n", | |
"* The while loop allows you to repeat a task while a certain condition is true.\n", | |
"* This can be useful when you need to run something only a given number of times, or until the condition becomes false." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# set a list of species names\n", | |
"spp_names = [\"homo_sapiens\", \"mus_musculus\", \"bos_taurus\", \"danio_rerio\", \"drosophila_melanogaster\"]\n", | |
"\n", | |
"# search through list until the target is found or the list is exhausted\n", | |
"found = False\n", | |
"list_index = 0\n", | |
"while not found and list_index < len(spp_names):\n", | |
"    name = spp_names[list_index]\n", | |
"    if name == \"danio_rerio\":\n", | |
"        print(\"Found %s at list position %d.\" % (name, list_index + 1))\n", | |
"        found = True\n", | |
"    else:\n", | |
"        print(name)\n", | |
"    list_index += 1" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"homo_sapiens\n", | |
"mus_musculus\n", | |
"bos_taurus\n", | |
"Found danio_rerio at list position 4.\n" | |
] | |
} | |
], | |
"prompt_number": 33 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Summary\n", | |
"\n", | |
"* Loops allow you to alter the flow of a program.\n", | |
"* These can loop (iterate) over a list of items or repeat a task based on certain conditions.\n", | |
"* Different types of loops evaluate the conditions at different times in the execution of the program.\n", | |
"* It is wise to be certain exactly how the loop is executing your code in order to minimise the need for debugging." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Further information\n", | |
"\n", | |
"* [Python for Biologists](http://pythonforbiologists.com)\n", | |
"* [A Primer on Python for Life Science Researchers](http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.0030199)\n", | |
"* [Rosalind: Python Village](http://rosalind.info/problems/list-view/?location=python-village)" | |
] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
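
### Searching a list with a for loop

The species search in the while loop example can also be written as a for loop that stops early. The sketch below is one possible way to do it, not part of the original notebook: `find_species` is a hypothetical helper name, and `gallus_gallus` is only an illustrative name that is absent from the list. It uses Python's built-in `enumerate` to keep track of the position, so no manual index counter is needed.

```python
# Hypothetical helper for illustration; it is not part of the original notebook.
def find_species(names, target):
    """Return the 1-based position of target in names, or None if it is absent."""
    for position, name in enumerate(names, start=1):
        if name == target:
            # Returning here ends the loop as soon as the target is found.
            return position
    # The loop reached the end of the list without finding the target.
    return None


spp_names = ["homo_sapiens", "mus_musculus", "bos_taurus",
             "danio_rerio", "drosophila_melanogaster"]

print(find_species(spp_names, "danio_rerio"))    # prints 4
print(find_species(spp_names, "gallus_gallus"))  # prints None
```

Because the for loop ends by itself when the list runs out, a missing name simply gives `None` instead of an index error.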