@gregcaporaso
Last active October 12, 2015 07:47
IPython Notebook files used in Greg Caporaso's Fall 2012 BIO599 Computational Biology course. See the included README.md file for more details and licensing information.

IPython Notebook files used in Greg Caporaso's Fall 2012 BIO599 Computational Biology course.

These notebooks closely follow the Python programming chapters of Practical Computing for Biologists. Many more exercises can be found in Learn Python the Hard Way.

This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Feel free to use or modify these notebooks, but please credit me by placing the following attribution information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.

{
"metadata": {
"name": "Caporaso Lecture 24"
},
"name": "Caporaso Lecture 24",
"nbformat": 2,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"source": "**Functions**\n\nYou may have noticed in your previous homework assignments that some sections of code are copied in several places throughout a program. This is a problem for a few reasons:\n\n1. it's difficult to test (you need to test each implementation of the functionality)\n2. it's harder to maintain (if you change something in one place, you need to change it everywhere, and as humans we're likely to mess that up)\n3. it's harder to develop (it takes longer) and harder for other developers to follow (including yourself in the future)\n\nFunctions are one of the primary ways that we re-use code, and today we'll look at defining functions to address these problems.\n"
},
{
"cell_type": "markdown",
"source": "A function definition contains a few parts: the `def` keyword, a function name, and a parenthesized list of parameters (the values that the function takes as input). This is followed by an indented block of code, the function body, which runs each time the function is called. A `return` statement in the body defines the output of the function.\n\nHere's a simple function:"
},
{
"cell_type": "code",
"collapsed": true,
"input": "def cube(x):\n    return x * x * x",
"language": "python",
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"source": "Here, our function name is ``cube``. This function takes one input value, which will be referred to as ``x`` within the function, and it returns one value. What does it return? Note that the parentheses enclose the comma-separated list of input value(s), and that they must be followed by a `:` for your function to be defined correctly."
},
{
"cell_type": "markdown",
"source": "At any point in the future, you can now call this function just like any of the built-in functions that we've worked with so far (e.g., ``float``, ``int``, or ``len``)."
},
{
"cell_type": "code",
"collapsed": false,
"input": "print cube(3)",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "27"
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": "x = 4\ny = cube(x)\nprint y",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "64"
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": "for e in range(10):\n    print cube(e)",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "0\n1\n8\n27\n64\n125\n216\n343\n512\n729"
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"source": "Let's define a few other functions to look at some variations:\n\nThis first function returns the absolute value of the difference of two numbers ``x`` and ``y``, which could be thought of as the distance between two points on a number line."
},
{
"cell_type": "code",
"collapsed": true,
"input": "def dist(x,y):\n    return abs(x - y)",
"language": "python",
"outputs": [],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": "dist(-2,10)",
"language": "python",
"outputs": [
{
"output_type": "pyout",
"prompt_number": 30,
"text": "12"
}
],
"prompt_number": 30
},
{
"cell_type": "markdown",
"source": "This next function returns that distance, as well as the cube of that distance. Notice how this function calls other functions that we've defined (``dist`` and ``cube``)."
},
{
"cell_type": "code",
"collapsed": false,
"input": "def dist_and_cubed_dist(x,y):\n    dist_xy = dist(x,y)\n    return dist_xy, cube(dist_xy)",
"language": "python",
"outputs": [],
"prompt_number": 22
},
{
"cell_type": "markdown",
"source": "Why do you think I'm defining the ``dist_xy`` variable here, rather than just calling ``dist`` twice in the return statement?"
},
{
"cell_type": "markdown",
"source": "**Refactoring Programming Assignment 2**\n\nIn Programming Assignment 2, you developed a script to translate several variants of an input DNA sequence (the four orientations of that sequence). Your code likely looked something like this:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "gc = {'CTT': 'L', 'ACA': 'T', 'ACG': 'T', 'ATC': 'I', 'AAC': 'N',\n      'ATA': 'I', 'AGG': 'R', 'CCT': 'P', 'ACT': 'T', 'AGC': 'S', 'AAG': 'K',\n      'AGA': 'R', 'CAT': 'H', 'AAT': 'N', 'ATT': 'I', 'CTG': 'L', 'CTA': 'L',\n      'CTC': 'L', 'CAC': 'H', 'AAA': 'K', 'CCG': 'P', 'AGT': 'S', 'CCA': 'P',\n      'CAA': 'Q', 'CCC': 'P', 'TAT': 'Y', 'GGT': 'G', 'TGT': 'C', 'CGA': 'R',\n      'CAG': 'Q', 'TCT': 'S', 'GAT': 'D', 'CGG': 'R', 'TTT': 'F', 'TGC': 'C',\n      'GGG': 'G', 'GGA': 'G', 'TGG': 'W', 'GGC': 'G', 'TAC': 'Y',\n      'TTC': 'F', 'TCG': 'S', 'TTA': 'L', 'TTG': 'L', 'TCC': 'S', 'ACC': 'T',\n      'GCA': 'A', 'GTA': 'V', 'GCC': 'A', 'GTC': 'V', 'GCG': 'A', 'TAA': '*',\n      'GTG': 'V', 'GAG': 'E', 'GTT': 'V', 'GCT': 'A', 'GAC': 'D', 'CGT': 'R',\n      'GAA': 'E', 'TCA': 'S', 'ATG': 'M', 'CGC': 'R', 'TAG': '*', 'TGA': '*'}\n\n# the following would have been obtained from the user, rather than input directly\nrna_seq = \"ACCGTCGGATTACCGAAGGAA\"\n\nrna_seq_comp = rna_seq\nrna_seq_comp = rna_seq_comp.replace(\"T\",'a')\nrna_seq_comp = rna_seq_comp.replace(\"A\",'t')\nrna_seq_comp = rna_seq_comp.replace(\"G\",'c')\nrna_seq_comp = rna_seq_comp.replace(\"C\",'g')\nrna_seq_comp = rna_seq_comp.upper()\n\nprotein_seq = \"\"\nfor e in range(0,len(rna_seq),3):\n    codon = rna_seq[e] + rna_seq[e+1] + rna_seq[e+2]\n    aa = gc[codon]\n    protein_seq = protein_seq + aa\n\nprint \"Forward orientation:\"\nprint protein_seq\n\nprotein_seq_comp = \"\"\nfor e in range(0,len(rna_seq_comp),3):\n    codon = rna_seq_comp[e] + rna_seq_comp[e+1] + rna_seq_comp[e+2]\n    aa = gc[codon]\n    protein_seq_comp = protein_seq_comp + aa\n\nprint \"Forward complement orientation:\"\nprint protein_seq_comp",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Forward orientation:\nTVGLPKE\nForward complement orientation:\nWQPNGFL"
}
],
"prompt_number": 27
},
{
"cell_type": "markdown",
"source": "Two steps in here would greatly benefit from being defined as functions: the translation step and the complementing step. Let's define functions for those steps and see how they change our script."
},
{
"cell_type": "code",
"collapsed": true,
"input": "",
"language": "python",
"outputs": [],
"prompt_number": " "
},
{
"cell_type": "markdown",
"source": "**Another look at nested for loops.**\n\nLet's now write a nested for loop that prints a distance matrix for a list of numbers, using the ``dist`` function we defined above."
},
{
"cell_type": "code",
"collapsed": true,
"input": "l1 = [1,5,33.2,42,66,-98]",
"language": "python",
"outputs": [],
"prompt_number": 31
},
{
"cell_type": "code",
"collapsed": false,
"input": "for i in l1:\n    for j in l1:\n        print dist(i,j),\n    print \"\"",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "0 4 32.2 41 65 99 \n4 0 28.2 37 61 103 \n32.2 28.2 0.0 8.8 32.8 131.2 \n41 37 8.8 0 24 140 \n65 61 32.8 24 0 164 \n99 103 131.2 140 164 0 "
}
],
"prompt_number": 32
},
{
"cell_type": "code",
"collapsed": true,
"input": "",
"language": "python",
"outputs": [],
"prompt_number": " "
}
]
}
]
}
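The notebook leaves the code cell after the refactoring discussion empty for in-class work. One possible version of that refactoring is sketched here, written for Python 3 (the notebook itself uses Python 2 `print` statements). The names `complement`, `translate`, and `GENETIC_CODE` are illustrative choices, not taken from the course materials. Instead of repeating the 64-entry `gc` dictionary, the sketch builds the same standard codon table from the T, C, A, G ordering of the bases, and it assumes the "four orientations" are forward, complement, reverse, and reverse complement.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with each position cycling
# through T, C, A, G (first position slowest); the string lists the matching
# one-letter amino acids in that same order, with '*' marking stop codons.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
GENETIC_CODE = {"".join(codon): aa
                for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def complement(dna_seq):
    """Return the base-by-base complement of an uppercase DNA sequence (not reversed)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in dna_seq)


def translate(dna_seq, genetic_code=GENETIC_CODE):
    """Translate dna_seq codon by codon, ignoring any trailing partial codon."""
    protein = []
    for start in range(0, len(dna_seq) - 2, 3):
        protein.append(genetic_code[dna_seq[start:start + 3]])
    return "".join(protein)


# In the assignment this sequence would come from the user.
dna_seq = "ACCGTCGGATTACCGAAGGAA"

# Assumed meaning of the "four orientations" of the input sequence.
orientations = [
    ("Forward", dna_seq),
    ("Forward complement", complement(dna_seq)),
    ("Reverse", dna_seq[::-1]),
    ("Reverse complement", complement(dna_seq)[::-1]),
]
for label, variant in orientations:
    print(label + " orientation:")
    print(translate(variant))
```

Because each step is now a single function, adding the reverse orientations costs one list entry each rather than another copied loop. Unlike the original loop, `translate` skips a trailing partial codon instead of raising an `IndexError`.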
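The nested loop in the notebook prints the distance matrix but doesn't keep it. Here is a minimal Python 3 sketch that stores the matrix as a list of rows, one row per value in `l1`. `distance_matrix` is a hypothetical helper name, and `dist` is repeated from the notebook so the block runs on its own.

```python
def dist(x, y):
    """Distance between x and y on a number line (repeated from the notebook)."""
    return abs(x - y)


def distance_matrix(values):
    """Return a list of rows; row i holds the distance from values[i] to each value."""
    return [[dist(x, y) for y in values] for x in values]


l1 = [1, 5, 33.2, 42, 66, -98]
matrix = distance_matrix(l1)

# "{:g}" trims floating-point noise (e.g. in abs(5 - 33.2)) when displaying rows.
for row in matrix:
    print(" ".join("{:g}".format(d) for d in row))
```

Keeping the matrix as a value means it can be passed to other functions or indexed later; for example, `matrix[0][5]` is the distance between `1` and `-98`. The `{:g}` format is used because Python 3 prints the full precision of floating-point results such as `abs(5 - 33.2)`, whereas the notebook's Python 2 `print` rounded them.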
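from sys import argv

# Expect exactly two arguments after the script name: a name and a day.
usage = "Lecture25_example.py <name> <day>"
if len(argv) != 3:
    print "ERROR: Incorrect number of arguments passed."
    print "USAGE: " + usage
else:
    # argv[0] is always the name of the script itself.
    script_name, name, day = argv
    print "Hello " + name
    print "Today is " + day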
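Tying the Lecture25_example.py script back to functions: the sketch below moves the argument check into functions that take the argument list as a parameter, so the logic can be tested without running anything from the command line. It's written for Python 3, the names `greeting_lines` and `main` are hypothetical, and unlike the original script it returns a non-zero status on bad input.

```python
import sys

USAGE = "Lecture25_example.py <name> <day>"


def greeting_lines(args):
    """Return the lines to print, or None if args has the wrong length.

    args is expected to look like sys.argv: the script name followed by
    exactly two user-supplied values, a name and a day.
    """
    if len(args) != 3:
        return None
    _, name, day = args
    return ["Hello " + name, "Today is " + day]


def main(args):
    """Print the greeting for args; return 0 on success, 1 on a usage error."""
    lines = greeting_lines(args)
    if lines is None:
        # Errors go to stderr so they don't mix with normal output.
        print("ERROR: Incorrect number of arguments passed.", file=sys.stderr)
        print("USAGE: " + USAGE, file=sys.stderr)
        return 1
    for line in lines:
        print(line)
    return 0


# Example call with a hand-built argument list (no command line needed).
main(["Lecture25_example.py", "Ada", "Monday"])
```

When used as a script, the file would typically end with `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard, so a wrong argument count produces exit status 1.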