@gregcaporaso
Last active October 12, 2015 07:47
IPython Notebook files used in Greg Caporaso's Fall 2012 BIO599 Computational Biology course. See the included README.md file for more details and licensing information.

These closely follow the Python Programming chapters of Practical Computing for Biologists. A lot of exercises can be found in Learn Python the Hard Way.

This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Feel free to use or modify these notebooks, but please credit me by placing the following attribution information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.

{
"metadata": {
"name": "Caporaso Lecture 25"
},
"name": "Caporaso Lecture 25",
"nbformat": 2,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"source": "**File I/O**\n\nToday we'll learn how to interact with files, including how to read from them and write to them. We'll also cover some tips for parsing files."
},
{
"cell_type": "markdown",
"source": "First, to open files we use the built-in ``open`` function, which takes a file path and a `mode` to open the file in. The commonly used modes are ``U`` (read, with universal line break support), ``r`` (read, with unix line break support), ``w`` (write, overwriting any existing file content), and ``a`` (write, appending to any existing file content).\n\nWe'll define the path to an existing file first. You can run some of the common shell commands to see what the file looks like (remember that prefixing a line with ``!`` means that it should be run with ``bash`` instead of ``python``)."
},
{
"cell_type": "code",
"collapsed": false,
"input": "fp = \"glen_canyon_map.tsv\"",
"language": "python",
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": "!ls $fp",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "glen_canyon_map.tsv"
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": "!head $fp",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "#SampleID\tBarcodeSequence\tsite_number\tbag_sample_id\tsample_pH\tSample_Type\tWell_ID\tSample_Plate\tPrimer_Plate\tLane\tLatitude\tLongitude\tType\tEnv\tSourceSink\testimated_elevation\tCurrentlyWet\testimated_years_since_submerged\testimated_years_since_submerged_for_plotting\testimated_last_submerged\tgps_elevation\tgps_elevation_minus_estimated_elevation\tMonth\tDay\tYear\tdays_since_epoch\tHour\tSite\tSite_Name\tReplicate\tDNA.I.D.No.\tDescription\nHalls10R1\tTACGCGCTGAGA\t10\tha10r1\t9.52\tSoil\tg1\tGlenCanyon\t5\tLane4\tno_data\tno_data\tSoil\tLocalSoilCrust\tsource\t3723\tNo\tNA\t37\tNA\t3740\t17\t9\t29\t2010\t14881\t12\tHalls10\tHalls\t1\t11\tGlenCanyon_ha10r1\nHalls10R2\tTAGATCCTCGAT\t10\tha10r2\t9.5\tSoil\th1\tGlenCanyon\t5\tLane4\tno_data\tno_data\tSoil\tLocalSoilCrust\tsource\t3723\tNo\tNA\t37\tNA\t3740\t17\t9\t29\t2010\t14881\t12\tHalls10\tHalls\t2\t12\tGlenCanyon_ha10r2\nHCanyon10R1\tGCTTGCGAGACA\t10\thc9r1\t9.29\tSoil\ta5\tGlenCanyon\t5\tLane4\tN37_33.006\tW110_40.590\tSoil\tLocalSoilCrust\tsource\t3723\tNo\tNA\t37\tNA\t3732\t9\t9\t30\t2010\t14882\t1\tHCanyon10\tHcanyon\t1\t43\tGlenCanyon_hc9r1\nHCanyon10R2\tGTACGGCATACG\t10\thc9r2\t9.25\tSoil\tb5\tGlenCanyon\t5\tLane4\tN37_33.006\tW110_40.590\tSoil\tLocalSoilCrust\tsource\t3723\tNo\tNA\t37\tNA\t3732\t9\t9\t30\t2010\t14882\t1\tHCanyon10\tHcanyon\t2\t44\tGlenCanyon_hc9r2\nHCanyon10R3\tGTATGCGCTGTA\t10\thc9r3\t9.23\tSoil\tc5\tGlenCanyon\t5\tLane4\tN37_33.006\tW110_40.590\tSoil\tLocalSoilCrust\tsource\t3723\tNo\tNA\t37\tNA\t3732\t9\t9\t30\t2010\t14882\t1\tHCanyon10\tHcanyon\t3\t45\tGlenCanyon_hc9r3\nHCanyon11R1\tGTTCGCGTATAG\t11\thc10r1\t8.89\tSoil\te12\tGlenCanyon\t5\tLane4\tN37_33.002\tW110_40.585\tSoil\tLocalSoilCrust\tsource\t3733\tNo\tNA\t37\tNA\t3739\t6\t9\t30\t2010\t14882\t1\tHCanyon11\tHcanyon\t1\t109\tGlenCanyon_hc10r1\nHCanyon11R2\tTACGATGACCAC\t11\thc10r2\t8.84\tSoil\tf12\tGlenCanyon\t5\tLane4\tN37_33.002\tW110_40.585\tSoil\tLocalSoilCrust\tsource\t3733\tNo\tNA\t37\tNA\t3739\t6\t9\t
30\t2010\t14882\t1\tHCanyon11\tHcanyon\t2\t110\tGlenCanyon_hc10r2\nHCanyon11R3\tTAGATAGCAGGA\t11\thc10r3\t8.9\tSoil\tg12\tGlenCanyon\t5\tLane4\tN37_33.002\tW110_40.585\tSoil\tLocalSoilCrust\tsource\t3733\tNo\tNA\t37\tNA\t3739\t6\t9\t30\t2010\t14882\t1\tHCanyon11\tHcanyon\t3\t111\tGlenCanyon_hc10r3\nHCanyon12R1\tGTAGCGCGAGTT\t12\thc11r1\t9.2\tSoil\tb11\tGlenCanyon\t5\tLane4\tN37_33.001\tW110_40.579\tSoil\tLocalSoilCrust\tsource\t3743\tNo\tNA\t37\tNA\t3749\t6\t9\t30\t2010\t14882\t1\tHCanyon12\tHcanyon\t1\t97\tGlenCanyon_hc11r1"
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"source": "As you can tell, we're looking at a QIIME-compatible metadata mapping file. One thing we might want to do is read this file in, perform some processing, and summarize the information it contains. Imagine for example that you want to know what pH range these soils cover - let's look at how to do that.\n\nFirst, we'll open the file for reading. Here I'm opening the file in ``U`` mode, to open with support for universal line breaks. This is how you should always open a file for reading, except in rare circumstances (specifically, if it's a binary file, like a ``.gz`` file)."
},
{
"cell_type": "code",
"collapsed": true,
"input": "f = open(fp,'U')",
"language": "python",
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "markdown",
"source": "Often, we read files by iterating over the lines with a for loop. For example, we can do the following:\n\n``for line in f:``\n\nwhich will iteratively set line to each line in the file. In our case, we want to identify the first line in the file so we can find the ``sample_pH`` column index, and then we want to store that value for each line. We could do this as follows:"
},
{
"cell_type": "code",
"collapsed": true,
"input": "pH = []\nfor line in f:\n    # first, let's clean up the line by\n    # removing any leading or trailing\n    # whitespace\n    line = line.strip()\n    # then, let's split it into a list\n    # of tab-separated values\n    fields = line.split('\\t')\n    # next, let's check if the line is our\n    # header line\n    if line.startswith('#'):\n        # if so, then find the position of sample_pH\n        pH_index = fields.index('sample_pH')\n    else:\n        # if this isn't a header line, it\n        # must be a data line, so let's get the\n        # sample's pH\n        pH.append(float(fields[pH_index]))",
"language": "python",
"outputs": [],
"prompt_number": 13
},
{
"cell_type": "markdown",
"source": "We should now have all of the pH values. Let's check with a print statement."
},
{
"cell_type": "code",
"collapsed": false,
"input": "print pH",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "[9.52, 9.5, 9.29, 9.25, 9.23, 8.89, 8.84, 8.9, 9.2, 9.26, 9.29, 9.27, 9.21, 9.3, 9.27, 9.3, 9.3, 9.34, 9.21, 9.19, 9.1, 9.19, 9.2, 9.1, 8.16, 8.19, 8.2, 9.17, 9.15, 9.55, 9.44, 9.41, 8.02, 8.0, 8.05, 9.17, 9.16, 9.23, 9.26, 9.45, 9.41, 9.46, 9.49, 9.5, 9.54, 9.44, 9.42, 9.06, 9.01, 9.35, 9.4, 9.31, 9.26, 9.13, 9.55, 9.44, 9.41, 9.65, 9.71, 9.67, 9.49, 9.39, 9.42, 9.45, 9.49, 9.46, 9.47, 9.5, 9.51, 9.34, 9.3, 9.38, 8.89, 8.82, 8.9, 8.35, 8.3, 8.02, 8.0, 8.05, 8.13, 8.21, 8.44, 8.82, 8.75, 8.71, 8.85, 9.5, 9.54, 9.49, 9.29, 9.38, 9.35, 9.06, 8.9]"
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"source": "There are a few ways that we can check the min and max pH to determine the range. We'll go over these as a group to see what we come up with."
}
]
}
]
}
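One way to finish the pH range exercise from the notebook is sketched below, under a few assumptions. The code targets Python 3, where plain `r` mode already translates line breaks, so `U` is not needed. `ph_range` is a hypothetical helper name that does not appear in the notebook, and the two-row sample file is made-up illustration data, not the Glen Canyon mapping file. The sketch writes a small file in `w` mode, reads it back, and uses the built-in `min` and `max` functions.

```python
import os
import tempfile


def ph_range(path, column="sample_pH"):
    """Return (min, max) of a numeric column in a tab-separated mapping file."""
    values = []
    column_index = None
    # The with statement closes the file even if parsing fails.
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            if line.startswith("#"):
                # header line: find the position of the requested column
                column_index = fields.index(column)
            else:
                # data line: convert the column's value to a float
                values.append(float(fields[column_index]))
    return min(values), max(values)


# Made-up sample data, written in 'w' mode (overwrites any existing content).
sample_path = os.path.join(tempfile.mkdtemp(), "sample_map.tsv")
with open(sample_path, "w") as out:
    out.write("#SampleID\tsample_pH\tDescription\n")
    out.write("S1\t9.52\tfirst\n")
    out.write("S2\t8.02\tsecond\n")

lowest, highest = ph_range(sample_path)
print(lowest, highest)
```

Inside the notebook itself, where the `pH` list has already been filled, calling `min(pH)` and `max(pH)` would give the same kind of answer directly.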
from sys import argv
usage = "Lecture25_example.py <name> <day>"
if len(argv) != 3:
    print "ERROR: Incorrect number of arguments passed."
    print "USAGE: " + usage
else:
    script_name, name, day = argv
    print "Hello " + name
    print "Today is " + day
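The same argument check can be wrapped in a function so that it can be tried without a command line. This is a minimal sketch, assuming Python 3; `greet` and `USAGE` are hypothetical names introduced for illustration and are not part of the original script.

```python
import sys

USAGE = "Lecture25_example.py <name> <day>"


def greet(argv):
    """Return the lines the script would print for the given argument list."""
    # argv[0] is the script name, so two user arguments give a length of 3.
    if len(argv) != 3:
        return ["ERROR: Incorrect number of arguments passed.",
                "USAGE: " + USAGE]
    script_name, name, day = argv
    return ["Hello " + name, "Today is " + day]


if __name__ == "__main__":
    for line in greet(sys.argv):
        print(line)
```

Returning the lines instead of printing them inside the function keeps the check easy to exercise with different argument lists, for example `greet(["Lecture25_example.py", "Greg", "Monday"])`.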