#Notes from ipython notebook lab talk
##Introduction
ipython is an alternative shell for python aimed at meeting the needs of interactive/scientific computing. One of its coolest features is the ipython notebook, which allows users to combine styled text, fancy looking maths, code and the results of that code's execution.
Although it has "python" in the name, the notebook is not restricted to that programming language. "Magic" functions allow users to execute different languages in an "ipython" notebook, and other languages (including Julia, Haskell and Ruby) can be the default language of a whole notebook. In fact, the notebook project recently changed its name to jupyter (from JUlia, PYthon and R -- the "languages of open science").
The notebook is especially helpful for recording exploratory/interactive analyses and plots made in bioinformatics/computational biology.
##Install
The notebook has its own install page. If you are a python person and you already use pip, you can install ipython and everything else needed for the notebook:

    $ pip install "ipython[notebook]"

If you aren't yet a python person you can install python and a bunch of useful python add-ons in a pointy-clicky way using either Canopy or Anaconda.
If you are a linux person, then your package manager probably has both ipython and the ipython notebook available. On ubuntu this should work (I don't know about other distros):

    $ sudo apt-get install ipython-notebook

##Getting started
From a terminal you can start a new ipython notebook like this:

    $ cd analysis/my_project_dir
    $ ipython notebook

(It's a good idea to run the notebook from the root of your project's working directory. That way all the file paths in your notebook will be reproducible if you move stuff around or share the directory with others.)
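
One way to keep paths portable is to build them relative to the project root instead of hard-coding absolute locations. This is a minimal sketch only: the `data/table.tsv` layout is a hypothetical example, not a file from the talk.

```python
from pathlib import Path

# Hypothetical layout: <project root>/data/table.tsv
# A relative path is resolved against the current working directory,
# so it keeps working when the whole project directory is moved or shared.
table_path = Path("data") / "table.tsv"

print(table_path)
print(table_path.is_absolute())  # False
```

An absolute path such as `/home/me/analysis/my_project_dir/data/table.tsv` would break as soon as a collaborator opened the notebook on their own machine.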
Running `ipython notebook` starts a webserver on your computer and opens your default browser to a "home" page listing the files in your directory. Click "New Notebook" in the top right hand corner to start your first notebook.
##What cells can do
The notebook is made of "cells" that can be in one of two different forms:
- "code" cells that contain python (or other) code that can be executed
- "Markdown" cells that contain plain text which can be styled using a few simple tricks.
###markdown
Markdown is a really neat way of writing plain text that can have rich features like styling, links, bullet lists and images. In addition to ipython notebooks you can use markdown on github, stack overflow and a bunch of web-focused tools (in fact, this document is written in markdown, click on "raw" to see the source).
Github has some nice tutorials on how to write their "flavour" of markdown:
- An intro
- A more complete guide (with lots of github-specific extras)
- A cheat sheet
I also like StackEdit, an online markdown editor that gives you a live preview of what you are typing.
You should use the markdown cells to record any steps that have gone on prior to the work you do in the notebook (e.g. earlier steps of a bioinformatics pipeline, or the names and versions of software you have used to make the data you are working on).
###TeX
Markdown cells can also be used to make pretty renderings of mathematical equations. This gist doesn't know how to make the math pretty, so you will have to paste these into a cell to see them.
You can use single dollar signs to do math "inline": $f(x) = e^x$, or double dollars to make a block of math on its own line.
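
For instance, pasting something like the following into a Markdown cell should render as a centred equation. The particular formula (a sample mean) is only an illustration chosen here, not the one shown in the talk.

```latex
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$
```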
##What code cells can do
You can do more or less anything you would normally do in python in a notebook cell. You can also use "magic" functions, including
- `!` (bang!) to run a shell command (might need cygwin on windows?)
    - `! ls` will list files in the working dir
    - `! wc -l table.tsv` will count the number of rows in the file `table.tsv`
    - One of my notebooks that is mainly `!` cells
- You can run entirely different languages in a cell, there are "magic" commands for R, Ruby, Perl, Julia...
- `%lsmagic` will list all the magic functions available to ipython (and you can install more)
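
If the shell commands are not available (for example on windows without cygwin), the two `!` examples above can be approximated in plain python. This is a rough sketch rather than an exact replacement: the helper names `list_files` and `count_lines` are made up for illustration.

```python
import os


def list_files(directory="."):
    """Roughly what `! ls` shows: sorted, non-hidden names in a directory."""
    return sorted(name for name in os.listdir(directory)
                  if not name.startswith("."))


def count_lines(path):
    """Roughly what `! wc -l` reports: the number of lines in a file.

    Note: `wc -l` counts newline characters, so a final line without a
    trailing newline is counted here but not by `wc -l`.
    """
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)
```

In a code cell, `count_lines("table.tsv")` would then play the role of `! wc -l table.tsv`.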
##Python for scientific computing
There are a growing number of tools for scientific computing in python. You should know about (at least):
- Numpy, which provides an `array` object and lots of fast numerical functions
- Matplotlib, plotting for python with a matlab-like interface (designed to play nicely with numpy)
- Pandas, which provides a `DataFrame`, a spreadsheet-like data object similar to R's `data.frame`
- Biopython, which provides functions and objects for dealing with basic bioinformatics elements like sequences and trees, and wrappers for popular programs like BLAST.
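
To give a feel for the first and third of these, here is a small sketch that assumes numpy and pandas are installed. The sequence lengths and sample labels are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# A numpy array supports fast element-wise maths without explicit loops.
lengths = np.array([120, 340, 560, 90])  # hypothetical sequence lengths
log_lengths = np.log10(lengths)

# A pandas DataFrame holds labelled columns, much like R's data.frame.
df = pd.DataFrame({
    "sample": ["a", "a", "b", "b"],  # hypothetical sample labels
    "length": lengths,
})

# Group rows by sample and take the mean length within each group.
mean_by_sample = df.groupby("sample")["length"].mean()
print(mean_by_sample)
```

In a notebook, leaving `df` or `mean_by_sample` as the last line of a cell displays it as a formatted table.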
##Sharing notebooks
One of the main reasons to use the notebook is to enable sharing between collaborators. Each ipython notebook is saved as a big JSON file, but if you want to share a notebook which makes use of other files you should share a whole directory (or ideally, make it a git repository so you have version control and can share via github).
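
Because a notebook is just JSON, it can be inspected with the standard library alone. The sketch below counts the cells of each type. It assumes newer files keep their cells in a top-level `cells` list and older ones nest them inside `worksheets`. The function name `count_cell_types` is made up for illustration.

```python
import json
from collections import Counter


def count_cell_types(notebook_path):
    """Return a Counter of the cell types found in a notebook file."""
    with open(notebook_path, encoding="utf-8") as handle:
        notebook = json.load(handle)

    # Assumption: newer notebook files store a top-level "cells" list,
    # older ones store cells inside each entry of "worksheets".
    cells = list(notebook.get("cells", []))
    for worksheet in notebook.get("worksheets", []):
        cells.extend(worksheet.get("cells", []))

    return Counter(cell.get("cell_type", "unknown") for cell in cells)
```

Calling `count_cell_types("analysis.ipynb")` (a hypothetical filename) returns something like a count of `code` and `markdown` cells, a quick way to see how much of a shared notebook is documentation.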
You can also convert notebooks to PDF with nbconvert, or render them online via the notebook viewer (the notebook needs to be shared publicly on github or gist.github for this to work).
I've shared the example we worked through today, and the notebook viewer has a list of other example notebooks.
##The iAnythingNotebook
The inner workings of the notebook don't actually require python at all. It's possible for developers to write another kernel to sit between the notebook in your browser and what gets executed on your computer. As I mentioned in the intro, further development of the notebook is now called project jupyter to recognise the "language agnostic" future.
For our group the most interesting development is probably IJulia. Once installed (instructions at that link), you can start an IJulia notebook with

    $ ipython notebook --profile julia

##Similar projects
You may also be interested in Rmarkdown, and especially Rmarkdown in RStudio, which uses a similar "literate programming" approach.
The example I spoke about today was the README file here, which is generated from this rmarkdown file.