An IPython Notebook as an Interactive Parallel Computing Tutorial
"metadata": {
"name": "Interactive IPython Parallel Computing Tutorial"
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
"cells": [
"cell_type": "markdown",
"metadata": {},
"source": [
"Interactive IPython Parallel Computing Tutorial\n",
"This tutorial shows some a basic examples on how to use the powerful Parallel Computing functionality of [IPython](\n",
"The tutorial itself is written in IPython Notebook of which you are reading is a static HTML representation. To execute this notebook on your computer download it and drag&drop the file on the Notebook Dashboard. You can find a \"Download Notebook\" link in the upper right part of this page.\n",
"Other interesting resources:\n",
" \n",
"- [Running Code in the IPython Notebook](\n",
"- [Offician IPython Documentation](\n",
"**DISCLAIMER**: Part of this tutorial is shamelessly copied from the [Official IPython Parallel Computing Documentation](\n",
"Installation Requirements\n",
"To run this tutorial you have to install a recent version of [IPython]( Some commands will also require Numpy. \n",
"It is recommended however to install a complete scientific python environment.\n",
"On Windows, my favorite scientific python distribution is [WinPython]( it has 64bit support, includes a wonderful IDE called [Spyder](, and the installation folder can be moved anywhere.\n",
"After installing WinPython, to launch the IPython Notebook click on **WinPython Command Prompt** and type:\n",
" ipython notebook --pylab inline\n",
"At this point a web browser should automagically open showing the **IPython Notebook Dashboard**."
"cell_type": "markdown",
"metadata": {},
"source": [
"Starting the the cluster\n",
"For the purposes this tutorial you can start a cluster on your local machine. Just go to the \"Notebook Dashboard\" tab in your browser, click on the *Cluster* tab and specify the number of \"parallel python sessions\" (called **ipengines**) to start. A good number is the number of your cores. After clicking **Start** your local cluster should be running.\n",
"**NOTE:** To setup a more complex cluster you can follow the official IPython documentation [here]( \n",
"Once the cluster is started (doesn't matter if locally or on remote machines) the following tutorial can be followed and re-executed."
"cell_type": "markdown",
"metadata": {},
"source": [
"Starting a parallel session\n",
"Once the cluster is started we ca oper a new ipython notebook and run this command to connect to the running engines:"
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.parallel import Client"
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
"cell_type": "code",
"collapsed": false,
"input": [
"rc = Client()"
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
"cell_type": "markdown",
"metadata": {},
"source": [
"The `rc` variable will contain all the running enignes. With the `.ids` attribute we can see the ID associated with each engine. If the list is empty no engine is running. In our case:"
"cell_type": "code",
"collapsed": false,
"input": [
"language": "python",
"metadata": {},
"outputs": [
"output_type": "pyout",
"prompt_number": 3,
"text": [
"[0, 1]"
"prompt_number": 3
"cell_type": "markdown",
"metadata": {},
"source": [
"To use the engine we first have to select them. The selection is done through python indexing or slicing. For example to select all the running engines just do:"
"cell_type": "code",
"collapsed": false,
"input": [
"dview = rc[:]"
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
"cell_type": "markdown",
"metadata": {},
"source": [
"Now `dview` contains a DirectView object that can be used to send/receive code and data back an forth between our session and the engines."
"cell_type": "markdown",
"metadata": {},
"source": [
"Execute code on the enignes\n",
"Our enignes are basically multiple ipython process running in parallel. To run a command an all our engines we can use the **`%px`** magic command. Let see some examples:"
"cell_type": "code",
"collapsed": false,
"input": [
"%px print 'ciao'"
"language": "python",
"metadata": {},
"outputs": [
"output_type": "stream",
"stream": "stdout",
"text": [
"[stdout:0] ciao\n",
"[stdout:1] ciao\n"
"prompt_number": 5
"cell_type": "code",
"collapsed": false,
"input": [
"%px import os\n",
"%px print os.getpid()"
"language": "python",
"metadata": {},
"outputs": [
"output_type": "stream",
"stream": "stdout",
"text": [
"[stdout:0] 3375\n",
"[stdout:1] 3376\n"
"prompt_number": 6
"cell_type": "code",
"collapsed": false,
"input": [
"%px from numpy.random import randint\n",
"%px a = rand(5)\n",
"%px print a"
"language": "python",
"metadata": {},
"outputs": [
"output_type": "stream",
"stream": "stdout",
"text": [
"[stdout:0] [ 0.15690218 0.56782216 0.92297292 0.19870273 0.39490221]\n",
"[stdout:1] [ 0.06350091 0.94723982 0.21775028 0.2323376 0.19959411]\n"
"prompt_number": 7
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE:** Under the hood the **%px** command uses the method [`dview.execute()`]( to run the command. This method returns an [AsyncResult object]( that is used to see the output. The magic **%px** convenientily shows the output right away."
"cell_type": "markdown",
"metadata": {},
"source": [
"Transferring data: Push/Pull\n",
"With the last command we created a variable **a** on each engine. To transfer it to our local session, we **pull** it using the **`dview`** object:"
"cell_type": "code",
"collapsed": false,
"input": [
"language": "python",
"metadata": {},
"outputs": [
"output_type": "pyout",
"prompt_number": 8,
"text": [
"[array([ 0.15690218, 0.56782216, 0.92297292, 0.19870273, 0.39490221]),\n",
" array([ 0.06350091, 0.94723982, 0.21775028, 0.2323376 , 0.19959411])]"
"prompt_number": 8
"cell_type": "markdown",
"metadata": {},
"source": [
"Basically, the python dictionary syntax is used on the dview object to **pull** data from the engines. We see that the command return a list in which each element is the requested object (in this case a numpy array).\n",
"Similarly, in order to **push** data to the remote engines we can use the dictionary assignment syntax:"
"cell_type": "code",
"collapsed": false,
"input": [
"dview['b'] = 3"
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE:** The same functionality can be obtained through the methods `dview.push()` and `dview.pull()`. See [here]( for more details."
"cell_type": "markdown",
"metadata": {},
"source": [
"###Push/Pull Numpy arrays\n",
"When moving Numpy arrays we must be aware that the data at destination is always read-only. To modify the array we must make a copy.\n",
"See [Details of Parallel Computing with IPython]( for more information."
"cell_type": "markdown",
"metadata": {},
"source": [
"Parallel map: map_syn(), map_asynch()\n",
"As a first example we use the parallel map that returns a list.\n",
"This example apply the scatter/gather method to split an array/list, send the fragments to the engines (all apply the same function but on different data), and finally recollect (gather) the result in a single list."
"cell_type": "code",
"collapsed": false,
"input": [
"parallel_result = dview.map_sync(lambda x: x**10, arange(32))"
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
"cell_type": "code",
"collapsed": false,
"input": [
"serial_result = map(lambda x:x**10, arange(32))"
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
"cell_type": "code",
"collapsed": false,
"input": [
"(parallel_result == serial_result)"
"language": "python",
"metadata": {},
"outputs": [
"output_type": "pyout",
"prompt_number": 12,
"text": [
"prompt_number": 12
"cell_type": "markdown",
"metadata": {},
"source": [
"Execute functions on the engines: apply*\n",
"This will call the same function on all the engines"
"cell_type": "code",
"collapsed": false,
"input": [
"dview['a'] = 5\n",
"dview['b'] = 10\n",
"dview.apply(lambda x: a+b+x, 27)"
"language": "python",
"metadata": {},
"outputs": [
"output_type": "pyout",
"prompt_number": 13,
"text": [
"[42, 42]"
"prompt_number": 13
"cell_type": "code",
"collapsed": false,
"input": [
"ar = dview.apply_async(lambda x: a+b+x, 33)"
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 14
"cell_type": "code",
"collapsed": false,
"input": [
"language": "python",
"metadata": {},
"outputs": [
"output_type": "pyout",
"prompt_number": 15,
"text": [
"[48, 48]"
"prompt_number": 15
"cell_type": "markdown",
"metadata": {},
"source": [
"We can send different functions to different engines using the target property:"
"cell_type": "code",
"collapsed": false,
"input": [
"dview.targets = 0\n",
"ar0 = dview.apply_async(lambda x: a+b+x, 27)\n",
"dview.targets = 1\n",
"ar1 = dview.apply_async(lambda x: a+b+x, 33)"
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
"cell_type": "code",
"collapsed": false,
"input": [
"print ar0.get(), ar1.get()"
"language": "python",
"metadata": {},
"outputs": [
"output_type": "stream",
"stream": "stdout",
"text": [
"42 48\n"
"prompt_number": 17
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively one can create different DirectViews (dview) by slicing rc and apply a different function to each of them."
"cell_type": "markdown",
"metadata": {},
"source": [
"cell_type": "code",
"collapsed": false,
"input": [
"dview.targets = [0,1]\n",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
"cell_type": "code",
"collapsed": false,
"input": [
"language": "python",
"metadata": {},
"outputs": [
"output_type": "pyout",
"prompt_number": 19,
"text": [
"array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])"
"prompt_number": 19
"cell_type": "code",
"collapsed": false,
"input": [
"language": "python",
"metadata": {},
"outputs": [
"output_type": "pyout",
"prompt_number": 20,
"text": [
"[array([0, 1, 2, 3, 4, 5, 6, 7]), array([ 8, 9, 10, 11, 12, 13, 14, 15])]"
"prompt_number": 20
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 20
"metadata": {}