@Midnighter
Created November 6, 2013 11:02
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"IPython Parallel with globals in remote namespace"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.parallel import Client"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure that a cluster with several engines (remote kernels) is already running under the specified profile."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"cl = Client(profile=\"default\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dv = cl.direct_view()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to test nested function calls in remote kernels that also use global variables."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def outer_function(a):\n",
"    global lookup\n",
"    global backup\n",
"    # rebind the global counter so that every sequence starts from zero\n",
"    backup = defaultdict(int)\n",
"    lookup[a][\"value\"] = inner_function(lookup[a][\"seq\"])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The inner function counts each symbol in a sequence and normalizes that frequency by the length of the sequence."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def inner_function(seq):\n",
"    global backup\n",
"    # count the occurrences of each symbol in the global counter\n",
"    for sym in seq:\n",
"        backup[sym] += 1\n",
"    total = float(len(seq))\n",
"    # relative frequency of every symbol that was seen\n",
"    return [backup[key] / total for key in backup]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some different IDs for the dictionary."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"seq_ids = list(\"ABCDEF\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import random"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some random sequences stored under the IDs."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"lookup = dict()\n",
"for key in seq_ids:\n",
"    lookup[key] = dict()\n",
"    lookup[key][\"seq\"] = [random.choice(\"ATGC\") for i in range(1000)]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from collections import defaultdict"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we push certain variables into the global namespace of the remote kernels. Note that instead of pushing the defaultdict, we could also execute::\n",
"\n",
" dv.execute(\"from collections import defaultdict\", block=True)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dv.push({\"lookup\": lookup, \"backup\": dict(), \"inner_function\": inner_function,\n",
"         \"defaultdict\": defaultdict}, block=True)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"[None, None]"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we map the outer function to the IDs. If the outer function required more arguments, we could add additional sequences of equal length in the call."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"results = dv.map(outer_function, seq_ids, block=True)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The results are all `None`, since the outer function has no return value and stores everything in the dictionary instead."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"results"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"[None, None, None, None, None, None]"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can retrieve the global dictionaries using `pull`, which returns a list with one dictionary per remote kernel."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"all_dicts = dv.pull(\"lookup\", block=True)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 13
},
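{
"cell_type": "markdown",
"metadata": {},
"source": [
"We merge the remote dictionaries into a copy of the local one. Note that `lookup.copy()` is a shallow copy, so the inner dictionaries of `lookup` are updated as well."
]
},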
{
"cell_type": "code",
"collapsed": false,
"input": [
"result_dict = lookup.copy()\n",
"for d in all_dicts:\n",
"    for key in seq_ids:\n",
"        result_dict[key].update(d[key])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These are the normalized values for each ID."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for key in seq_ids:\n",
"    print(result_dict[key][\"value\"])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[0.248, 0.256, 0.241, 0.255]\n",
"[0.271, 0.254, 0.255, 0.22]\n",
"[0.25, 0.244, 0.255, 0.251]\n",
"[0.267, 0.256, 0.259, 0.218]\n",
"[0.255, 0.237, 0.255, 0.253]\n",
"[0.236, 0.269, 0.235, 0.26]\n"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we did this in serial, rather than in parallel, would we get the same result?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"backup = dict()\n",
"for key in seq_ids:\n",
"    outer_function(key)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for key in seq_ids:\n",
"    print(lookup[key][\"value\"])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[0.248, 0.256, 0.241, 0.255]\n",
"[0.271, 0.254, 0.255, 0.22]\n",
"[0.25, 0.244, 0.255, 0.251]\n",
"[0.267, 0.256, 0.259, 0.218]\n",
"[0.255, 0.237, 0.255, 0.253]\n",
"[0.236, 0.269, 0.235, 0.26]\n"
]
}
],
"prompt_number": 17
}
],
"metadata": {}
}
]
}