@orleika
Created August 10, 2017 14:58
Parallel computation in Jupyter Notebook
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install the required packages and enable ipyparallel with the following commands:\n",
"\n",
"```\n",
"$ apt-get install mpich -y\n",
"$ pip install ipyparallel mpi4py\n",
"$ jupyter nbextension install --sys-prefix --py ipyparallel\n",
"$ jupyter nbextension enable --sys-prefix --py ipyparallel\n",
"$ jupyter serverextension enable --sys-prefix --py ipyparallel\n",
"```\n",
"\n",
"As a sanity check, verify that the cluster is reachable over MPI (on mpi4py 3.0 or later the module to run is probably `mpi4py.bench` rather than `mpi4py`):\n",
"```\n",
"$ mpiexec -n 4 python -m mpi4py helloworld\n",
"```\n",
"\n",
"After installation, a cluster can be started from the IPython Clusters tab on the Jupyter Notebook home page.\n",
"Start a suitable number of engines, then run the script below.\n",
"The data being searched is pushed to every engine, so choose the number of engines based on the machine's installed memory and CPU core count."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import sklearn.metrics as metrics\n",
"import itertools\n",
"import ipyparallel"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"56000000"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"u = np.random.randint(2, size=(100000, 70))\n",
"u.nbytes"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"7000000\n",
"100000\n",
"6900000\n"
]
}
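],
"source": [
"# By default the array is created as int64 here (8 bytes per value).\n",
"# No value exceeds X.shape[1], so 8 bits are enough (int8 holds values up to 127).\n",
"# Switching to 8 bits shrinks the array to 1/8 of its size.\n",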
"u = u.astype(np.int8)\n",
"y = u[:, 0]\n",
"X = u[:, 1:]\n",
"print(u.nbytes)\n",
"print(y.nbytes)\n",
"print(X.nbytes)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AsyncResult: _push>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rc = ipyparallel.Client()\n",
"lv = rc.load_balanced_view()\n",
"%px import sklearn.metrics as metrics\n",
"%px import numpy as np\n",
"rc[:].push({'y': y, 'X': X})"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
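},
"outputs": [],
"source": [
"def auc(cmb):\n",
"    # X, y, np and metrics are looked up on the engine (pushed / imported with %px above).\n",
"    pred = np.mean(np.take(X, cmb, axis=1), axis=1)\n",
"    fpr, tpr, thresholds = metrics.roc_curve(y, pred)\n",
"    return metrics.auc(fpr, tpr)\n",
"\n",
"def aucs(cmbs):\n",
"    # Distribute the column combinations over the engines and show progress while waiting.\n",
"    rs = lv.map_async(auc, cmbs)\n",
"    rs.wait_interactive()\n",
"    return rs[:]"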
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2/2 tasks finished after 0 s\n",
"done\n"
]
},
{
"data": {
"text/plain": [
"[0.49958438948356787, 0.49963179724969625]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aucs([list(range(X.shape[1])), [0, 1, 2, 3, 4, 5]])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
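
The notebook imports `itertools` but only evaluates two hand-written column combinations. The following is a rough sketch of how a full list of combinations might be generated and scored. It runs serially with the built-in `map`, so no cluster is needed. Several things here are assumptions rather than part of the notebook: `rank_auc` is a NumPy-only stand-in for `metrics.roc_curve` plus `metrics.auc` (it computes the same area through the Mann-Whitney U statistic), the data set is a small seeded sample, and pairs of columns are an arbitrary choice of combination size.

```python
import itertools
import numpy as np

def rank_auc(y_true, score):
    """ROC AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    y_true = np.asarray(y_true)
    _, inverse, counts = np.unique(score, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)              # 1-based rank of the last item in each tie group
    avg_rank = last_rank - (counts - 1) / 2.0  # average rank inside each tie group
    ranks = avg_rank[inverse]
    n_pos = int(np.sum(y_true == 1))
    n_neg = y_true.size - n_pos
    u_stat = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)

# Small synthetic data set (hypothetical size, seeded so runs are repeatable).
rng = np.random.RandomState(0)
u = rng.randint(2, size=(1000, 6)).astype(np.int8)
y, X = u[:, 0], u[:, 1:]

def auc(cmb):
    # Score each row by the mean of the selected columns, as in the notebook.
    pred = np.mean(np.take(X, cmb, axis=1), axis=1)
    return rank_auc(y, pred)

# Every pair of feature columns; on a cluster this list would be passed to aucs().
cmbs = list(itertools.combinations(range(X.shape[1]), 2))
results = list(map(auc, cmbs))

best_auc, best_cmb = max(zip(results, cmbs))
print(len(cmbs), best_cmb, best_auc)
```

With a running cluster, replacing `list(map(auc, cmbs))` by the notebook's `aucs(cmbs)` should give the same kind of result list, provided `X` and `y` have been pushed to the engines first.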

orleika commented Aug 10, 2017

Either enable the packages with the commands listed in the notebook, or bring up https://gist.github.com/orleika/3c5c4ceb6acd4550ffc74a94dbaa0989 with docker-compose.

orleika commented Aug 10, 2017

Also, u.astype(np.int8) is quite important.
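
A minimal sketch of why the cast matters: the array is copied to every engine, so its size is multiplied by the engine count. The engine count of 4 is a hypothetical value (it mirrors `-n 4` in the mpiexec test), and `dtype=np.int64` is passed explicitly only because the default integer type of `randint` depends on the platform.

```python
import numpy as np

# Same shape as in the notebook, with the dtype pinned to int64.
u64 = np.random.randint(2, size=(100000, 70), dtype=np.int64)

# Confirm every value fits into int8 (-128..127) before narrowing.
limits = np.iinfo(np.int8)
assert limits.min <= u64.min() and u64.max() <= limits.max

u8 = u64.astype(np.int8)

n_engines = 4  # hypothetical engine count
total_bytes = n_engines * u8.nbytes  # one copy of the data per engine

print(u64.nbytes)   # 100000 * 70 * 8 bytes
print(u8.nbytes)    # 100000 * 70 * 1 byte
print(total_bytes)  # footprint across all engines after the cast
```

The range check before `astype` is worth keeping in real use, since `astype` silently wraps values that do not fit the target type.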

orleika commented Aug 15, 2017

Run this in a console:

$ ipython profile create --parallel --profile=default
