@dniku
Last active May 5, 2018 12:12
sklearn affinity issue
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "This notebook demonstrates an issue with multiprocessing in sklearn.\n\nCPU affinity is a per-process setting that lists the CPU cores the process is allowed to run on. By default it contains all CPU cores, but some libraries can change it."
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-05-05T12:09:25.178578Z",
"end_time": "2018-05-05T12:09:25.183120Z"
},
"trusted": true
},
"cell_type": "code",
"source": "import os\nimport multiprocessing",
"execution_count": 1,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "A function that checks whether CPU affinity covers all cores and restores it if it does not:"
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-05-05T12:09:25.185889Z",
"end_time": "2018-05-05T12:09:25.192988Z"
},
"trusted": true
},
"cell_type": "code",
"source": "import sys\n\ndef fix_affinity():\n    # pid 0 means the current process\n    affinity = os.sched_getaffinity(0)\n    if len(affinity) != multiprocessing.cpu_count():\n        print(\"Something has messed with CPU affinity. Current affinity is {}. Fixing\".format(affinity),\n              file=sys.stderr)\n        # Allow the process to run on every core again\n        os.sched_setaffinity(0, set(range(multiprocessing.cpu_count())))\n\n        assert len(os.sched_getaffinity(0)) == multiprocessing.cpu_count(), os.sched_getaffinity(0)\n    else:\n        print(\"Affinity is OK: {}\".format(affinity))\n\nfix_affinity()",
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"text": "Affinity is OK: {0, 1, 2, 3, 4, 5, 6, 7}\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "By default, affinity is set correctly. Let's import some libraries one by one and check it after each import."
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-05-05T12:09:25.194865Z",
"end_time": "2018-05-05T12:09:25.202995Z"
},
"trusted": true
},
"cell_type": "code",
"source": "import sys\n\nfix_affinity()",
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"text": "Affinity is OK: {0, 1, 2, 3, 4, 5, 6, 7}\n",
"name": "stdout"
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-05-05T12:09:25.204287Z",
"end_time": "2018-05-05T12:09:25.268040Z"
},
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\n\nfix_affinity()",
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": "Affinity is OK: {0, 1, 2, 3, 4, 5, 6, 7}\n",
"name": "stdout"
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-05-05T12:09:25.270701Z",
"end_time": "2018-05-05T12:09:25.522382Z"
},
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd\n\nfix_affinity()",
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": "Affinity is OK: {0, 1, 2, 3, 4, 5, 6, 7}\n",
"name": "stdout"
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-05-05T12:09:25.523730Z",
"end_time": "2018-05-05T12:09:25.906824Z"
},
"trusted": true
},
"cell_type": "code",
"source": "from sklearn.model_selection import train_test_split, RandomizedSearchCV\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.datasets import make_classification\n\nfix_affinity()",
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"text": "Something has messed with CPU affinity. Current affinity is {0}. Fixing\n",
"name": "stderr"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
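"source": "Oops: importing these sklearn modules reduced the affinity to core 0 only. If this weren't fixed, `n_jobs=-1` would start worker processes that inherit this affinity, so they would all run on the same core, causing a slowdown.\n\nSome info about the environment:"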
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-05-05T12:09:25.908295Z",
"end_time": "2018-05-05T12:09:25.912313Z"
},
"trusted": true
},
"cell_type": "code",
"source": "import platform; print(platform.platform())\nimport sys; print(\"Python\", sys.version)\nimport numpy; print(\"NumPy\", numpy.__version__)\nimport scipy; print(\"SciPy\", scipy.__version__)\nimport sklearn; print(\"Scikit-Learn\", sklearn.__version__)",
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": "Linux-4.14.36-1-MANJARO-x86_64-with-arch-Manjaro-Linux\nPython 3.6.5 (default, Apr 12 2018, 22:45:43) \n[GCC 7.3.1 20180312]\nNumPy 1.14.2\nSciPy 1.0.1\nScikit-Learn 0.19.1\n",
"name": "stdout"
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-05-05T12:09:25.193Z"
},
"trusted": true
},
"cell_type": "code",
"source": "X_train_bow, y_train = make_classification(n_samples=3000, n_features=100, n_informative=10, random_state=42)",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Now we run `RandomizedSearchCV` with `n_jobs=-1`."
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-05-05T12:09:25.195Z"
},
"trusted": true
},
"cell_type": "code",
"source": "fix_affinity()\n\nrs = RandomizedSearchCV(\n    LogisticRegression(solver='lbfgs'),\n    param_distributions={'C': np.logspace(-4, 5, 10000)},\n    n_iter=50,\n    cv=3,\n    scoring='roc_auc',\n    refit=False, n_jobs=-1, random_state=42, verbose=1\n)\nrs.fit(X_train_bow, y_train)",
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": "Affinity is OK: {0, 1, 2, 3, 4, 5, 6, 7}\nFitting 3 folds for each of 50 candidates, totalling 150 fits\n",
"name": "stdout"
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-05-05T11:46:13.730Z"
}
},
"cell_type": "markdown",
"source": "Aaaand it freezes: the cell prints the \"Fitting ...\" line and never finishes."
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.6.5",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"varInspector": {
"window_display": false,
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"library": "var_list.py",
"delete_cmd_prefix": "del ",
"delete_cmd_postfix": "",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"library": "var_list.r",
"delete_cmd_prefix": "rm(",
"delete_cmd_postfix": ") ",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"position": {
"height": "1097px",
"left": "31px",
"right": "20px",
"top": "131px",
"width": "586px"
}
},
"gist": {
"id": "471be9b46ab4fa5518ae74724a644fe7",
"data": {
"description": "sklearn affinity issue",
"public": true
}
},
"_draft": {
"nbviewer_url": "https://gist.github.com/471be9b46ab4fa5518ae74724a644fe7"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
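The notebook's remark that `n_jobs=-1` workers would all run on the same core relies on a Linux rule: a child process created with fork(2) inherits its parent's CPU affinity mask. The sketch below illustrates that rule in isolation, without sklearn. It assumes Linux (`os.sched_getaffinity` and `os.sched_setaffinity` are not available on macOS or Windows), and the function names `child_affinity` and `main` are hypothetical, chosen only for this illustration. It temporarily pins the parent to a single core, much as the sklearn imports did in the notebook, and then asks a forked child which cores it may use.

```python
import multiprocessing
import os


def child_affinity(queue):
    # Hypothetical helper: runs in the child and reports the cores it may run on
    queue.put(os.sched_getaffinity(0))


def main():
    # Hypothetical driver: returns (parent's pinned mask, child's inherited mask)
    original = os.sched_getaffinity(0)
    # Pin the parent to one core it is already allowed to use
    pinned = {min(original)}
    os.sched_setaffinity(0, pinned)
    try:
        ctx = multiprocessing.get_context("fork")
        queue = ctx.Queue()
        proc = ctx.Process(target=child_affinity, args=(queue,))
        proc.start()
        inherited = queue.get()  # read before join() to avoid blocking on the queue
        proc.join()
    finally:
        # Restore the parent's original affinity mask
        os.sched_setaffinity(0, original)
    return pinned, inherited


if __name__ == "__main__":
    pinned, inherited = main()
    print("parent pinned to", pinned, "child sees", inherited)
```

The child reports the same single-core mask as the parent, which is why a pool of workers started after the affinity was narrowed would share one core unless the mask is restored first, as `fix_affinity()` does.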