{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Download data from: http://labs.criteo.com/2013/12/download-terabyte-click-logs-2/. I only read in (part of) one day, `day_0`." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"To generate the input data, uncomment and run the cell below. It is left commented out because it has not been tested." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# import dask.dataframe as dd\n", | |
"# from dask.distributed import Client\n", | |
"# client = Client()\n", | |
"\n", | |
"# # Column layout used here: click label, 13 numeric, 26 categorical\n", | |
"# categories = ['category_%d' % i for i in range(26)]\n", | |
"# columns = ['click'] + ['numeric_%d' % i for i in range(13)] + categories\n", | |
"\n", | |
"# df = dd.read_csv('data/day_0', sep='\\t', names=columns, header=None)\n", | |
"\n", | |
"# # Store the categorical columns as fixed-width (8-byte) byte strings\n", | |
"# encoding = {c: 'bytes' for c in categories}\n", | |
"# fixed = {c: 8 for c in categories}\n", | |
"# df.to_parquet('data/day-0-bytes.parquet', object_encoding=encoding,\n", | |
"#               fixed_text=fixed, compression='SNAPPY')\n", | |
"\n", | |
"# # Keep a 5% split as the training set and write it out the same way\n", | |
"# df_train, df_test = df.random_split([0.05, 0.95],\n", | |
"#                                     random_state=42)\n", | |
"# df_train.to_parquet('data/day-0-train-bytes.parquet', object_encoding=encoding,\n", | |
"#                     fixed_text=fixed, compression='SNAPPY')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import dask.dataframe as dd\n", | |
"from dask.distributed import Client\n", | |
"import pandas as pd" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<table style=\"border: 2px solid white;\">\n", | |
"<tr>\n", | |
"<td style=\"vertical-align: top; border: 0px solid white\">\n", | |
"<h3>Client</h3>\n", | |
"<ul>\n", | |
" <li><b>Scheduler: </b>tcp://127.0.0.1:59115\n", | |
" <li><b>Dashboard: </b><a href='http://127.0.0.1:8787/status' target='_blank'>http://127.0.0.1:8787/status</a>\n", | |
"</ul>\n", | |
"</td>\n", | |
"<td style=\"vertical-align: top; border: 0px solid white\">\n", | |
"<h3>Cluster</h3>\n", | |
"<ul>\n", | |
" <li><b>Workers: </b>8</li>\n", | |
" <li><b>Cores: </b>8</li>\n", | |
" <li><b>Memory: </b>17.18 GB</li>\n", | |
"</ul>\n", | |
"</td>\n", | |
"</tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"<Client: scheduler='tcp://127.0.0.1:59115' processes=8 cores=8>" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"client = Client(threads_per_worker=1, n_workers=8)\n", | |
"client" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"df = dd.read_parquet('data/day-0-train-bytes.parquet')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# sample 1% of the rows and pull them into memory as a pandas DataFrame\n", | |
"df = df.sample(frac=0.01).compute()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"y = df['click'].values" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`Munge` and `Normalize` are still really rough: my only goal for these cells was to get something working." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.base import BaseEstimator, TransformerMixin\n", | |
"from sklearn.preprocessing import quantile_transform\n", | |
"import numpy as np\n", | |
"\n", | |
"class Munge(BaseEstimator, TransformerMixin):\n", | |
"    def __init__(self, threshold=0.001):\n", | |
"        # scikit-learn estimators list their parameters explicitly (no *args),\n", | |
"        # otherwise get_params() and clone() fail\n", | |
"        self.threshold = threshold\n", | |
"\n", | |
"    def fit(self, X, y=None):\n", | |
"        return self\n", | |
"\n", | |
"    def transform(self, df):\n", | |
"        # Get indicator variables\n", | |
"        n = len(df)\n", | |
"        print(\"df.shape =\", df.shape)\n", | |
"\n", | |
"        df = pd.get_dummies(df, sparse=True)\n", | |
"        print(\"with indicator variables df.shape =\", df.shape)\n", | |
"\n", | |
"        # filter out features whose mean is at most `threshold`; for an\n", | |
"        # indicator, that means it is set in at most 0.1% of rows by default\n", | |
"        feature_cols = [col for col in df.columns\n", | |
"                        if 'category' in col or 'numeric' in col]\n", | |
"\n", | |
"        keep = [col for col in feature_cols if df[col].sum() / n > self.threshold]\n", | |
"        df = df[keep]\n", | |
"        print(\"after trimming feature matrix =\", df.shape)\n", | |
"\n", | |
"        # missing values => {categorical: user doesn't have feature}\n", | |
"        # numeric is a bit different. Most features are positive and fairly\n", | |
"        # small, so I assume 0 is a good missing value (plus it's a good\n", | |
"        # missing value for linear predictors)\n", | |
"        # (not in place: df is a slice of the indicator frame)\n", | |
"        df = df.fillna(value=0)\n", | |
"        return df\n", | |
"\n", | |
"class Normalize(BaseEstimator, TransformerMixin):\n", | |
"    def __init__(self, dist='normal'):\n", | |
"        self.dist = dist\n", | |
"\n", | |
"    def fit(self, X, y=None):\n", | |
"        return self\n", | |
"\n", | |
"    def transform(self, df):\n", | |
"        categories = [col for col in df.columns if 'category' in col]\n", | |
"        numeric = [col for col in df.columns if 'numeric' in col]\n", | |
"\n", | |
"        # make the features approximately Gaussian\n", | |
"        # This is used instead of StandardScaler to help\n", | |
"        # handle outliers (which there are a lot of in this dataset)\n", | |
"        numeric_features = quantile_transform(df[numeric].values,\n", | |
"                                              output_distribution=self.dist)\n", | |
"        categorical_features = df[categories].values\n", | |
"\n", | |
"        # record which columns of X hold which kind of feature\n", | |
"        num_cate = categorical_features.shape[1]\n", | |
"        num_numerical = numeric_features.shape[1]\n", | |
"        self.features_ = {'numeric': np.arange(num_cate, num_cate + num_numerical, dtype=int),\n", | |
"                          'categorical': np.arange(num_cate, dtype=int)}\n", | |
"        X = np.hstack((categorical_features, numeric_features))\n", | |
"        print(\"feature matrix shape =\", X.shape)\n", | |
"        # center every column on its median\n", | |
"        X -= np.median(X, axis=0)\n", | |
"        return X" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.pipeline import Pipeline\n", | |
"munge = Munge()\n", | |
"norm = Normalize()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"df.shape = (97955, 40)\n", | |
"with indicator variables df.shape = (97955, 226831)\n", | |
"after trimming feature matrix = (97955, 1669)\n", | |
"CPU times: user 3min 55s, sys: 4.51 s, total: 3min 59s\n", | |
"Wall time: 3min 55s\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"df = munge.transform(df)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"feature matrix shape = (97955, 1669)\n", | |
"CPU times: user 2.11 s, sys: 1.02 s, total: 3.12 s\n", | |
"Wall time: 3.08 s\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"X = norm.transform(df)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([[-0.6951397 , 0.79657621, 0.39199884, ..., 0. ,\n", | |
" 1.1001425 , 0.16129644],\n", | |
" [-5.19055549, -0.50224282, -5.10132452, ..., -5.30365568,\n", | |
" 0.81683179, -5.26839371],\n", | |
" [-0.24741611, 0.36678605, -0.30126316, ..., 0. ,\n", | |
" 0.50040387, -0.49565754],\n", | |
" ...,\n", | |
" [ 0.09167847, 1.45153507, -5.10132452, ..., -5.30365568,\n", | |
" 0.81297039, -0.20749966],\n", | |
" [ 0.17133449, -0.44051127, -0.30126316, ..., -5.30365568,\n", | |
" 1.53000694, -0.49565754],\n", | |
" [ 1.17818276, -0.5542076 , 0.77329052, ..., 0. ,\n", | |
" 0.52262393, 0.51592775]])" | |
] | |
}, | |
"execution_count": 29, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"import matplotlib.pyplot as plt\n", | |
"show = X[:, norm.features_['numeric']]\n", | |
"show" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<matplotlib.axes._subplots.AxesSubplot at 0x1c2036cf60>" | |
] | |
}, | |
"execution_count": 30, | |
"metadata": {}, | |
"output_type": "execute_result" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAD8CAYAAABjAo9vAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xd4lFX68PHvmcmkhxQSSAgl1AklgURAmoiAgGJd+4IFRLGC7FpWXV38+aq4VuwiRdeyq6JioYOgQiICoUMSWoAUahqE9Jz3j4fEQHrmmXlmJudzXblIJjPPuSck95w55T5CSomiKIriPkxGB6AoiqLoSyV2RVEUN6MSu6IoiptRiV1RFMXNqMSuKIriZlRiVxRFcTMqsSuKorgZldgVRVHcjErsiqIobsbDiEZDQ0NlVFSUEU0riqK4rM2bN5+UUoY1dD9DEntUVBSbNm0yomlFURSXJYQ41Jj7qaEYRVEUN6MSu6IoipuxObELIbyFEH8IIbYJIXYJIZ7TIzBFURSlefQYYy8GRkopzwghLMA6IcRSKeXvOlxbURRFaSKbE7vUCrqfOfel5dyHKvKuKIpiEF3G2IUQZiHEVuA4sFJKuUGP6youYPtX8EYfmBmk/bv9K6MjUtyJ+v1qFl0Su5SyXErZD2gPDBRC9LnwPkKIe4UQm4QQm06cOKFHs4rRtn8FP06DvCOA1P79cZr641P0oX6/mk3XVTFSylxgLTCulu/NkVL2l1L2DwtrcH294qzKSyEvAzI2w7InobTw/O+XFsLq/zMmNsW9rP4/9fvVTDaPsQshwoBSKWWuEMIHGA28bHNkim22f6X9AeSlQ2B7GPUsxN5c9/3LiuHMMTh9VPs47/OjcPqY9m/BSRqcQsk7Aru+g07DwF+9iCvNcGz3uZ56LfLSHRuLC9JjVUwE8IkQwoz2DuArKeVPOlxXaa7Kt7CVvZ28I/D9Q5C+EUK61J64C3NqXkeYwL+t9hEYCZHxEBCuffiHw0/T4czxWgIQ8PVd2qehPaDTUIgapn0EhNvrWSuurqQAdn4LSZ9ov6t1CWzvuJhclB6rYrYDcTrEouhl9XM138KWF8Mfc7TPTZZzybkttO4KnYacn7AD2mr/+oWCyVx3OyVnzn8BAbD4wPg3oHU3OLQO0tbDjoWweYH2/ZCuEDVU681HDVV/pApkboHNn2i/JyWntc7AmBfA4gsrnqr5+zXqWeNidRGG1IpR7KQoD5I+reetqoDH9oNPMJh0mF6pHNqpa8inwwAYNgPKy+Dodji0Xkv0u7+HpP9o9wnqpPXkOw3VEn1QJxDi/HaaOqykOL+iPO3/Nek/2u+Ghw/0vg7i74SOg/78HfDyP/d/f0TrkFz9lvq/bwShLUN3rP79+0tVBExHp/bDhg9g6xdaL9rspfXQLxTYAWbsdHx8F6ooh2O7IG2dluwPrf9zKKhV+3M9+nPDNxmba39XoP7AXY+UcGSD1jvf9R2UFUJ4jJbMY24Cn6C6H/vHR7DkUZj6K0T0dVzMTkYIsVlK2b/B+6nE7qKkhANr4ff3Ye8KMFugzw1w8X1wMtW1kmFFBZzYo/XmD62DQwlQcG5JrDCBrKj5GGd5kVIadjYbtv1X652fSAZPf4i5UUvo7eJqvkOrTWEOvGqFi+6CK/9t95CdVWMTuxqKcTUlZ2HHV/D7B1oy9AuDS5+A/pO1sXGAdv20f11l+MJkgra9tY+L79VetE6maj36xX+r/TFqZYRzq6iAtN+0idA9P0J5CUT2h2veht5/0YZYmsInGHpepf3uj3kePLzsE7ebUIndVeRlwMa52iRkYQ6Ex8J172u99Np+yWNvdt5E3hAhIMyqfax7o/Zlb2rS1Vh1zXucPgZbP9d65zkHwTsQLpoEF92pvXDbot8E2PkNpCyB3tfr8zzclErszu7IRvj9PW3CEQnR4+Hi+7WVLI15C+vqRj1bc1jJQ6
2MMFSty2kfhMR34dhOqCjT5khGPAm9rtGGAfXQZYQ2B7PlM5XYG6ASuzMqK9ES+Yb3tclDr0AYdD8MvBeCOxkdnWNduPIGCd1Gue67EXdQ247Q8hJtdcvgh7Sx89Bu+rdrMkO/2+C31yA/E1q1078NN6ESuzMpOAWb58PGeXA6S1sLfuWr0Pe2po9JupPqw0pfT4KUpeoP20h1zW9IqY1/21O/v8Kvr2iTsZf83b5tuTCV2I1w4fhk/8mQfQB2fA1lRdB1pDbJ1HWUPuvN3cnof0HyT7DmBbj2XaOjaVlO7Yd1r1NnSQlHzHuEdNE2t235DIb9rWUMRzaDSuyOVtv45OrnQFggfqK2XLFNtLExOrPgKG1IKvFdba4hvEYhUUVvR3dqwx+7F4HZE7pcBocTtU5IJUfuCI2bAIvuh8O/Q6fBjmlTDw7caKe6g45W2/gkQEAbuPpNldQbY/ij2mqLlWoC1a6ObIQvboUPhsLelTBkGjyyA+5YpL2jDOwACO1fR+6R6HWtthZ+y2eOaU8PDi5BrHrsjlbX+GR+pmPjcGU+wXDp47D8Kdi3CrqNNjoi9yElHPxF66Ef/FX7WV/2NAy8R/u8kpHLaT39tFUxO7+FK152jfmn+koQ2+HnqHrsjlbXtmm1LrtpBkzRhmVWPKuVKFBsIyUkL4G5o+E/18KJVK0Q1yM7tRfR6kndGcTdDqUF2vCQK6irQ2enjXYqsTtS+iYozNO2yVenKtY1nYcXjJ4Jx3dpKySU5qko16oqvj8U/nebVsrhqjdg+jYY8pDz9oY7DITW3WHL50ZH0jh+dZxLYKcOnUrsjlJwEr66A4Law/jXjBufdCe9roP2A+Dn/6fV8lYar6xE2x36Tn/45m5tU9H1H8LDSdoqLYu30RHWTwhtEvVwgrZax9n5tal5mx07dGqM3RHKy2DhJDh7Cu5eoVWn6z/Z6KhcnxAw5v/B/LHaKplLHzc6IudXclZL6AlvQX4GRPSDmz+F6Ktcb2lt7K3aGPXWz537HW/mFji+E3pdDxmbHLIqRiV2R/j5eW0i6tr3WnTJUbvoOAh6XgPr3tR2PFYWQmvJaltW12OsVmso8T04e1Lb8n/NW9peCVddC94qArpdDlv/q03w1ncojJF+fVVbxXXNW+DdyiFNuthLtAva/QOsf1MrhBQ3weho3NPomVr9+bUvGR2J8WpbVrfofnjFqiX7dv1g0jKYtERbTeSqSb1S3AQ4nQn71xgdSe2O7dY21F18n8OSOqjEbl8n98KiB6BdvLYsS7GP1l21VTJJn8DxZKOjMVZty+oqykAA9/4CE79xrU09DelxBfiEwFYnXdP+22vamvuL73Nosyqx20vxGfhyInh4ws3/UfWj7W344+AZAKv+ZXQkxqpr+VxZ0Z91+t2JhyfE3gLJi7UDPZzJyX2w61sYcDf4hji0aZXY7UFK+OFh7bCIG+dDUAejI3J/fq1h+N8hdRkc+MXoaIxT1/I5d94nETdBqy65Y6HRkZxv3RtaCYbBDzm8aZXY7eH397VX6pHPaDWkFccYOBUCO8KKf2on+LREPcbWvM3d90mEx2iLErZ8anQkf8o5BNv/px3l51/LUkc7c53Evv0reKMPzAzS/rVTjQWbHUrQEkv0VTBshtHRtCwWby2BHd2uHaHW0pzaD9u+hJCu53roLWifRL+J2v971najI9Gsnw0Irb6OAVxjuWNtFRF/PPcDc6Zf2NNH4eu7IKQzXPee6684cEV9boDf34XVz2vFovQ6vcfZlRXDwsnakr87vm95w38xN8KKp7U17RGxxsaSn6W9e4ibAIGRhoTgGj32+groOIvyUvjqTig+Dbd8pq1bVRzPZNI2LeWna0NiLcWqmZC1VatR39KSOmiTk9HjtU5gWYmxsSS8rZVqGPqIYSG4RmJ3cAGdZlnxDBz5XStn2qan0dG0bFHDwHol/Pa6VsrB3aUs1c7FHTgVel5ldDTGiZsIhdmQutS4GApOwq
b52khCSGfDwnCNxF7XjL7FB4ryHRtLbXYs1M4nHfSA9pZQMd7o56D0LPzi5vsH8jK0DUjhMXC5E72DNUKXy6BVpLF12hPf1ZaWDvubcTHgKol91LM1x0pNFm045sNLtKqJRjm2W1va2HGw+sNyJmE9tBUJm+ZrG8XcUXkZfDNFG3q48WPnL9xlbyazdj7wvlXaOLejFebAHx9B7+u03z8DuUZij71Zm9mvXhHxuvdg8nJtWdv8sdoOL0fX5S7K0zYheQXATR+D2eLY9pX6jXgSPHy08Wd39Ou/teqGV70Bod2MjsY59PsryApjSjlvmAMlp+GSRx3f9gVcY1UM1H1iy32/wU8ztInUA2vh+jlacSB7q6iA7+6H3ENw508QEG7/NpWm8Q+DYY9oRdgOJUCnIUZHpJ+Dv8Iv/4a+f4W+txgdjfNo3RU6DtFWxwyb4biVacWntXkO65VOcQ6va/TY6+MTpO3uvOYdbUjm/SHaZJK9rX8TUhZrKzDcqfaGuxn0gDbuuvxp99m0VHASvrkHWneDK18xOhrnEzcRTu2DIxsc1+bGeVCU6xS9dXCHxA7aq3L87TD1V22i9b+3wuJHaz80Wg/712i9wD43OLy4j9JEnr4w8p+QmaTtBnZ1FRXw3X3aeO5NC5z3hCMj9boWLH6Om0QtOQuJ72iTt+0vckybDbA5sQshOggh1ggh9gghdgkhpusRWLOEdocpq2DQg7DxI/holP7V/nKPaCfOhFq1cX+1Ccn5xd6irRpZ/Zy2kceVJb4D+1bC2Be056TU5OUPfa6HXd855mStpP9oRwoOf8z+bTWSHj32MuDvUsqewCDgQSFELx2u2zweXjDuRZjwDRQchzmXam+TpLT92mXF2vF25aXaJiTVW3INJrM2ZJZ7GP6YY3Q0zZe+WXtx6nm1VqZYqVu/iVByBnZ/b992yoq18gEdh0DUUPu21QQ2J3YpZZaUMunc56eBPYAx+2ir6z4a7k/QTopZ/Ddt9YqtZT2XPq69pb/+A7UKwdV0GaGdtvPrK85X3rUxivK04xUD2mmb4NQ7xfp1HKTVzLH3Yddbv9AO+hjuHGPrlXQdYxdCRAFxgANnLerh3wYmLNR6a6nL4YNhkLaueddK+hQ2f6xtPIger2uYioNc/n/a6oVfXWzCUUr4YZq20/rGeeATbHREzq/ysOtD6yD7gH3aKC+Fda9rB+l0HWmfNppJt8QuhPAHvgEekVLW2A4qhLhXCLFJCLHpxIkTejXbMJMJhjwMU1aChzd8cjX8/IK2uaOxMrfC4r9rvb6R/7RXpIq9te0Fcbdrm0js9cduD5s/ht2LYNQz0GGg0dG4jr63gTBpvWp72LFQG94b/pjTvYPSJbELISxoSf1zKWWtSw+klHOklP2llP3DwsL0aLZp2sVpq2b6/lXb2PHxlVrN5IaczYavbge/MLhhnvMemKs0zmVPaYcfrHrO6Ega59guWPYPrUc4xLh1CS6pVTvtsO6tX+i/ebGiXNsU2bYP9Bin77V1oMeqGAHMA/ZIKV+3PSQ78vKH697VEvTxPfDBJbCzniVwFeXalu3TR7Xj7fxCHRerYh8B4TB0mtYDPvKH0dHUr6QAvp4EXq3g+g+1d59K08RNgPwMbfOinnZ/D6f2wiV/d8r/Fz0iGgrcDowUQmw993GlDte1n5gbtR2rYT20CanvH6x9WdQvL8P+1XDFv51mfaqigyEPg3+4diCKHqul7GXpE9rxin+ZY8gpPG7BeqU2J6HnmnYptd566+7amnknZHNJASnlOrQz0F1LcBRMWgprZ2n/SYc3aBNTJ1K08gR56YCEjkO1YlKK+/D0g5FPa8Xb9vzgnH+cOxZqhzVc8nfoepnR0bguDy+IuVmbpzibrc+h0qnL4NhOuO4Dpx2adb73EI5ktmgTUnf+oPXY51wGix7QTmjiXE8uMwl2fG1omIod9JsAbXrByn8ZfzDDhU7thx8fgQ6DYMRTRkfj+uImQnkx7P
zG9mtJqa2qCurk1CW6W3Zir9R5ONy/XptUqyg9/3tlTnZSk6IPkxkufx5yDsKmeUZH86fqR9zdMBfMrlOnz2lFxGq7dPUYjjmwBjI2awXGnLiaq0rslXxDtAL5tXGmk5oU/XQbpdX3+OVlKMw1OhrNquda9hF39hJ3u/ZzPbrTtuv8+qq2SazfX/WJy05UYq+urpOa6rpdcW1CwJjntaT+22tGRwMpy7SDuFv6EXf2EHOT9o58qw07UdPWw6H1MHS6NnbvxFRir662k5osPtrtinsKj9F6Xxs+aNy+BnvJy4BF96kj7uzFN0RbIbP9y+bPqfz2qrafJf4OfWOzA5XYq6vtpKar36r9gA/FfVz2NFRIeG8QzAyCN/pop907ijrizjHiJsLZU9qqlqZK3wz7f4bBD2mloJ2cmpm5UF0nNSnu69B6EFI7/Bq0VVE/TtM+d8TvQuURd9fPUcXl7KnrSAiI0IZjel3TtMf+9ip4B8GAu+0Tm85Uj11RVv8fVFxQO6jUQauh1BF3jlN52PXeFdpu8sY6ugNSlmincXkF2C8+HanErih1rXrKOwJf3AKrn9dKT5xI1bfmiDrizvH6TTh32PX/Gv+Y314DzwC4+F77xaUzNRSjOL3FBxYzO2k2RwuOEu4XzvT46YzvomPp5MD25zalXcDiq1Xv27fqzx69hze06Qlte2sFoNr20T5v6o7G6kfcTVyoDm1xlNBu0HGwtqZ96PSGqzKeSIVdi7R16y5ULlkldsWpLT6wmJkJMykq1/YYZBVkMTNhJoB+yX3Us9qYevUzci0+cPVsbYy9rFgrNXFsl7aV/NhObWli9Q0vAe200+mrJ/zW3WpuMNr+1bmSFedeSPrepo64c7R+E+CHhyB9Y8NlkNe9rr2YD37QMbHpRCV2xanNTppdldQrFZUXMTtptn6JvXKCtLJGUGB7LdlX3u7hpe1ejIg9/3Gnj51L9Lv+TPr71/y5e9nsBWFWLcmH99GGXn5/X9vNXGn3Im1ST03YO07v67QCa1s+qz+xZx/UXogvvs/lKruqxK44taMFtU9y1XV7szVnNVRAW+2j26g/bysr0cq5Ht35Z9Lfvxq21XHYQ+UkrUrsjuMVoCX3nd/CuJe0onC1Wf+mNuE65GHHxqcDldgVpxbuF05WQVattzslD89zwzG9gWqrXM6cgFe7U1VcrjpVssLx4iZqyx73/Ah9b635/bwM7bzU+DugVYTj47ORWhWjOLXp8dPxNHued5uX2Yvp8S52mpB/mCpZ4Uw6DoaQLnUXBkt4C5DaBKsLUoldcWrju4znqs7n102JC4vTd1WMo7hRyYrFBxYzZuEYYj+JZczCMSw+sNjokJpGCG0SNe03bSy9ujPHtfrtsbdCcCdDwrOVSuyK07OYLfhb/Nl2xzZu7nEzm45t4nD+YaPDajo3KVlRuVIpqyALiaxaqeRyyb2uw64T34HyEm2Jo4tSiV1xesnZyVhDrJiEifv63ofFbOHtLW8bHVbzxN4MM3bCzFztXxdL6lD/SiWXEhiplW2uftj12WzYOA96/8WlyzuoxK44tfKKclJzUokOiQYgzDeMiT0nsixtGbtO7TI4upbJYSuVHCFuIuSnw8FftK83fAAlZ7QjCV2YSuyKUzty+giFZYVYg61Vt03qM4kgryBmb3axHqKbqGtFktOuVKqP9UqtuNeWz6EoT0vs0VdB215GR2YTldgVp5acnQxAz9Y9q24L8AxgSswUErMS+T3rd6NCa7Gmx0/H23x+aWFvs7frrVQCrURyuzjYuRBmddSSe0Rfo6OymUrsilNLzk7Gw+RB18Cu591+a/SthPuF8+bmN5GylrXhit2M7zKeSX0mVX3tYfJg5pCZrrlSaftXWsnk6ta97th6/HagErvi1JJzkuka2BXLBQcHe5m9eLDfg+w6tYsVh1YYFF3LZRImBIIbut8AwJhOYwyOqJlW/59WC6g6R5VstiOV2BWnlnwquWri9EJXd7
mabkHdeHvL25RW1mdRHCIxM5FerXsxMHwgZRVlHMg7YHRIzVNnyWbX3g2sErvitE4WnuRU0ak6E7vZZGZa3DQO5R/iu73fOTi6lutMyRm2ndjGkHZDqv5vKudCXI6b7gZWiV1xWpXJwhpirfM+IzqMIK5NHB9s+4DC6lUTFbvZeHQj5bKcwe0G06lVJ7zN3qTkpBgdVvO40W7g6lRiV5xWZWKvq8cOIITgkfhHOFF4gs/3fO6o0Fq0hMwEfDx86BfWD7PJTPfg7qRku2hid5PdwBdS1R0Vp5WcnUykfyQBnvWfMxnfNp5L21/K/B3zuanHTQR6BToowpYpMSuRAeEDqia0rSFWVqStQEqJaOhEImfkhgfYqx674rRSslPq7a1XNy1+GmdKzzB3x1w7R9WyZZzJ4FD+IYa0G1J1W3RwNPkl+a6589RNqcSuOKWzpWc5lH+o0Ym9R3APru56NV/s+UIlGDtKzEwEYHDE4KrbKudAXHYC1Q2pxK44pdScVCSy0Ykd4MF+DyKRvLv1XTtG1rIlZCbQ1rctnQM7V93WI7gHAkFyjkrszkIldsUpNWbi9ELt/Ntxi/UWftj/A/tz99srtBarvKKcDVkbGNJuyHlj6b4WXzq26ui6E6huSJfELoSYL4Q4LoTYqcf1FCU5O5lAr0Da+rZt0uPujb0XHw8f1ysh6wJ2n9pNfkk+g9sNrvE9a7BVJXYnoleP/WNgnE7XUhSSs7Udp01dZRHsHcxdve9izZE1bD2+1U7RtUwJmQkIBIMiBtX4XnRINOln0jldctqAyJQL6ZLYpZS/Atl6XEtRyirK2Juzl+jgxg/DVHdHrzto7d2aNza/oQqE6SgxK5HokGiCvYNrfK9yAjU1J9XRYSm1cNgYuxDiXiHEJiHEphMnTjiqWcUFpeWlUVJRUu+O0/r4WnyZ2ncqSceT+C3jN52ja5kKSgvYdnzbecscq3P50gJuxmGJXUo5R0rZX0rZPywszFHNKi5oT/YeAHqG9GzgnnW7sfuNdAjowJtJb1JeeeyZ0mwbj26kTJbVmdjDfMII8Q5R4+xOQq2KUZxOSnYKniZPogKjmn0Ni9nCw3EPszdnL0sOLtEvuBYqMTNRKyPQpl+t3xdC0CO4h+qxOwmV2BWnk5yTTPfg7niYbKt4MTZqLD1DevLOlncoKS/RKbqWKSEzgYvaXoSn2bPO+0SHRLMvd58qoewE9Fru+F8gEbAKIdKFEHfrcV2l5ZFSVq2IsZVJmHgk/hEyCzL5KsW1T8QxUtaZLNLy0+ochqlkDbFSWlFKWl6aYwJT6qTXqpjbpJQRUkqLlLK9lHKeHtdVWp5jZ4+RV5ynS2IHGNxuMBeHX8yc7XM4U3JGl2u2NIlZNcsI1KZyFZMajjGeGopRnEpzdpzWRwjBIxc9Qk5xDp/s/kSXa7Y0CZkJtPFpQ9egrvXeLyowCk+Tp5pAdQIqsStOJTk7GYE2EaeXPqF9uLzT5Xyy6xNOFp7U7botQXlFOb9n/c7gdoMb3CzmYfKge3B3VTPGCajErjiV5OxkOrXqhK/FV9frToubRkl5CXO2z9H1uu4uOTuZvOK8WssI1CY6JJqU7BS1McxgKrErTiU5O7nZG5PqExUYxfXdr+fr1K85cvqI7td3VwmZCQC1lhGojTXESm5xLsfOHrNnWEoDVGJXnEZ+ST4ZZzJ0G1+/0P1978dDePDOlnfscn13lJCZQM+QnrT2ad2o+1uDtRdlNc5uLJXYFadRmQzsldjb+LZhQs8JLDm4RK3caISzpWfZemIrg9o1rrcOVM2NqJ+vsVRiV5yGvRM7wOSYybTybMWbSW/arQ13senYJsoq6i4jUBt/T386BHQgJUf12I2kErviNJKzk2nt3ZpQn1C7tdHKsxVTYqawPmM9G49utFs77iAhMwFvszdxbeKa9LjKCVTFOCqxK04jOTuZ6Nb2661Xui36Ntr6tlVlfRuQmJnIRW0vwsvs1aTHWYOtHD59mI
LSAjtFpjREJXbFKZSWl7I/b3+za7A3hbeHNw/0e4AdJ3ew6vAqu7fnio4WHOVA3oFGL3OsrnIoTdVmN45K7IpT2J+3n7KKMruOr1d3Tddr6BzYmbeS3qKsoswhbbqSxEytjEBTxtcrVS5XVROoxlGJXXEKe05pNdgdldg9TB5Mj5tOWn4ai/YtckibriQxM5EwnzC6BXVr8mPb+rYl0CtQjbMbSCV2xSmk5KTg4+FDh4AODmtzZMeRxIbF8v7W9yksK3RYu86uQlaQmJXYqDICtRFCEB0crXrsBlKJXXEKydnJ9AjugdlkdlibQggeiX+E44XH+WLPFw5r19ntyd5DbnFus8bXK1lDrOzL3aeGuQyiErtiuApZQUp2isOGYaobED6ASyIvYd7OeeQV5zm8fWdUOb7e2DICtYkOiaa4vJhD+Yf0CktpApXYFcNlnMngTOkZQxI7wPT46ZwpOcO8neoYAdASuzXYatN+AjWBaiyV2BXDOWLHaX2sIVb6hvVlwc4FxH4Sy5iFY1h8YLEhsRjtbOlZko4nNWs1THWdAztjMVnUBKpBVGJXDJecnYxZmJu1AkMPiw8sZk+2tipHIskqyGJmwswWmdw3H9tMWUVZk+rD1MZistAtqJvqsRtEJXbFcMnZyXQO7Iy3h7ch7c9Omk1xefF5txWVFzE7abYh8RgpITMBL7MX8W3ibb5WdEg0KTmqNrsRVGJXDGevGuyNdbTgaJNud2eVZQT0eJG1hljJLsrmROEJHSJTmkIldsVQOUU5HDt7zCGlBOoS7hde6+1BXkEOjsRYxwqOsT9vf4OHVjeWqs1uHJXYFUNVHV7tgOJfdZkePx1v8/k9VIEgrziPX9N/NSgqx0vM0pY52rJ+vbrKd2GqhK/jqcSuGKqyN1fZuzPC+C7jmTlkJhF+EQgEEX4RPDPoGawhVmasmcH6jPWGxeZICZkJtPZurdtB4gGeAUT6R6oJVAN4GB2A0rIl5yTT1rctwd7BhsYxvst4xncZf95tY6LGMGXFFKb9PI13Rr2jW0/WGVXICjZkbWBIuyHNKiNQF1Wb3Riqx64YKvlUMj1DehodRq0CvQL56PKPiAqM4uGfH2ZD1gajQ7KblOwUsouybV6/fiFXaUxqAAAgAElEQVRriJVD+Yc4W3pW1+sq9VOJXTFMUVkRB/MPGroipiFB3kF8NOYjOgR04KHVD7ntqUsJmQmAbWUEahMdHI1EqtrsDqYSu2KYfbn7qJAVhu04bawQ7xDmjplLpH8kD65+kE1HNxkdku4SsxLpHtydMN8wXa9bNYGqhmMcSiV2xTCVuz2dPbEDtPZpzdyxcwn3C+eB1Q+QdCzJ6JB0U1hWSNKxJIZE6DsMAxDhF0GAZwDJOWoC1ZFUYlcMk5Kdgr/Fn0j/SKNDaZRQn1DmjZlHW9+23L/qfrYe32p0SLrYfGwzpRWluo+vw7na7CHRpGaroRhHUoldMUzljlM9V2HYW5hvGPPGziPMN4z7Vt3HthPbjA7JZomZiXiaPIlva3sZgdpYg62k5qRSXlFul+srNanErhiivKKc1JxUp10RU582vm2YN2YeId4h3LfyPnac2GF0SDZJyEwgvm283Wr1RIdEU1RexKHTqja7o6jErhji8OnDFJYVOvWKmPq09WvL/LHzCfIKYurKqew6ucvokJrl+Nnj7MvdZ5dhmEqVcyhqAtVxdNmgJIQYB8wGzMBcKeUsPa5b3aItGbyyPIXM3ELaBfnw2Fgr18XpPzbriHbcpQ1b2mlKDXZn/XmF+4Uzf+x8Ji2fxD0r72HumLn0at1L93aaqilt/J71O9D0MgJNaaNLYBc8TB4kZydzRecr7NZOczn730pz2NxjF0KYgXeBK4BewG1CiPp/u5to0ZYMnvx2Bxm5hUggI7eQJ7/dwaItGXo245B23KUNW9tJzk7Gw+RB18CudmujsWxpI8I/gnlj5+Fv8eeeFffUu33eGZ9LQmYCId4hTSoj0NQ2LG
atNntTe+zO+PNy9nYqCVtrJQshBgMzpZRjz339JICU8qW6HtO/f3+5aVPj1wIPnfUzGbk1T5H3NJuI66hfBb4th3MpKa+wazvu0oat7RyyzKZc5NOl5Bm7tdFYerRRIk6S5vkKkhI6lfwdb9neLu00pCltSCpI9XoM/4peRJbebZc2KmVYFnDGtAtr8at2baepjP5biQzyYf0/Rjb6OkKIzVLK/g3dT48x9kjgSLWv08/ddmFA9wohNgkhNp040bT6zJm1JHWg1h+ULeq6np7tuEsbtrZTZDqCV0UHu7bRWHq04SlDiSr5OwJPDnm+TpGo2RNztudSLDIoF6fxK2/aBHZznod3RQfKRT5lNP7AcGf7edmjnbpym630GGOvba1ajbcBUso5wBzQeuxNaaBdkE+tPfbIIB++nKpfYaa63hno2Y67tGFLOycLT3LZV/ncN/gSJvaqPx5X+3kdzo9j0rJJ5Pq8zfyx8+ka9OdQk7M9lwU7k3l9M3w64Xba+LaxSxuVNh71YPLyr/jbVQEMjWzcc3XUz+tYRQJeYcsRllxkaRDFJ8bS1jTEIX8r7YJ8dGujOj167OlA9a5XeyBTh+tWeWysFR+L+bzbfCxmHhur74oKR7TjLm3Y0s6eU43fcepqP6+OrToyb+w8TMLE3cvv5kDeAbu0U5emtJGQmUC3oG5NSupNbaNS5Rh+U0r4OuLnNWZgBt4R32LyzEUIMHnm4h3xLWMG6jv27ai/yUp6JPaNQHchRGchhCdwK/CDDtetcl1cJC/9JYbIIB8E2iv2S3+J0X1G2RHtuEsbtrRTefBCY5Y6uuLPKyowinlj5wFw9/K7OZh30C7t1KaxbRSVFZF0LKlZpYib8zwCvQJp59euSROojvh5rc/+FGEqPe82YSplffanurUBjvubrGTz5CmAEOJK4E205Y7zpZQv1Hf/pk6eKu7l0V8eZefJnSy7YZnRodjV/tz9TF4+GbMws2DcAjq16mR0SFUSMhKYumoq749+n2GRwxzS5rSfp5GWn8YP1+na77NJ7CexyJojxwgE2+/cbkBE9XPk5ClSyiVSyh5Syq4NJXVFSc523hrseuoa1JW5Y+ZSVlHG5OWTOZx/2OiQqiRkJmAxWbio7UUOazM6JJq0vDSnqs1e13m3dd3uKtTOU8WhCkoLOJx/2GV3nDZV9+DuzB07l5LyEiYvn8x/dv2HMQvHEPtJLGMWjmHxgcWGxJWYlUh8m3h8POwzeVcba7AViWRf7j6HtdmQ6fHT8TCdv4bEw+TB9PjpBkWkD5XYFYfam7MXiXSJUr166RHcg7lj5pJfnM8rm14hqyALiSSrIIuZCTMdntxPnD1Bak6qw4/6c8bDrcd3GU+EXwQeJg8EAk+TJx7Cw64lFhxBJXbFoVypBruerCFW/Dz9atxeVF7E7KTZDo2lsoyAo5NXpH8k/hZ/p6oZk346nSOnj/BgvwfZfud2vrzqS0orSnl7y9tGh2YTldgVh0rJTiHIK4i2vm2NDsXhThWeqvX2owVHHRpHYmYiId4hDh8OE0JgDbE2acmjvS1PWw7AuKhxAHQL7sZt0bexMHUhu0/tNjI0m6jErjiUK9Zg14szTNRJKUnITODiiIsxCcf/+UeHRDtVbfZlacuIDY2lfcCfJSAe6PcAwd7BvLThJfRYNWgEldgvsPjAYqeY3HJHpRWl7M3ZS3RwyxqGqTQ9fjre5vNrngsEU2KmOCyG1JxUThWdMmwM2RpspbCskCOnjzR8Zzs7kHeA5OxkxnUed97tAZ4BPBL/CFtPbOWnAz8ZFJ1tVGKvZvGBxcxMmGn45Ja7SstLo6SihOjWLTOxj+8ynplDZhLhF4FAEOIdgkmYWJi6kLzixtdQsUVVmd4Ix06cVqqcW3GGM1CXH1yOQDA2amyN713b7VpiQmN4ffPrnCk5Y0B0tlGJvZrZSbMpKi867zYjJrfcVeXYakvtsYOW3FfcuILtd27nl1t+4Z1R77Avdx
/3r7rfIQkkITOBroFdaetnzBxH16CueAgPwydQpZQsTVtK//D+tZZUMAkTTw58kpOFJ/lw+4cGRGgbldjPkVLrodfG0ZNb7iolOwUvsxdRgVFGh+I0hkUO47VLX2PPqT08sPoBu27eKS4vZvOxzQ5f5lidp9mTzkGdDZ9ATc1J5WDewapJ09rEhMVwfbfr+Wz3Z+fV/HEFKrEDO0/uZNLySXV+39V3oTmL5Oxkugd1r7EhpKW7rONlvDz8Zbad2MbDPz9MUVlRww9qhqRjSRSXFxua2EF7x5aanWpoDEsPLsUszFze6fJ67zc9fjo+Hj68/MfLLjWR2qITe+aZTJ749QluW3wbB/MOcl3X62pMbnkI19+F5gyklCTnJLeYHadNNSZqDC8Me4GNRzfyyJpHKCkv0b2NxMxEPEwe9G/bYKkRu7KGWDleeLzO5Z/2JqVkWdoyBrUbRLB3cL33be3Tmgf6PUBCZgI/H/nZQRHarkUm9vySfF7f/DpXf3c1qw+v5p6Ye1h8/WKeH/b8eZNbPh4+lMkycopyjA7Z5R07e4y84rwWtzGpKa7qchXPDXmO9Znr+fvav1NaXtrwg5ogITOBuDZx+Fp8db1uU1Udbm3QDtQdJ3eQcSaDK6Iad/7qLdG30C2oG69sfMVu76b01qISe2lFKZ/v+Zzx347n450fM67zOH66/iemxU/D39MfOH9yK+G2BEZ3HM3LG19m0b5FBkfv2ppSg70lu7779Tx98dOsTV/LE789QVlFmS7XPVl4kpScFKfYKm8NPldawKAJ1KUHl2IxWRjZsXFH0llMFp4c+CQZZzJYsGuBnaPTR4tI7FJKVh9azfXfX8+sP2ZhDbby5VVf8sKwF+odP/cwefDy8JcZHDGYfyX8i1WHVjkwaveSnJOMQDTp0OSW6tboW3ms/2OsPLSSp9c9rctmnqpljgaPrwMEeQcR7hduyARqeUU5y9OWc0nkJQR4BjT6cQMjBjKm0xjm7ZhH5hldzxGyC7dP7DtO7OCuZXfxyNpH8BAevDvqXT4a8xE9WzeubKyn2ZM3L3uTPqF9ePzXx0nMTLRzxO4pJTuFTq06GT4M4Cru6H0H0+Ons+TgEp5LfI4KadsZnImZiQR5BTlNueTo4GhDeuxJx5M4UXiCKzo3bhimukf7P4pA8Oqmxh/IbRS3Tezpp9N5/JfH+euSv3Io/xDPDn6WhdcsZHj74U3ezu5r8eW9Ue8RFRjF9DXT2XZim52idl/J2clqGKaJpsRM4b6+9/Hdvu94ccOLzV6VIaUkMTORQRGDDCkjUJseIT04mH/Q4WPWyw4uw8fDh+Hthzf5sRH+EUyJmcLKQyur3gE5K+f4X9ZRXnEer216jWsWXcOaI2uYGjuVxX9ZzE09brJpmV2gVyBzLp9DqE8o96+6n9QcY5druZL8knwyzmSoFTHN8EDfB5jUZxJfpnzJK5teaVZy35e7jxOFJ5xifL1SdEg0FbKC/bn7HdZmaUUpKw+tZET7Ec1+53hXn7to79+elza8RGmFvpPbenKbxF5aXspnuz9j/Hfj+WTXJ4zvMp6frv+Jh+Iews9Ss1xqc4T6hPLRmI/w8fBh6sqpTnUijjOrfMuteuxNJ4RgRvwMJvScwKe7P+WtLW81ObknZCYAzjG+Xqly97Ejx9n/yPqDnOKcGrVhmsLL7MXjAx7nQN4B/pf8Px2j05fLJ3YpJSsPreTa76/l5Y0v0zOkJ19f/TXPD33eLtumI/0j+ejyjyirKOOeFfdwrOCY7m24m6pSAiqxN4sQgicGPMFNPW5i7o65Td7inpiVSOfAzk610S4yIBI/i59DE/vSg0sJsATYfMbriA4jGBo5lPe2vsfJwpM6Racvl0nstVVd3HZiG3csvYO/rf0bXmYv3h/9PnMun2P3t/xdgrrwwegPyCvJ496V96p17g1Izk4m1CeUUJ9Qo0NxWUII/jnon1zb9Vre3fou83fOb9TjisuL2Xx0s1MNw4
BWi8UabHXYWvaS8hJWH17NyI4j8TR72nQtIQT/GPAPp64j5RKJvbaqi0+te4qJSyaSfiadmYNn8vXVXzMscpjD6nz3Du3N2yPfJuNMhsMKOLmqlOwUNb6uA5Mw8dyQ57gi6gre2PwGn+/5vMHHbDm+haLyIsOqOdbHGmIlJTvF5hU/jbEuYx1nSs80azVMbaICo7i91+0s2reI7Se263JNPblEYq+t6mKFrMDf4s/i6xdzQ48bDKk/MiB8AK9d+hop2Sl2rfHhykrKS9ifu79FV3TUk9lk5oVLXmB0x9HM+mMWX6V8Ve/9K8sIDAgf4KAIGy86JJqzZWdJP51u97aWHVxGsFcwAyMG6nbNqbFTCfMJ46UNLznkxakpXCKx11VdsaC0wPB10Zd2uJT/N+z/sfnYZh795VGnnik3wv7c/ZTJshZbg90eLCYL/x7+b4a3H87zvz/P9/u+r/O+iZmJ9AvrZ/jfSW0qd6Dae5z9bOlZ1qav5fJOl2MxWXS7rp/FjxkXzWDnqZ1OtzPdJRK7MxwpVp/xXcbz9MVP80v6L/xz3T+d7tXbSKoGu31YzBZeH/E6gyMG82zCsyw5sKTGfU4VnmJP9h6nWg1TXdegrpiF2e6J/Zf0XygsK7RpNUxdrupyFXFt4pidNJv8knzdr99cLpHYaztSzNvs7VRVF2+JvqVqp6Atm0ncTUpOCj4ePnRs1dHoUNyOl9mL2SNnE98mnqfWPVWj5MWGrA0ATjdxWsnbw5vOgZ3tvidk6cGltPFpQ3ybeN2vLYTgyYFPklOUw3tb39P9+s3lEon9wiPFIvwimDlkJuO7jDc6tPPc3eduJvXWNpO8veVto8NxCntO7cEabHWaHY/uxsfDh3dHvUtMaAyP/foYvxz5pep7CZkJBHoFOk0ZgdpYQ6x27bHnl+SzLmMdY6LGYDaZ7dJGz9Y9uanHTfwv+X/szdlrlzaaymX+2qpXXVxx4wqnS+pwbjPJRTO4ofsNfLTjIz7e+bHRIRmqQlaQkqNWxNibr8WX90a/hzXYyoy1M3hj0xuMWTiG7/d/T3FZMcvSlhkdYp2ig6M5dvaY3ZYM/3z4Z0orSnVbDVOXh+Mext/Tn5f+eMkp3q27TGJ3FUIInhn0DGOjxvLa5tf4JvUbo0MyTMaZDApKC9TGJAcI8Azgw8s/pLV3a+bvml91zGNReZFTH8he+aJvr/Xsyw4uI9I/kpjQGLtcv1KQdxAP93uYjUc3svzQcru21RgqsduB2WTmpWEvMSxyGM8lPufUPSZ7qnyL7cxDAe4k0CsQSc3eojNvpKlK7Hao9JhdlM3vWb8zLmqcQ/a33NjjRqJDonl146t2Pbu2MdThk3ZSuWrhvpX38eRvT+Jv8bd5K7OrSc5OxizMdA3qWud9SktLSU9Pp6hI7QGojbe3N+3bt8diadwyveNnj9d6u7MeyB7iHUIb3zZ2GWdfdWgV5bLc7sMwlcwmM08OfJI7l93J3B1zmRY/zSHt1kYldjvy8fDhnVHvcPfyu5mxZgYfXv4h8W31n5l3VinZKXQO7Iy3h3ed90lPTycgIICoqCiH7Rp2FVJKTp06RXp6Op07d27UY8L9wquGYS683VlZg+0zgbr04FI6B3Z26OEu8W3jGd9lPB/v+pjru11Ph1YdHNZ2dTYNxQghbhJC7BJCVAghjD0h10kFeAbw/uj3CfcL56HVDxlyaoxR9mTvaXB8vaioiNatW6ukXgshBK1bt27SuxlXWBp8oeiQaNLy0iguL9btmscKjrH52GauiLrC4b9bf7vob9omso3/dmi71dk6xr4T+Avwqw6xuK3WPq2Zc/kc/Dz9mLpyKml5aUaHZHfZRdkcP3u8UROnKqnXrak/G1dZGlydNcRKmSzTtTb7ikMrkEjGdh6r2zUbq41vG6b2ncra9LX8lv6bw9sHGxO7lHKPlNKYE2ldTIR/BB9d/hEAE5ZMYNRXo86rVOluKifDWtJSxxEjRrBp0yYArr
zySnJzcw2JwxWWBldX+eKv5wTqsoPLiA6JpktgF92u2RS397ydqFZRvLzxZUrKSxzevloV40BRgVFMiJ5Afkk+xwuPV1WqdOblaM1lr1ICi7ZkMHTWz3T+x2KGzvqZRVsydL2+XpYsWUJQUJDRYbiEDgEd8PHw0W2YMv10OttPbmdclP4lBBrLYrbwxMAnOJR/iE93f+rw9htM7EKIVUKInbV8XNuUhoQQ9wohNgkhNp04caL5Ebu4hXsX1rjNmZejNVdydjLhfuEEeeuX3BZtyeDJb3eQkVuIBDJyC3ny2x02Jfe0tDSio6OZMmUKffr0YcKECaxatYqhQ4fSvXt3/vjjDwoKCpg8eTIDBgwgLi6O77/Xim4VFhZy6623Ehsbyy233EJhYWHVdaOiojh5UjuE4brrruOiiy6id+/ezJkzp+o+/v7+PP300/Tt25dBgwZx7FjLPLSlsja7Xom9cnmxPWrDNMWwyGGM6DCCD7d/6PADeRpcFSOlHK1HQ1LKOcAcgP79+xu/NcsgdS07c9blaM2Vkp3S5N76cz/uYndm3YWUthzOpaT8/AJrhaXlPL5wO//9o/ZjCnu1a8W/ru5db7v79u3j66+/Zs6cOQwYMIAvvviCdevW8cMPP/Diiy/Sq1cvRo4cyfz588nNzWXgwIGMHj2aDz/8EF9fX7Zv38727duJj699xdP8+fMJCQmhsLCQAQMGcMMNN9C6dWsKCgoYNGgQL7zwAo8//jgfffQR//znPxv4Kbkna4iVnw78RIWssLn8xLKDy4gNiyXSP1Kn6Jrv8QGPc92i63gj6Q1mXTLLYe2qoRgHq2vZWZhPmIMjsZ/CskIO5h/UfXz9wqTe0O2N1blzZ2JiYjCZTPTu3ZtRo0YhhCAmJoa0tDRWrFjBrFmz6NevHyNGjKCoqIjDhw/z66+/MnHiRABiY2OJjY2t9fpvvfVWVa/8yJEj7N2r1RPx9PTkqquuAuCiiy4iLS3NpufhyqwhVgpKC8g4Y9vQ2oHcA6TkpHBFlGPWrjekQ0AH7upzF4sPLCbpWJLD2rVpHbsQ4nrgbSAMWCyE2CqldPw0tAuZHj+dmQkzaxwcUlRWxKH8Q3Rq1cmgyPSzL2cfFbKiyTtOG+pZD531Mxm5hTVujwzy4cupzS9N6+XlVfW5yWSq+tpkMlFWVobZbOabb77Baq35QtXQqpW1a9eyatUqEhMT8fX1rXphALBYLFWPN5vNlJWVNfs5uLrKd3ep2al0CGj+2u9lacsQCMZEjdErNJvd3eduftj/A//47R+A9u483C+c6fHT7TaxbeuqmO+klO2llF5SyrYqqTestuVo0+Km4WH24K5ld7EvZ5/RIdosOUcbK9W7x/7YWCs+lvMr9PlYzDw21r4rb8aOHcvbb79dVdxpy5YtAAwfPpzPP9eOp9u5cyfbt9c8Ii0vL4/g4GB8fX1JTk7m999/t2usrqpbcDdMwlT1u9McUkqWHlxK//D+tPFto2N0tvG1+HJZh8vIKsg673hPey6aUEMxBrhwOdo9sfewYOwCBIJJyyex59Qeo0O0SUp2CgGWAN3HOK+Li+Slv8QQGeSDQOupv/SXGK6Ls+9Y6jPPPENpaSmxsbH06dOHZ555BoD777+fM2fOEBsby7///W8GDqx57Nq4ceMoKysjNjaWZ555hkGDBtk1Vlfl4+FDVKsomyZQU3JSSMtPM3Q1TF3WHllb4zZ7LpoQRpSY7N+/v6xc76v86XD+YaasmMKZkjO8f/n79A3ra3RIzTJhyQQ8TZ4sGLegwfvu2bOHnj1VkbD6tJSf0eO/Ps7W41tZceOKZj3+jc1v8J9d/+Hnm38m2DtY5+hsE/tJbK0F2gSC7Xc2/jBsIcRmKWWDu/xVj92JdGzVkY/HfUyQdxD3rriXTUdd78WvvKKcvTl7ValepcmiQ6LJKsgirzivyY+VUrLs4DIGtRvkdEkdHH+8p0rsTqadfzs+Hvcxbf3acv+q+0nITD
A6pCY5fPowhWWFLWrHqaKPygnU5uxA3X5yO5kFmQ6r5NhUjq7hoxK7E2rj24YFYxfQsVVHHlr9UK3jc85K1WBXmqtHiFaFsTnj7MsOLsPT5MllHS7TOyxdOLqGjyrb66Ra+7Rm/tj5TF05lRlrZjBr+CzGRjn/oqPk7GQ8TB6G1ehQXFeoTyihPqFNPk2pvKKc5WnLuaT9JQR4BtgpOtuN7zLeYXV7VI/diQV6BfLRmI+ICYvh8V8f58f9PxodUoNSslPoFtQNi7lxB0MoSnXWEGuTh2KSjidxovCE4SUEnIlK7E4uwDOAD0Z/wIC2A3h63dMsTK1Za8ZZSCnZk70Ha7AaX1eaJzo4mv15+yktL230Y5YeXIqPhw/DI4fbMTLXohK7C/C1+PLOqHeqzlD9fM/nRodUq5OFJ8kuyqZna9ceX3/zzTc5e7b2Mys//vhjHnroIQdH1HJEh0RTVlHG/rzG1WYvrShl5aGVjOgwAl+Lr52jcx0qsbsIbw9vZl82m9EdRzPrj1nM2zHP6JBqqJz0smuPfftX8EYfmBmk/bv9K92bqC+xK/ZVuZqqsROoG7I2kFuc6zS1YZyFmjx1IRazhVcufYWn1z3Nm0lvUlRexAN9H3CaE4iqEru9ljpu/wp+nAal5+rF5B3RvgaIvblZlywoKODmm28mPT2d8vJybrrpJjIzM7nssssIDQ1lzZo1LFiwgJdeeomIiAh69OhxXm0ZRV8dAzri4+HT6HH2pQeXEmAJYGjkUDtH5lpUYncxHiYPXhz2Il5mLz7Y9gHFZcXMuGiGUyT35Oxk2vu3b/7KhKX/gKM76v5++ka48FzM0kL4/iHY/EntjwmPgSvqLpe6bNky2rVrx+LFWs2OvLw8FixYwJo1awgNDSUrK4t//etfbN68mcDAQC677DLi4uKa+syURjKbzHQP7t6oHntxeTE/H/6Z0Z1G42n2dEB0rkMNxbggs8nMzCEzudV6Kwt2LeDFDS9SIW0rXauHlJwU++44reuwYxsOQY6JiWHVqlU88cQT/PbbbwQGBp73/Q0bNjBixAjCwsLw9PTklltuaXZbSuNYg7WVMQ2VO1mXsY4zpWfUMEwtVI/dRZmEiacufgpvD28+3vUxJRUlPDvoWcwmc8MPtoOC0gIO5R/iqi5XNf8i9fSsAW1MPe9IzdsDO8Ck5lXJ69GjB5s3b2bJkiU8+eSTjBlTs9yrM7wbakmiQ6L5OvVrsgqyaOffrs77LTu4jGCvYAZG1Cy+1tKpHrsLE0Lwt4v+xn197+Pbvd/y1LqnKK1o/DIxPaXmpAJ23nE66lmw+Jx/m8VHu72ZMjMz8fX1ZeLEiTz66KMkJSUREBDA6dOnAbj44otZu3Ytp06dorS0lK+//tqWZ6A0QmMmUM+WnuWX9F8YEzUGD5Pqn15I/URcnBCCB/s9iJfZi9lJsykuL+aV4a84fIOQ3SdO4c8J0tX/B3npENheS+rNnDgF2LFjB4899hgmkwmLxcL7779PYmIiV1xxBREREaxZs4aZM2cyePBgIiIiiI+Pp7y8XKcnpNSme1B3BIKU7BRGdhxZ631+Sf+FwrJCpyzR6wxUYncTU2Km4OPhw6w/ZjF9zXReH/E63h7eDT9QJynZKQR5BdHWt619G4q92aZEfqGxY8cyduz5pRr69+/Pww8/XPX1pEmTmDRpkm5tKvXztfjSqVWnenvsSw4uoY1PG+Lb1n7ObEunErsbmdBzAp5mT55PfJ5bf7yVgrICjp09ZvdjuAD2ZO8hOiRajUcruogOiWbHydpXSOUV57EuYx23Rd9m88HX7kr9VNzMTT1u4sYeN7I/fz9Hzx51yDFcpRWl7MvZp2qwK7qxhljJOJNBfkl+je/9fPhnyirK1GqYeqjE7obWZayrcVtReRGvbnqVorKiWh5hm7S8NEoqSlQNdkU3lbuXa9uotCxtGZH+kfQJ7ePosFyGGopxQ0cLjtZ6+8nCkwz6YhBRraLoEdIDa7
AVa4iV6JBoQn1Cm92eqsGu6K3y3V9qTioDwgdU3X6q8BQbsjYwqc8kNexXD5XY3VC4XzhZBYNqJlsAAAlxSURBVFk1bg/2CuZm682kZKew9fhWlh5cWvW9EO+QqkTfI7gH1hArnQM7YzE1vLomOTsZL7MXnVp10vV5KC1XqE8oId4hNSZQVx1aRbksV6thGqASuxuaHj+dmQkzKSr/c9jF2+zNEwOfOG8CNa84j9ScVFJzUknOTiYlO4Uv9nxBSUUJABaThW5B3aoSfWXiD/T6c3fm4gOL+V/y/yipKOHKb6+0+ySt0jIIIYgOia4xFLM0bSldArvQI7iHQZG5BpXY3VBlYp2dNJujBUfrXBUT6BXIgPAB573VLasoIy0vjZScFFKyU0jJSWFdxjq+3/991X3C/cKxBlsxYWJd5rqqTVGVk7TVY1CU5rKGWPls92eUlpdiMVs4WnCUpGNJ3N/vfjUM0wCV2N1Uc4/h8jB50C24G92Cu533+JOFJ0nNTiU5R+vZp+aksi93X43HF5UXMTtptt0S++IDixt8wVLcQ3RwNKUVpRzIO4A1xMqKtBVIpBqGaQS1KkZplFCfUIZEDmFyn8m8PPxlvrv2OwS195rqmry11eIDi5mZMJOsgizdlnGmpaURHR3NlClT6NOnDxMmTGDVqlUMHTqU7t2788cff1BQUMDkyZMZMGAAcXFxfP/991WPveSSS4iPjyc+Pp6EhAQA1q5dy4gRI7jxxhuJjo5mwoQJDRa0UmqqnECtPAN1Wdoyeob0pHNgZyPDcgmqx640W12TtOF+4c263st/vFzvbsPtJ7ZXjf9XKiov4tn1z9Z5ZGB0SDRPDHyi3nb37dvH119/zZw5cxgwYABffPEF69at44cffuDFF1+kV69ejBw5kvnz55Obm8vAgQMZPXo0bdq0YeXKlXh7e7N3715uu+02Nm3aBMCWLVvYtWsX7dq1Y+jQoaxfv55hw4Y18SfSsnVq1QlvszfJ2cnEtYljx8kdzLhohtFhuQTVY1eabXr8dLzN55ct8DZ7Mz1+ul3auzCpN3R7Y3Xu3JmYmBhMJhO9e/dm1KhRCCGIiYkhLS2NFStWMGvWLPr168eIESMoKiri8OHDlJaWcs899xATE8NNN93E7t27q645cOBA2rdvj8lkol+/fqSlpdkUY0tkNpnpFtSNlOwUlqctB2Bs1NgGHqWA6rErNmjsJG1jNdSzHrNwTK3vECL8IlgwbkGz2gTOOxHJZDJVfW0ymSgrK8NsNvPNN99gtZ6/AWvmzJm0bduWbdu2UVFRgbe3d63XNJvNlJWVNTu+lswaYmXloZXkFOfQN6wvkf6RRofkElSPXbHJ+C7jWXHjCrbfuZ0VN66w60Smo98hVBo7dixvv/121Tj5li1bAO20pYiICEwmE59++qmq+mgHZRVl5JfkszdnLwfzDtqtLIa7UYldcRnju4xn5pCZRPhFIBBE+EUwc8hMu6+KeeaZZygtLSU2NpY+ffrwzDPPAPDAAw/wySefMGjQIFJTU/Hz87NrHC3N4gOLz9tEl1+Sb9eaR+5EGDFb379/f1k5yaS0bHv27KFnT1WKoD4t9WdU39DbihtXGBCR8YQQm6WU/Ru6n009diHEK0KIZCHEdiHEd0KIIFuupyiKUqmuZbP2Wk7rTmwdilkJ9JFSxgKpwJO2h6QoilL3stnmLqdtSWxK7FLKFVLKyun+34H2toekKIpi3GS5O9BzueNk4Esdr6e0EFJKVfujDi15x6rey2lbkgYTuxBiFVDbe5+npZTfn7vP00AZ8Hk917kXuBegY8eOzQpWcT/e3t6cOnWK1q1bq+R+ASklp06dOm99fEvT3JpHLZ3Nq2KEEHcC9wGjpJRnG/MYtSpGqVRaWkp6ejpFRfqf7OQOvL29ad++PRZLw3XxFffX2FUxNg3FCCHGAU8AlzY2qStKdRaLhc6dVVEnRdGTrati3gECgJVCiK1CiA90iElRFEWxgU
09dillN70CURRFUfShSgooiqK4GUNKCgghTgCHmvnwUOCkjuEYST0X5+MuzwPUc3FWtjyXTlLKsIbuZEhit4UQYlNjZoVdgXouzsddngeo5+KsHPFc1FCMoiiKm1GJXVEUxc24YmKfY3QAOlLPxfm4y/MA9Vycld2fi8uNsSuKoij1c8Ueu6IoilIPl0rsQohxQogUIcQ+IcQ/jI6nOYQQHYQQa4QQe4QQu4QQLl+DVAhhFkJsEUL8ZHQsthBCBAkhFp47PGaPEGKw0TE1lxBixrnfr51CiP8KIVymkpgQYr4Q4rgQYme120KEECuFEHvP/RtsZIyNUcfzcMjhRC6T2IUQZuBd4AqgF3CbEKKXsVE1SxnwdyllT2AQ8KCLPo/qpgN7jA5CB7OBZVLKaKAvLvqchBCRwDSgv5SyD2AGbjU2qib5GBh3wW3/AFZLKbsDq8997ew+pubzcMjhRC6T2IGBwD4p5QEpZQnwP+Bag2NqMilllpQy6dznp9GSR6SxUTWfEKI9MB6Ya3QsthBCtAKGA/MApJQlUspcY6OyiQfgI4TwAHyBTIPjaTQp5a9A9gU3Xwt8cu7zT4DrHBpUM9T2PBx1OJErJfZI4Ei1r9Nx4YQIIISIAuKADcZGYpM3gceBCqMDsVEX4ASw4Nyw0lwhhJ/RQTWHlDIDeBU4DGQBeVJKVz/9ua2UMgu0zhHQxuB49DAZWGqPC7tSYq/tFAaXXdIjhPAHvgEekVLmGx1PcwghrgKOSyk3Gx2LDjyAeOB9KWUcUIBrvN2v4dz487VAZ6Ad4CeEmGhsVEp1jTmcyBaulNjTgQ7Vvm6PC729rE4IYUFL6p9LKb81Oh4bDAWuEUKkoQ2NjRRCfGZsSM2WDqRLKSvfPS1ES/SuaDRwUEp5QkpZCnwLDDE4JlsdE0JEAJz797jB8TTbucOJrgImSDutN3elxL4R6C6E6CyE8ESbDPrB4JiaTGjnv80D9kgpXzc6HltIKZ+UUraXUkah/X/8LKV0yZ6hlPIocEQIYT130yhgt4Eh2eIwMEgI4Xvu920ULjoRXM0PwJ3nPr8T+N7AWJqt2uFE19jzcCKXSeznJhweApaj/ZJ+JaXcZWxUzTIUuB2td7v13MeVRgelAPAw8LkQYjvQD3jR4Hia5dy7joVAErAD7e/cZXZuCiH+CyQCViFEuhDibmAWcLkQYi9w+bmvnVodz8MhhxOpnaeKoihuxmV67IqiKErjqMSuKIriZlRiVxRFcTMqsSuKorgZldgVRVHcjErsiqIobkYldkVRFDejEruiKIqb+f/hwxjc+Rti0QAAAABJRU5ErkJggg==\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"summary = pd.DataFrame({'median': np.median(show, axis=0),\n", | |
"                        'std': np.std(show, axis=0),\n", | |
"                        'mean': np.mean(show, axis=0)})\n", | |
"summary.plot(style='o-')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"from sklearn.model_selection import train_test_split" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 40, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"X_train, X_test, y_train, y_test = train_test_split(X, y)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 41, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.linear_model import LogisticRegression\n", | |
"from sklearn.metrics import log_loss, make_scorer\n", | |
"\n", | |
"class LogisticProbs(LogisticRegression):\n", | |
"    def score(self, X, y):\n", | |
"        # Returns the log loss, so lower is better\n", | |
"        prob_hat = self.predict_proba(X)\n", | |
"        score = log_loss(y, prob_hat)\n", | |
"        return score\n", | |
"\n", | |
"est = LogisticProbs(penalty='l2',  # l1\n", | |
"                    C=1e2,  # loguniform(5, -1)\n", | |
"                    class_weight='balanced',  # or [1/0.97, 1/0.03]\n", | |
"                    solver='saga',\n", | |
"                    tol=1e-5,  # loguniform(-3, -5)\n", | |
"                    n_jobs=-1)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 42, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"/Users/ssievert/anaconda3/envs/dask-master/lib/python3.6/site-packages/sklearn/linear_model/sag.py:326: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge\n", | |
" \"the coef_ did not converge\", ConvergenceWarning)\n" | |
] | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CPU times: user 3min 8s, sys: 2.25 s, total: 3min 10s\n", | |
"Wall time: 1min 50s\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"LogisticProbs(C=100.0, class_weight='balanced', dual=False,\n", | |
" fit_intercept=True, intercept_scaling=1, max_iter=100,\n", | |
" multi_class='ovr', n_jobs=-1, penalty='l2', random_state=None,\n", | |
" solver='saga', tol=1e-05, verbose=0, warm_start=False)" | |
] | |
}, | |
"execution_count": 42, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"est.fit(X_train, y_train)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 43, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.7100791417571256" | |
] | |
}, | |
"execution_count": 43, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"est.score(X_test, y_test)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The best score in the 2014 [Criteo Kaggle] competition (using the same loss) is 0.44463 (lower is better).\n", | |
"\n", | |
"The result here is highly variable: the score changes substantially when `LogisticProbs` is rerun (i.e., when SAGA uses a different random seed). With the same parameters but different train/test splits and `LogisticProbs` random states, I've gotten `[0.442, 0.9139, 0.7101]`.\n", | |
"\n", | |
"[Criteo Kaggle]:https://www.kaggle.com/c/criteo-display-ad-challenge" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.5" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
Judging by dask/dask-ml#295 (comment), I ran this notebook locally because it only uses a (very small) subset of the dataset. I don't recall ever using the complete Criteo dataset, or even a significant fraction of it.
But you use dask.distributed.
Also, the data is only 370GB as a zipped file, at this link:
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#criteo_tb
I mostly used Distributed for its useful dashboard (for debugging, profiling, etc.). I didn't focus on actually scaling to the entire Criteo dataset; IIRC this simple use case illustrated some problems in Dask-ML.
My metric for "big data" is any data that's too large to fit in RAM. 370GB is certainly more than the 16GB of RAM my local machine has.
I see
Is Dask Distributed free?
And will it read data in libsvm format?
Is Dask Distributed free?
Yes. Free as in beer (i.e., it doesn't cost money) and free as in speech (the source is freely available).
And will it read data in libsvm format?
Yes. Dask-ML is a wrapper around scikit-learn, which has a function for reading libsvm files: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_svmlight_file.html. It'd be pretty simple to wrap that function with Dask:
from distributed import Client
from sklearn.datasets import load_svmlight_file

def read_chunk(filename):
    # Parse one libsvm/svmlight file on a worker
    X, y = load_svmlight_file(filename)
    return X, y  # scipy.sparse matrix, raw ndarray

client = Client()
filenames = ["criteo-day-1.svmlight", ...]
# client.map submits one task per file and returns a list of futures right away
Xs_ys = client.map(read_chunk, filenames)
# The cluster performs the work in the background
# continue with rest of notebook
This code is untested.
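One thing that may need care with the snippet above: `load_svmlight_file` infers the number of features from each file separately, so different chunks can come back with different widths. Below is a minimal sketch of one way to handle that, assuming the total feature count is known up front and passed as `n_features`. `N_FEATURES`, `stack_chunks` and `write_demo_files` are names made up for this illustration, and the demo reads small synthetic files in a plain loop rather than on a cluster.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

N_FEATURES = 5  # assumption: the total number of features is known in advance


def read_chunk(filename, n_features=N_FEATURES):
    # Fixing n_features keeps every chunk the same width, even when a file
    # never mentions the highest feature index.
    X, y = load_svmlight_file(filename, n_features=n_features)
    return X, y


def stack_chunks(chunks):
    # chunks is a list of (X, y) pairs, e.g. what client.gather(Xs_ys) returns
    Xs, ys = zip(*chunks)
    return sp.vstack(Xs, format="csr"), np.concatenate(ys)


def write_demo_files(directory, n_files=2, rows=3):
    # Hypothetical helper: writes tiny synthetic files so the sketch can run
    rng = np.random.RandomState(0)
    filenames = []
    for i in range(n_files):
        X = rng.rand(rows, N_FEATURES)
        y = rng.randint(0, 2, size=rows)
        path = os.path.join(directory, "chunk-%d.svmlight" % i)
        dump_svmlight_file(X, y, path)
        filenames.append(path)
    return filenames


with tempfile.TemporaryDirectory() as tmp:
    filenames = write_demo_files(tmp)
    # With a cluster: chunks = client.gather(client.map(read_chunk, filenames))
    chunks = [read_chunk(f) for f in filenames]
    X_all, y_all = stack_chunks(chunks)
```

Stacking everything into one matrix only makes sense for a subset that fits in RAM; for the full dataset the chunks would have to stay on the workers.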
Great code, thanks.
So Dask can help in both cases: reading the original Criteo RTB file or the libsvm format.
Only a short question:
In your code above, does load_svmlight_file read any libsvm format, or only the specific svmlight format?
Again, thanks a lot for taking care of this.
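On the format question: as far as I know, svmlight and libsvm files share the same plain-text layout, one row per line as `<label> <index>:<value> ...`, and the scikit-learn documentation describes `load_svmlight_file` as loading data in the "svmlight / libsvm" format. Below is a minimal sketch of what one such line contains. `parse_libsvm_line` is a hypothetical helper written for illustration, not part of scikit-learn or Dask, and it ignores details the real loader handles (such as multilabel rows).

```python
def parse_libsvm_line(line):
    """Parse one '<label> <index>:<value> ...' line into (label, {index: value})."""
    # Drop a trailing '# comment', which the svmlight format allows
    line = line.split("#", 1)[0].strip()
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        if token.startswith("qid:"):
            continue  # optional query id used for ranking data; skipped here
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


label, features = parse_libsvm_line("1 3:0.5 10:2 42:1e-3  # example row")
```

Only the non-zero features are listed on each line, which is why the loader returns a sparse matrix.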
Has anybody tried to run this locally?
Like
https://github.com/rambler-digital-solutions/criteo-1tb-benchmark