jorisvandenbossche · August 31, 2020 14:16
diff --git a/pandas-consolidation-POC-array-manager.ipynb b/pandas-consolidation-POC-array-manager.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# POC 1D ArrayManager vs consolidated BlockManager"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See https://github.com/pandas-dev/pandas/issues/10556 and mailing list discussion at https://mail.python.org/pipermail/pandas-dev/2020-May/001219.html for context. \n",
    "\n",
    "This notebook is using a branch of pandas that adds an experimental  \"array manager\".\n",
    "\n",
    "**Note** This notebook includes some benchmarks. However, the results of the timings are highly dependent on the characteristics of the data that is being used. I am now using a DataFrame with 100 columns and a 1000:1 rows/column ratio, which I think is a quite typical scenario. But when tweaking this (more or less columns, different ratio), this easily changes the results / interpretation of the benchmarks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.options.compute.use_numexpr = False\n",
    "pd.options.compute.use_bottleneck = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext snakeviz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext line_profiler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "arr = np.random.randn(100_000, 100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "df1_orig = pd.DataFrame(arr, manager=\"block\")  # the current default\n",
    "# additional copy to ensure the memory is column-aligned\n",
    "df1 = df1_orig.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>90</th>\n",
       "      <th>91</th>\n",
       "      <th>92</th>\n",
       "      <th>93</th>\n",
       "      <th>94</th>\n",
       "      <th>95</th>\n",
       "      <th>96</th>\n",
       "      <th>97</th>\n",
       "      <th>98</th>\n",
       "      <th>99</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.677230</td>\n",
       "      <td>0.682754</td>\n",
       "      <td>-0.315458</td>\n",
       "      <td>1.432242</td>\n",
       "      <td>2.391720</td>\n",
       "      <td>0.644951</td>\n",
       "      <td>1.791661</td>\n",
       "      <td>0.491711</td>\n",
       "      <td>-0.358679</td>\n",
       "      <td>-1.394864</td>\n",
       "      <td>...</td>\n",
       "      <td>0.407030</td>\n",
       "      <td>-0.713274</td>\n",
       "      <td>1.718921</td>\n",
       "      <td>0.345061</td>\n",
       "      <td>-2.146100</td>\n",
       "      <td>-0.738093</td>\n",
       "      <td>-0.159594</td>\n",
       "      <td>0.520909</td>\n",
       "      <td>1.133554</td>\n",
       "      <td>0.585371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.559448</td>\n",
       "      <td>-1.280697</td>\n",
       "      <td>0.402636</td>\n",
       "      <td>-0.583523</td>\n",
       "      <td>-0.266790</td>\n",
       "      <td>0.418998</td>\n",
       "      <td>0.994489</td>\n",
       "      <td>-0.976886</td>\n",
       "      <td>-1.696476</td>\n",
       "      <td>0.926231</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.855396</td>\n",
       "      <td>-2.048210</td>\n",
       "      <td>-1.265783</td>\n",
       "      <td>2.394670</td>\n",
       "      <td>-2.258890</td>\n",
       "      <td>1.229371</td>\n",
       "      <td>1.750581</td>\n",
       "      <td>0.139859</td>\n",
       "      <td>0.636832</td>\n",
       "      <td>0.385954</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.104786</td>\n",
       "      <td>-0.235949</td>\n",
       "      <td>0.001328</td>\n",
       "      <td>0.771150</td>\n",
       "      <td>-0.003778</td>\n",
       "      <td>-1.039505</td>\n",
       "      <td>0.879596</td>\n",
       "      <td>-0.625585</td>\n",
       "      <td>-0.979214</td>\n",
       "      <td>0.022012</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.994192</td>\n",
       "      <td>0.661030</td>\n",
       "      <td>-0.010726</td>\n",
       "      <td>-0.468801</td>\n",
       "      <td>-0.605885</td>\n",
       "      <td>1.311313</td>\n",
       "      <td>0.339141</td>\n",
       "      <td>-0.211478</td>\n",
       "      <td>-0.717916</td>\n",
       "      <td>1.467792</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.062078</td>\n",
       "      <td>0.095454</td>\n",
       "      <td>-0.557974</td>\n",
       "      <td>0.701680</td>\n",
       "      <td>0.193023</td>\n",
       "      <td>-1.002278</td>\n",
       "      <td>-0.223521</td>\n",
       "      <td>-0.926172</td>\n",
       "      <td>-0.494398</td>\n",
       "      <td>-0.363959</td>\n",
       "      <td>...</td>\n",
       "      <td>0.020915</td>\n",
       "      <td>-0.800234</td>\n",
       "      <td>-0.047158</td>\n",
       "      <td>-0.410241</td>\n",
       "      <td>-1.296799</td>\n",
       "      <td>-1.308874</td>\n",
       "      <td>0.467894</td>\n",
       "      <td>2.335768</td>\n",
       "      <td>0.298389</td>\n",
       "      <td>-0.855592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.504247</td>\n",
       "      <td>0.312447</td>\n",
       "      <td>-0.698603</td>\n",
       "      <td>-0.616552</td>\n",
       "      <td>1.530180</td>\n",
       "      <td>-0.647969</td>\n",
       "      <td>1.494239</td>\n",
       "      <td>-0.680018</td>\n",
       "      <td>-0.205344</td>\n",
       "      <td>0.757280</td>\n",
       "      <td>...</td>\n",
       "      <td>0.468555</td>\n",
       "      <td>1.204949</td>\n",
       "      <td>1.374184</td>\n",
       "      <td>0.153511</td>\n",
       "      <td>-0.450651</td>\n",
       "      <td>0.877156</td>\n",
       "      <td>0.892300</td>\n",
       "      <td>1.151173</td>\n",
       "      <td>-0.230626</td>\n",
       "      <td>0.605697</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99995</th>\n",
       "      <td>0.539733</td>\n",
       "      <td>2.411698</td>\n",
       "      <td>-0.455420</td>\n",
       "      <td>0.538498</td>\n",
       "      <td>-0.956580</td>\n",
       "      <td>-0.384449</td>\n",
       "      <td>-1.338329</td>\n",
       "      <td>1.211452</td>\n",
       "      <td>-0.462413</td>\n",
       "      <td>-1.330185</td>\n",
       "      <td>...</td>\n",
       "      <td>1.651992</td>\n",
       "      <td>0.476358</td>\n",
       "      <td>0.441226</td>\n",
       "      <td>0.704206</td>\n",
       "      <td>-0.270060</td>\n",
       "      <td>1.247129</td>\n",
       "      <td>1.247639</td>\n",
       "      <td>-0.143032</td>\n",
       "      <td>0.044815</td>\n",
       "      <td>-0.631406</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99996</th>\n",
       "      <td>0.104250</td>\n",
       "      <td>1.415562</td>\n",
       "      <td>0.139757</td>\n",
       "      <td>-0.621568</td>\n",
       "      <td>-0.249787</td>\n",
       "      <td>0.358591</td>\n",
       "      <td>0.238794</td>\n",
       "      <td>1.232123</td>\n",
       "      <td>0.139006</td>\n",
       "      <td>0.684821</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.207111</td>\n",
       "      <td>0.549576</td>\n",
       "      <td>2.155070</td>\n",
       "      <td>0.780223</td>\n",
       "      <td>0.281913</td>\n",
       "      <td>0.053108</td>\n",
       "      <td>0.433405</td>\n",
       "      <td>1.200421</td>\n",
       "      <td>1.526589</td>\n",
       "      <td>-0.269244</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99997</th>\n",
       "      <td>1.014513</td>\n",
       "      <td>0.349818</td>\n",
       "      <td>-0.482681</td>\n",
       "      <td>0.130529</td>\n",
       "      <td>-0.084422</td>\n",
       "      <td>1.623366</td>\n",
       "      <td>-1.069759</td>\n",
       "      <td>0.403931</td>\n",
       "      <td>-0.946770</td>\n",
       "      <td>-0.127743</td>\n",
       "      <td>...</td>\n",
       "      <td>0.245094</td>\n",
       "      <td>-0.188509</td>\n",
       "      <td>-0.425818</td>\n",
       "      <td>0.425148</td>\n",
       "      <td>-1.733082</td>\n",
       "      <td>0.366996</td>\n",
       "      <td>0.278778</td>\n",
       "      <td>-0.457803</td>\n",
       "      <td>-1.976963</td>\n",
       "      <td>1.287677</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99998</th>\n",
       "      <td>1.280127</td>\n",
       "      <td>2.488734</td>\n",
       "      <td>-1.073581</td>\n",
       "      <td>-0.636677</td>\n",
       "      <td>-0.909728</td>\n",
       "      <td>-1.016732</td>\n",
       "      <td>1.018966</td>\n",
       "      <td>-1.430341</td>\n",
       "      <td>-0.922263</td>\n",
       "      <td>1.865314</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.027662</td>\n",
       "      <td>0.511552</td>\n",
       "      <td>-0.559474</td>\n",
       "      <td>-2.757027</td>\n",
       "      <td>-1.092821</td>\n",
       "      <td>0.437376</td>\n",
       "      <td>-1.051840</td>\n",
       "      <td>1.263932</td>\n",
       "      <td>0.042789</td>\n",
       "      <td>0.152446</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99999</th>\n",
       "      <td>0.782564</td>\n",
       "      <td>-0.388404</td>\n",
       "      <td>-0.247960</td>\n",
       "      <td>1.381064</td>\n",
       "      <td>-1.605046</td>\n",
       "      <td>0.974790</td>\n",
       "      <td>-1.178229</td>\n",
       "      <td>-0.396106</td>\n",
       "      <td>-0.368758</td>\n",
       "      <td>-1.187883</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.746631</td>\n",
       "      <td>0.410091</td>\n",
       "      <td>-0.249438</td>\n",
       "      <td>0.042342</td>\n",
       "      <td>-0.597671</td>\n",
       "      <td>0.467246</td>\n",
       "      <td>1.492825</td>\n",
       "      <td>2.039514</td>\n",
       "      <td>-1.024533</td>\n",
       "      <td>2.571223</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>100000 rows × 100 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             0         1         2         3         4         5         6   \\\n",
       "0     -0.677230  0.682754 -0.315458  1.432242  2.391720  0.644951  1.791661   \n",
       "1     -0.559448 -1.280697  0.402636 -0.583523 -0.266790  0.418998  0.994489   \n",
       "2     -0.104786 -0.235949  0.001328  0.771150 -0.003778 -1.039505  0.879596   \n",
       "3     -0.062078  0.095454 -0.557974  0.701680  0.193023 -1.002278 -0.223521   \n",
       "4      1.504247  0.312447 -0.698603 -0.616552  1.530180 -0.647969  1.494239   \n",
       "...         ...       ...       ...       ...       ...       ...       ...   \n",
       "99995  0.539733  2.411698 -0.455420  0.538498 -0.956580 -0.384449 -1.338329   \n",
       "99996  0.104250  1.415562  0.139757 -0.621568 -0.249787  0.358591  0.238794   \n",
       "99997  1.014513  0.349818 -0.482681  0.130529 -0.084422  1.623366 -1.069759   \n",
       "99998  1.280127  2.488734 -1.073581 -0.636677 -0.909728 -1.016732  1.018966   \n",
       "99999  0.782564 -0.388404 -0.247960  1.381064 -1.605046  0.974790 -1.178229   \n",
       "\n",
       "             7         8         9   ...        90        91        92  \\\n",
       "0      0.491711 -0.358679 -1.394864  ...  0.407030 -0.713274  1.718921   \n",
       "1     -0.976886 -1.696476  0.926231  ... -0.855396 -2.048210 -1.265783   \n",
       "2     -0.625585 -0.979214  0.022012  ... -0.994192  0.661030 -0.010726   \n",
       "3     -0.926172 -0.494398 -0.363959  ...  0.020915 -0.800234 -0.047158   \n",
       "4     -0.680018 -0.205344  0.757280  ...  0.468555  1.204949  1.374184   \n",
       "...         ...       ...       ...  ...       ...       ...       ...   \n",
       "99995  1.211452 -0.462413 -1.330185  ...  1.651992  0.476358  0.441226   \n",
       "99996  1.232123  0.139006  0.684821  ... -0.207111  0.549576  2.155070   \n",
       "99997  0.403931 -0.946770 -0.127743  ...  0.245094 -0.188509 -0.425818   \n",
       "99998 -1.430341 -0.922263  1.865314  ... -0.027662  0.511552 -0.559474   \n",
       "99999 -0.396106 -0.368758 -1.187883  ... -1.746631  0.410091 -0.249438   \n",
       "\n",
       "             93        94        95        96        97        98        99  \n",
       "0      0.345061 -2.146100 -0.738093 -0.159594  0.520909  1.133554  0.585371  \n",
       "1      2.394670 -2.258890  1.229371  1.750581  0.139859  0.636832  0.385954  \n",
       "2     -0.468801 -0.605885  1.311313  0.339141 -0.211478 -0.717916  1.467792  \n",
       "3     -0.410241 -1.296799 -1.308874  0.467894  2.335768  0.298389 -0.855592  \n",
       "4      0.153511 -0.450651  0.877156  0.892300  1.151173 -0.230626  0.605697  \n",
       "...         ...       ...       ...       ...       ...       ...       ...  \n",
       "99995  0.704206 -0.270060  1.247129  1.247639 -0.143032  0.044815 -0.631406  \n",
       "99996  0.780223  0.281913  0.053108  0.433405  1.200421  1.526589 -0.269244  \n",
       "99997  0.425148 -1.733082  0.366996  0.278778 -0.457803 -1.976963  1.287677  \n",
       "99998 -2.757027 -1.092821  0.437376 -1.051840  1.263932  0.042789  0.152446  \n",
       "99999  0.042342 -0.597671  0.467246  1.492825  2.039514 -1.024533  2.571223  \n",
       "\n",
       "[100000 rows x 100 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df1._data.blocks)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2 = pd.DataFrame(arr, manager=\"array\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>90</th>\n",
       "      <th>91</th>\n",
       "      <th>92</th>\n",
       "      <th>93</th>\n",
       "      <th>94</th>\n",
       "      <th>95</th>\n",
       "      <th>96</th>\n",
       "      <th>97</th>\n",
       "      <th>98</th>\n",
       "      <th>99</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.677230</td>\n",
       "      <td>0.682754</td>\n",
       "      <td>-0.315458</td>\n",
       "      <td>1.432242</td>\n",
       "      <td>2.391720</td>\n",
       "      <td>0.644951</td>\n",
       "      <td>1.791661</td>\n",
       "      <td>0.491711</td>\n",
       "      <td>-0.358679</td>\n",
       "      <td>-1.394864</td>\n",
       "      <td>...</td>\n",
       "      <td>0.407030</td>\n",
       "      <td>-0.713274</td>\n",
       "      <td>1.718921</td>\n",
       "      <td>0.345061</td>\n",
       "      <td>-2.146100</td>\n",
       "      <td>-0.738093</td>\n",
       "      <td>-0.159594</td>\n",
       "      <td>0.520909</td>\n",
       "      <td>1.133554</td>\n",
       "      <td>0.585371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.559448</td>\n",
       "      <td>-1.280697</td>\n",
       "      <td>0.402636</td>\n",
       "      <td>-0.583523</td>\n",
       "      <td>-0.266790</td>\n",
       "      <td>0.418998</td>\n",
       "      <td>0.994489</td>\n",
       "      <td>-0.976886</td>\n",
       "      <td>-1.696476</td>\n",
       "      <td>0.926231</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.855396</td>\n",
       "      <td>-2.048210</td>\n",
       "      <td>-1.265783</td>\n",
       "      <td>2.394670</td>\n",
       "      <td>-2.258890</td>\n",
       "      <td>1.229371</td>\n",
       "      <td>1.750581</td>\n",
       "      <td>0.139859</td>\n",
       "      <td>0.636832</td>\n",
       "      <td>0.385954</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.104786</td>\n",
       "      <td>-0.235949</td>\n",
       "      <td>0.001328</td>\n",
       "      <td>0.771150</td>\n",
       "      <td>-0.003778</td>\n",
       "      <td>-1.039505</td>\n",
       "      <td>0.879596</td>\n",
       "      <td>-0.625585</td>\n",
       "      <td>-0.979214</td>\n",
       "      <td>0.022012</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.994192</td>\n",
       "      <td>0.661030</td>\n",
       "      <td>-0.010726</td>\n",
       "      <td>-0.468801</td>\n",
       "      <td>-0.605885</td>\n",
       "      <td>1.311313</td>\n",
       "      <td>0.339141</td>\n",
       "      <td>-0.211478</td>\n",
       "      <td>-0.717916</td>\n",
       "      <td>1.467792</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.062078</td>\n",
       "      <td>0.095454</td>\n",
       "      <td>-0.557974</td>\n",
       "      <td>0.701680</td>\n",
       "      <td>0.193023</td>\n",
       "      <td>-1.002278</td>\n",
       "      <td>-0.223521</td>\n",
       "      <td>-0.926172</td>\n",
       "      <td>-0.494398</td>\n",
       "      <td>-0.363959</td>\n",
       "      <td>...</td>\n",
       "      <td>0.020915</td>\n",
       "      <td>-0.800234</td>\n",
       "      <td>-0.047158</td>\n",
       "      <td>-0.410241</td>\n",
       "      <td>-1.296799</td>\n",
       "      <td>-1.308874</td>\n",
       "      <td>0.467894</td>\n",
       "      <td>2.335768</td>\n",
       "      <td>0.298389</td>\n",
       "      <td>-0.855592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.504247</td>\n",
       "      <td>0.312447</td>\n",
       "      <td>-0.698603</td>\n",
       "      <td>-0.616552</td>\n",
       "      <td>1.530180</td>\n",
       "      <td>-0.647969</td>\n",
       "      <td>1.494239</td>\n",
       "      <td>-0.680018</td>\n",
       "      <td>-0.205344</td>\n",
       "      <td>0.757280</td>\n",
       "      <td>...</td>\n",
       "      <td>0.468555</td>\n",
       "      <td>1.204949</td>\n",
       "      <td>1.374184</td>\n",
       "      <td>0.153511</td>\n",
       "      <td>-0.450651</td>\n",
       "      <td>0.877156</td>\n",
       "      <td>0.892300</td>\n",
       "      <td>1.151173</td>\n",
       "      <td>-0.230626</td>\n",
       "      <td>0.605697</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99995</th>\n",
       "      <td>0.539733</td>\n",
       "      <td>2.411698</td>\n",
       "      <td>-0.455420</td>\n",
       "      <td>0.538498</td>\n",
       "      <td>-0.956580</td>\n",
       "      <td>-0.384449</td>\n",
       "      <td>-1.338329</td>\n",
       "      <td>1.211452</td>\n",
       "      <td>-0.462413</td>\n",
       "      <td>-1.330185</td>\n",
       "      <td>...</td>\n",
       "      <td>1.651992</td>\n",
       "      <td>0.476358</td>\n",
       "      <td>0.441226</td>\n",
       "      <td>0.704206</td>\n",
       "      <td>-0.270060</td>\n",
       "      <td>1.247129</td>\n",
       "      <td>1.247639</td>\n",
       "      <td>-0.143032</td>\n",
       "      <td>0.044815</td>\n",
       "      <td>-0.631406</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99996</th>\n",
       "      <td>0.104250</td>\n",
       "      <td>1.415562</td>\n",
       "      <td>0.139757</td>\n",
       "      <td>-0.621568</td>\n",
       "      <td>-0.249787</td>\n",
       "      <td>0.358591</td>\n",
       "      <td>0.238794</td>\n",
       "      <td>1.232123</td>\n",
       "      <td>0.139006</td>\n",
       "      <td>0.684821</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.207111</td>\n",
       "      <td>0.549576</td>\n",
       "      <td>2.155070</td>\n",
       "      <td>0.780223</td>\n",
       "      <td>0.281913</td>\n",
       "      <td>0.053108</td>\n",
       "      <td>0.433405</td>\n",
       "      <td>1.200421</td>\n",
       "      <td>1.526589</td>\n",
       "      <td>-0.269244</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99997</th>\n",
       "      <td>1.014513</td>\n",
       "      <td>0.349818</td>\n",
       "      <td>-0.482681</td>\n",
       "      <td>0.130529</td>\n",
       "      <td>-0.084422</td>\n",
       "      <td>1.623366</td>\n",
       "      <td>-1.069759</td>\n",
       "      <td>0.403931</td>\n",
       "      <td>-0.946770</td>\n",
       "      <td>-0.127743</td>\n",
       "      <td>...</td>\n",
       "      <td>0.245094</td>\n",
       "      <td>-0.188509</td>\n",
       "      <td>-0.425818</td>\n",
       "      <td>0.425148</td>\n",
       "      <td>-1.733082</td>\n",
       "      <td>0.366996</td>\n",
       "      <td>0.278778</td>\n",
       "      <td>-0.457803</td>\n",
       "      <td>-1.976963</td>\n",
       "      <td>1.287677</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99998</th>\n",
       "      <td>1.280127</td>\n",
       "      <td>2.488734</td>\n",
       "      <td>-1.073581</td>\n",
       "      <td>-0.636677</td>\n",
       "      <td>-0.909728</td>\n",
       "      <td>-1.016732</td>\n",
       "      <td>1.018966</td>\n",
       "      <td>-1.430341</td>\n",
       "      <td>-0.922263</td>\n",
       "      <td>1.865314</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.027662</td>\n",
       "      <td>0.511552</td>\n",
       "      <td>-0.559474</td>\n",
       "      <td>-2.757027</td>\n",
       "      <td>-1.092821</td>\n",
       "      <td>0.437376</td>\n",
       "      <td>-1.051840</td>\n",
       "      <td>1.263932</td>\n",
       "      <td>0.042789</td>\n",
       "      <td>0.152446</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99999</th>\n",
       "      <td>0.782564</td>\n",
       "      <td>-0.388404</td>\n",
       "      <td>-0.247960</td>\n",
       "      <td>1.381064</td>\n",
       "      <td>-1.605046</td>\n",
       "      <td>0.974790</td>\n",
       "      <td>-1.178229</td>\n",
       "      <td>-0.396106</td>\n",
       "      <td>-0.368758</td>\n",
       "      <td>-1.187883</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.746631</td>\n",
       "      <td>0.410091</td>\n",
       "      <td>-0.249438</td>\n",
       "      <td>0.042342</td>\n",
       "      <td>-0.597671</td>\n",
       "      <td>0.467246</td>\n",
       "      <td>1.492825</td>\n",
       "      <td>2.039514</td>\n",
       "      <td>-1.024533</td>\n",
       "      <td>2.571223</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>100000 rows × 100 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             0         1         2         3         4         5         6   \\\n",
       "0     -0.677230  0.682754 -0.315458  1.432242  2.391720  0.644951  1.791661   \n",
       "1     -0.559448 -1.280697  0.402636 -0.583523 -0.266790  0.418998  0.994489   \n",
       "2     -0.104786 -0.235949  0.001328  0.771150 -0.003778 -1.039505  0.879596   \n",
       "3     -0.062078  0.095454 -0.557974  0.701680  0.193023 -1.002278 -0.223521   \n",
       "4      1.504247  0.312447 -0.698603 -0.616552  1.530180 -0.647969  1.494239   \n",
       "...         ...       ...       ...       ...       ...       ...       ...   \n",
       "99995  0.539733  2.411698 -0.455420  0.538498 -0.956580 -0.384449 -1.338329   \n",
       "99996  0.104250  1.415562  0.139757 -0.621568 -0.249787  0.358591  0.238794   \n",
       "99997  1.014513  0.349818 -0.482681  0.130529 -0.084422  1.623366 -1.069759   \n",
       "99998  1.280127  2.488734 -1.073581 -0.636677 -0.909728 -1.016732  1.018966   \n",
       "99999  0.782564 -0.388404 -0.247960  1.381064 -1.605046  0.974790 -1.178229   \n",
       "\n",
       "             7         8         9   ...        90        91        92  \\\n",
       "0      0.491711 -0.358679 -1.394864  ...  0.407030 -0.713274  1.718921   \n",
       "1     -0.976886 -1.696476  0.926231  ... -0.855396 -2.048210 -1.265783   \n",
       "2     -0.625585 -0.979214  0.022012  ... -0.994192  0.661030 -0.010726   \n",
       "3     -0.926172 -0.494398 -0.363959  ...  0.020915 -0.800234 -0.047158   \n",
       "4     -0.680018 -0.205344  0.757280  ...  0.468555  1.204949  1.374184   \n",
       "...         ...       ...       ...  ...       ...       ...       ...   \n",
       "99995  1.211452 -0.462413 -1.330185  ...  1.651992  0.476358  0.441226   \n",
       "99996  1.232123  0.139006  0.684821  ... -0.207111  0.549576  2.155070   \n",
       "99997  0.403931 -0.946770 -0.127743  ...  0.245094 -0.188509 -0.425818   \n",
       "99998 -1.430341 -0.922263  1.865314  ... -0.027662  0.511552 -0.559474   \n",
       "99999 -0.396106 -0.368758 -1.187883  ... -1.746631  0.410091 -0.249438   \n",
       "\n",
       "             93        94        95        96        97        98        99  \n",
       "0      0.345061 -2.146100 -0.738093 -0.159594  0.520909  1.133554  0.585371  \n",
       "1      2.394670 -2.258890  1.229371  1.750581  0.139859  0.636832  0.385954  \n",
       "2     -0.468801 -0.605885  1.311313  0.339141 -0.211478 -0.717916  1.467792  \n",
       "3     -0.410241 -1.296799 -1.308874  0.467894  2.335768  0.298389 -0.855592  \n",
       "4      0.153511 -0.450651  0.877156  0.892300  1.151173 -0.230626  0.605697  \n",
       "...         ...       ...       ...       ...       ...       ...       ...  \n",
       "99995  0.704206 -0.270060  1.247129  1.247639 -0.143032  0.044815 -0.631406  \n",
       "99996  0.780223  0.281913  0.053108  0.433405  1.200421  1.526589 -0.269244  \n",
       "99997  0.425148 -1.733082  0.366996  0.278778 -0.457803 -1.976963  1.287677  \n",
       "99998 -2.757027 -1.092821  0.437376 -1.051840  1.263932  0.042789  0.152446  \n",
       "99999  0.042342 -0.597671  0.467246  1.492825  2.039514 -1.024533  2.571223  \n",
       "\n",
       "[100000 rows x 100 columns]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df2._data.arrays)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Arithmetic with a scalar"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check that this operation preserves the (non-)consolidation of the dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res1 = df1 + 1\n",
    "len(res1._data.blocks)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res2 = df2 + 1\n",
    "len(res2._data.arrays)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Timing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "28.3 ms ± 604 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df1 + 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "30.5 ms ± 1.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df2 + 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Arithmetic with other dataframes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len((df1 + df1)._data.blocks)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len((df2 + df2)._data.arrays)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "30.8 ms ± 4.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df1 + df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "33.5 ms ± 3.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df2 + df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now check the performance difference with mixed dtypes (since the non-consolidated block could potentially be faster here, since the blocks don't need to be splitted to align for the operation). A few different cases:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# make a second frame where half of it are integer columns\n",
    "df1_b = df1.copy()\n",
    "df1_b.iloc[:, 50] = df1_b.iloc[:, 50].astype(int)\n",
    "df1_b.iloc[:, 51:] = df1_b.iloc[:, 51:].astype(int)\n",
    "df1_b._consolidate_inplace()\n",
    "len(df1_b._data.blocks)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "32.7 ms ± 3.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df1 + df1_b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2_b = df2.copy()\n",
    "df2_b.iloc[:, 50] = df2_b.iloc[:, 50].astype(int)\n",
    "df2_b.iloc[:, 51:] = df2_b.iloc[:, 51:].astype(int)\n",
    "df2_b._consolidate_inplace()\n",
    "len(df2_b._data.arrays)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "30.1 ms ± 3.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df2 + df2_b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "res1 = df1 + df1_b\n",
    "res2 = df2 + df2_b\n",
    "pd.testing.assert_frame_equal(res1, res2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Second example with and integer dataframe more some random columns as floats:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "arr2 = np.random.randint(0, 1000, (100_000, 100))\n",
    "dfint1 = pd.DataFrame(arr2, manager=\"block\")  # the current default\n",
    "dfint1 = dfint1.copy()\n",
    "dfint2 = pd.DataFrame(arr2, manager=\"array\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "dfint1_a = dfint1.copy()\n",
    "dfint1_a.iloc[0, [10, 15, 26]] =  np.nan\n",
    "\n",
    "dfint1_b = dfint1.copy()\n",
    "dfint1_b.iloc[0, [13, 52]] =  np.nan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "53.9 ms ± 5.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit dfint1_a + dfint1_b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "dfint2_a = dfint2.copy()\n",
    "dfint2_a.iloc[0, [10, 15, 26]] =  np.nan\n",
    "\n",
    "dfint2_b = dfint2.copy()\n",
    "dfint2_b.iloc[0, [13, 52]] =  np.nan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "34.5 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit dfint2_a + dfint2_b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, column-wise is significantly (not within error range) faster."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Column-wise reduction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "67.5 ms ± 2.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df1.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "24.1 ms ± 2.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df2.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.testing.assert_series_equal(df1.sum(), df2.sum())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "59.4 ms ± 3.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df1_orig.sum()  # the dataframe with the original array layout memory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The fact that column-wise is faster here (and a lot) is a bit strange (also the fact that `df1_orig.sum()` is faster is strange, that might indicate a performance bug in the current implementation). Based on a quick profile with `%snakeviz`, it seems that the `_get_values` function in `nanops.py` is taking a larger portion for the block case (so something to investigate if we keep the block consolidation)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Row-wise reduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we look at a row-wise reduction instead, the block-wise dataframe is of course faster:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "73.3 ms ± 4.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df1.sum(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "189 ms ± 18.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df2.sum(axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, if we wanted, this is probably also something that could be optimized, if we think that is worth it (it's certainly a much less common operation as column-wise reductions). \n",
    "\n",
    "A very crude appraoch for this specific case of sum gives the same result but is much faster:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "9.81 ms ± 810 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit \n",
    "arrays = df2._iter_column_arrays()\n",
    "res = next(arrays)\n",
    "\n",
    "for a in arrays:\n",
    "    res = res + a\n",
    "\n",
    "res = pd.Series(res, index=df2.index)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's ensure it gives the same result:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "arrays = df2._iter_column_arrays()\n",
    "res = next(arrays)\n",
    "\n",
    "for a in arrays:\n",
    "    res = res + a\n",
    "\n",
    "res = pd.Series(res, index=df2.index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.testing.assert_series_equal(df2.sum(axis=1), res)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Slicing rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "16.9 µs ± 4.13 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df1.iloc[0:1000]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "33.5 µs ± 2.27 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit df2.iloc[0:1000]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here column-wise is a slower (as expected, since it needs to slice multiple arrays)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding a column"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I am including the `_consolidate_inplace` here, as this is something that would otherwise often happen in a next operation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "246 ms ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit df = pd.DataFrame(arr, manager=\"block\")\n",
    "df[100] = 1.0\n",
    "df._consolidate_inplace()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "68.6 µs ± 4.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit df = pd.DataFrame(arr, manager=\"array\")\n",
    "df[100] = 1.0\n",
    "df._consolidate_inplace()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When including the consolidation, it is of course much cheaper to add a column for a non-consolidated dataframe. \n",
    "It's not necessarily fully fair comparison, as you typically only add a column once / a few times, and may do many operations on the dataframe afterwards (so the consolidation might pay off), but it's nevertheless a use case that would improve a lot with non-consolidated dataframes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "124 µs ± 7.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit df = pd.DataFrame(arr, manager=\"block\")\n",
    "df[100] = 1.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (dev)",
   "language": "python",
   "name": "dev"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }
No results found