mikofski · September 2, 2020 22:42 · adriesse · Oct 23, 2020
diff --git a/compare_tabular_data.ipynb b/compare_tabular_data.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# different native tables in Python\n",
    "This looks at 3 different ways to manipulate and use tabular data in Python:\n",
    "\n",
    "* [NumPy](https://docs.scipy.org/doc/numpy/index.html)\n",
    "* [Pandas](https://pandas.pydata.org/)\n",
    "* [XArray](https://xarray.pydata.org/en/stable/)\n",
    "\n",
    "## NumPy\n",
    "NumPy [`ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html) is already tabular, but it lacks column names, and requires the entire array to be the same datatype or [`dtype`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html) in NumPy syntax. NumPy has 2 additional tabular formats that include labels:\n",
    "\n",
    "* [Record Arrays](https://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html)\n",
    "* [Structured Arrays](https://docs.scipy.org/doc/numpy/user/basics.rec.html)\n",
    "\n",
    "A NumPy Structured Array is an `ndarray` that has a structured `dtype`. A structured `dtype` has labels and types for each column in the array.  Each item in the array is a tuple corresponding to the structured `dtype`. Each _column_ of the array refers to all the items in the array with the label specified in the structured `dtype`. A NumPy Structured array is indexed using index notation with square brackets. A NumPy Record Array is an `ndarray` treats each column as an attribute, therefore uses dot-notation.\n",
    "\n",
    "## Pandas\n",
    "Pandas dataframes are tabular records with column names and indices similar to a database or a spreadsheet\n",
    "\n",
    "## XArray\n",
    "XArray is similar to NumPy except that it always has column labels"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example of tabular data\n",
    "We'll using a TMY3 file as an example of tabular data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "WindowsPath('C:/Users/mikm/Jupyter Notebooks')"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "import pvlib\n",
    "import pandas as pd\n",
    "import xarray\n",
    "import pathlib\n",
    "#import os\n",
    "\n",
    "dirname = %pwd  # dirname = os.path.dirname(__file__)\n",
    "dirname = pathlib.Path(dirname)\n",
    "dirname"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'USAF': 722287, 'Name': '\"ANNISTON METROPOLITAN AP\"', 'State': 'AL', 'TZ': -6.0, 'latitude': 33.583, 'longitude': -85.85, 'altitude': 186.0}\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>GHI</th>\n",
       "      <th>DNI</th>\n",
       "      <th>DHI</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>datetime</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1991-01-01 01:00:00-06:00</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 02:00:00-06:00</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 03:00:00-06:00</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 04:00:00-06:00</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 05:00:00-06:00</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 06:00:00-06:00</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 07:00:00-06:00</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 08:00:00-06:00</td>\n",
       "      <td>61</td>\n",
       "      <td>33</td>\n",
       "      <td>57</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 09:00:00-06:00</td>\n",
       "      <td>80</td>\n",
       "      <td>0</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 10:00:00-06:00</td>\n",
       "      <td>232</td>\n",
       "      <td>66</td>\n",
       "      <td>204</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 11:00:00-06:00</td>\n",
       "      <td>303</td>\n",
       "      <td>59</td>\n",
       "      <td>273</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 12:00:00-06:00</td>\n",
       "      <td>332</td>\n",
       "      <td>233</td>\n",
       "      <td>204</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 13:00:00-06:00</td>\n",
       "      <td>209</td>\n",
       "      <td>0</td>\n",
       "      <td>209</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 14:00:00-06:00</td>\n",
       "      <td>181</td>\n",
       "      <td>0</td>\n",
       "      <td>181</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1991-01-01 15:00:00-06:00</td>\n",
       "      <td>134</td>\n",
       "      <td>0</td>\n",
       "      <td>134</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           GHI  DNI  DHI\n",
       "datetime                                \n",
       "1991-01-01 01:00:00-06:00    0    0    0\n",
       "1991-01-01 02:00:00-06:00    0    0    0\n",
       "1991-01-01 03:00:00-06:00    0    0    0\n",
       "1991-01-01 04:00:00-06:00    0    0    0\n",
       "1991-01-01 05:00:00-06:00    0    0    0\n",
       "1991-01-01 06:00:00-06:00    0    0    0\n",
       "1991-01-01 07:00:00-06:00    0    1    0\n",
       "1991-01-01 08:00:00-06:00   61   33   57\n",
       "1991-01-01 09:00:00-06:00   80    0   80\n",
       "1991-01-01 10:00:00-06:00  232   66  204\n",
       "1991-01-01 11:00:00-06:00  303   59  273\n",
       "1991-01-01 12:00:00-06:00  332  233  204\n",
       "1991-01-01 13:00:00-06:00  209    0  209\n",
       "1991-01-01 14:00:00-06:00  181    0  181\n",
       "1991-01-01 15:00:00-06:00  134    0  134"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# import a TMY3 as an example of tabular data\n",
    "anniston_al = dirname / '..' / 'Downloads' / '722287TYA.CSV'\n",
    "data, headers = pvlib.iotools.read_tmy3(str(anniston_al))\n",
    "print(headers)\n",
    "weather = data[['GHI', 'DNI', 'DHI']]\n",
    "weather.head(15)  # the data is output as a Pandas DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# get the tabular data in each format\n",
    "Pandas has easy methods to export a `DataFrame` into each of the tabular formats mentioned above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['1991-01-01T07:00:00.000000000', '1991-01-01T08:00:00.000000000',\n",
       "       '1991-01-01T09:00:00.000000000', ...,\n",
       "       '2003-01-01T04:00:00.000000000', '2003-01-01T05:00:00.000000000',\n",
       "       '2003-01-01T06:00:00.000000000'], dtype='datetime64[ns]')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# get the indices, which are numpy datetime64[ns]\n",
    "weather_idx = weather.index.values\n",
    "weather_idx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0, 0, 0],\n",
       "       [0, 0, 0],\n",
       "       [0, 0, 0],\n",
       "       ...,\n",
       "       [0, 0, 0],\n",
       "       [0, 0, 0],\n",
       "       [0, 0, 0]], dtype=int64)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# use pandas DataFrame.to_numpy() method to return the data as a\n",
    "# rectangular ndarray of the same dtype therefore the indices are\n",
    "# lost b/c they are a different dtype so make sure to get them\n",
    "# first e.g.: weather_idx in the prev cell\n",
    "weather_np = weather.to_numpy()\n",
    "weather_np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "rec.array([('1991-01-01T07:00:00.000000000', 0, 0, 0),\n",
       "           ('1991-01-01T08:00:00.000000000', 0, 0, 0),\n",
       "           ('1991-01-01T09:00:00.000000000', 0, 0, 0), ...,\n",
       "           ('2003-01-01T04:00:00.000000000', 0, 0, 0),\n",
       "           ('2003-01-01T05:00:00.000000000', 0, 0, 0),\n",
       "           ('2003-01-01T06:00:00.000000000', 0, 0, 0)],\n",
       "          dtype=[('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# use pandas DataFrame.to_records() method to get data as a\n",
    "# recarray, this keeps the timestamps, labels, and dtypes\n",
    "weather_rec = weather.to_records()\n",
    "weather_rec"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype((numpy.record, [('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')]))"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# dtype\n",
    "weather_dtype = weather_rec.dtype\n",
    "weather_dtype"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([('1991-01-01T07:00:00.000000000', 0, 0, 0),\n",
       "       ('1991-01-01T08:00:00.000000000', 0, 0, 0),\n",
       "       ('1991-01-01T09:00:00.000000000', 0, 0, 0), ...,\n",
       "       ('2003-01-01T04:00:00.000000000', 0, 0, 0),\n",
       "       ('2003-01-01T05:00:00.000000000', 0, 0, 0),\n",
       "       ('2003-01-01T06:00:00.000000000', 0, 0, 0)],\n",
       "      dtype=[('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# use the numpy array and indices to create a structured ndarray\n",
    "weather_strc = np.asarray(\n",
    "    [(dt, d[0], d[1], d[2]) for dt, d in zip(weather_idx, weather_np)],\n",
    "    dtype=np.dtype([('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')]))\n",
    "weather_strc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([('1991-01-01T07:00:00.000000000', 0, 0, 0),\n",
       "       ('1991-01-01T08:00:00.000000000', 0, 0, 0),\n",
       "       ('1991-01-01T09:00:00.000000000', 0, 0, 0), ...,\n",
       "       ('2003-01-01T04:00:00.000000000', 0, 0, 0),\n",
       "       ('2003-01-01T05:00:00.000000000', 0, 0, 0),\n",
       "       ('2003-01-01T06:00:00.000000000', 0, 0, 0)],\n",
       "      dtype=(numpy.record, [('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')]))"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# use the record array to convert the data as a structured ndarray\n",
    "weather_strc_rec = np.asarray(weather_rec, dtype=weather_dtype)\n",
    "weather_strc_rec"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:   (datetime: 8760)\n",
       "Coordinates:\n",
       "  * datetime  (datetime) object 662713200000000000 ... 1041400800000000000\n",
       "Data variables:\n",
       "    GHI       (datetime) int64 0 0 0 0 0 0 0 61 80 232 ... 70 42 9 0 0 0 0 0 0 0\n",
       "    DNI       (datetime) int64 0 0 0 0 0 0 1 33 0 66 59 ... 0 0 4 0 0 0 0 0 0 0\n",
       "    DHI       (datetime) int64 0 0 0 0 0 0 0 57 80 204 ... 70 42 8 0 0 0 0 0 0 0"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# data as an xarray\n",
    "weather_xray = weather.to_xarray()\n",
    "weather_xray"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# slicing columns\n",
    "This section shows examples of how do you get a column in each format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "datetime\n",
       "1991-01-01 01:00:00-06:00    0\n",
       "1991-01-01 02:00:00-06:00    0\n",
       "1991-01-01 03:00:00-06:00    0\n",
       "1991-01-01 04:00:00-06:00    0\n",
       "1991-01-01 05:00:00-06:00    0\n",
       "                            ..\n",
       "2002-12-31 20:00:00-06:00    0\n",
       "2002-12-31 21:00:00-06:00    0\n",
       "2002-12-31 22:00:00-06:00    0\n",
       "2002-12-31 23:00:00-06:00    0\n",
       "2003-01-01 00:00:00-06:00    0\n",
       "Name: DNI, Length: 8760, dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather['DNI']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "datetime\n",
       "1991-01-01 01:00:00-06:00    0\n",
       "1991-01-01 02:00:00-06:00    0\n",
       "1991-01-01 03:00:00-06:00    0\n",
       "1991-01-01 04:00:00-06:00    0\n",
       "1991-01-01 05:00:00-06:00    0\n",
       "                            ..\n",
       "2002-12-31 20:00:00-06:00    0\n",
       "2002-12-31 21:00:00-06:00    0\n",
       "2002-12-31 22:00:00-06:00    0\n",
       "2002-12-31 23:00:00-06:00    0\n",
       "2003-01-01 00:00:00-06:00    0\n",
       "Name: DNI, Length: 8760, dtype: int64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather.DNI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['1991-01-01T07:00:00.000000000', '1991-01-01T08:00:00.000000000',\n",
       "       '1991-01-01T09:00:00.000000000', ...,\n",
       "       '2003-01-01T04:00:00.000000000', '2003-01-01T05:00:00.000000000',\n",
       "       '2003-01-01T06:00:00.000000000'], dtype='datetime64[ns]')"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_idx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_np[:, 1]  # recall {0: GHI, 1: DNI, 2: DHI}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_npT = weather_np.T\n",
    "weather_npT[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_strc['DNI']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_strc_rec['DNI']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.allclose(weather_strc['DNI'], weather_npT[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`weather_strc.DNI` doesn't work, dot-notation only works for record arrays, but we can use dot-notation for both record arrays and xarrays. Structured arrays only use square brackets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_rec['DNI']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_rec.DNI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'DNI' (datetime: 8760)>\n",
       "array([0, 0, 0, ..., 0, 0, 0], dtype=int64)\n",
       "Coordinates:\n",
       "  * datetime  (datetime) object 662713200000000000 ... 1041400800000000000"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_xray['DNI']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'DNI' (datetime: 8760)>\n",
       "array([0, 0, 0, ..., 0, 0, 0], dtype=int64)\n",
       "Coordinates:\n",
       "  * datetime  (datetime) object 662713200000000000 ... 1041400800000000000"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_xray.DNI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# how fast to get a column\n",
    "According to this very small sample only 8760, the structured array is fastest by a lot\n",
    "\n",
    "|idx |np  |npT |strc|rec |rec.|xray |xray.|pd  |pd. |\n",
    "|----|----|----|----|----|----|-----|-----|----|----|\n",
    "|36.7| 335| 301|175 |3820|4870|30700|84700|2370|6150|\n",
    "|21.5| 352| 201|158 |2610|3400|26400|68100|1930|5040|"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "21.5 ns ± 3.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_idx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "352 ns ± 75.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_np[:, 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "201 ns ± 4.71 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_npT[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "158 ns ± 6.03 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_strc['DNI']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "158 ns ± 5.35 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_strc_rec['DNI']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.61 µs ± 201 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_rec['DNI']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3.4 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_rec.DNI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "26.4 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_xray['DNI']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "68.1 µs ± 2.94 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_xray.DNI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.93 µs ± 306 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather['DNI']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5.04 µs ± 418 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather.DNI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# slicing records\n",
    "This section shows examples of how do you get a record in each format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GHI    0\n",
       "DNI    0\n",
       "DHI    0\n",
       "Name: 1991-01-01 01:00:00-06:00, dtype: int64"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather.iloc[0]  # pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>GHI</th>\n",
       "      <th>DNI</th>\n",
       "      <th>DHI</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>datetime</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1991-01-01 07:00:00-06:00</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           GHI  DNI  DHI\n",
       "datetime                                \n",
       "1991-01-01 07:00:00-06:00    0    1    0"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather.loc['1991-01-01T07:00:00.000000000']  # pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "numpy.datetime64('1991-01-01T07:00:00.000000000')"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_idx[0]  # numpy index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 0, 0], dtype=int64)"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_np[0]  # numpy record"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 0, 0], dtype=int64)"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_npT[:, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('1991-01-01T07:00:00.000000000', 0, 0, 0)"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_rec[0]  # record array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('1991-01-01T07:00:00.000000000', 0, 0, 0)"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_strc[0]  # structure array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('1991-01-01T07:00:00.000000000', 0, 0, 0)"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_strc_rec[0]  # structure array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:   ()\n",
       "Coordinates:\n",
       "    datetime  datetime64[ns] 1991-01-01T07:00:00\n",
       "Data variables:\n",
       "    GHI       int64 0\n",
       "    DNI       int64 0\n",
       "    DHI       int64 0"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_xray.isel(datetime=0)  # xarray record"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:   ()\n",
       "Coordinates:\n",
       "    datetime  datetime64[ns] 1991-01-01T07:00:00\n",
       "Data variables:\n",
       "    GHI       int64 0\n",
       "    DNI       int64 0\n",
       "    DHI       int64 0"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_xray[dict(datetime=0)]  # xarray record using key"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# how fast to get a record?\n",
    "According to this very small sample only 8760, the structured array is still the fastest!\n",
    "\n",
    "|idx |np  |npT |strc|rec |rec.|xray  |xray.|pd    |pd. |\n",
    "|----|----|----|----|----|----|------|-----|------|----|\n",
    "|172 | 204| -- |154 |861 | -- |213000| --  |300000| -- |\n",
    "|138 | 195| -- |145 |964 | -- |222000| --  |133000| -- |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "138 ns ± 8.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_idx[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "195 ns ± 16.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_np[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "242 ns ± 32.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_npT[:, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "145 ns ± 8.96 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_strc[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "163 ns ± 6.18 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_strc_rec[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "964 ns ± 62.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_rec[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "222 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_xray.isel(datetime=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "236 µs ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather_xray[dict(datetime=0)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "133 µs ± 5.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather.iloc[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "767 µs ± 17.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit weather.loc['1991-01-01T07:00:00.000000000']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }