Skip to content

Instantly share code, notes, and snippets.

@mikofski
Created September 2, 2020 22:42
Show Gist options
  • Save mikofski/5fd38eda6f786c47a186c9d10b724eb9 to your computer and use it in GitHub Desktop.
Save mikofski/5fd38eda6f786c47a186c9d10b724eb9 to your computer and use it in GitHub Desktop.
Python tabular data formats face-off
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# different native tables in Python\n",
"This looks at 3 different ways to manipulate and use tabular data in Python:\n",
"\n",
"* [NumPy](https://docs.scipy.org/doc/numpy/index.html)\n",
"* [Pandas](https://pandas.pydata.org/)\n",
"* [XArray](https://xarray.pydata.org/en/stable/)\n",
"\n",
"## NumPy\n",
"NumPy [`ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html) is already tabular, but it lacks column names, and requires the entire array to be the same datatype or [`dtype`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html) in NumPy syntax. NumPy has 2 additional tabular formats that include labels:\n",
"\n",
"* [Record Arrays](https://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html)\n",
"* [Structured Arrays](https://docs.scipy.org/doc/numpy/user/basics.rec.html)\n",
"\n",
"A NumPy Structured Array is an `ndarray` that has a structured `dtype`. A structured `dtype` has labels and types for each column in the array. Each item in the array is a tuple corresponding to the structured `dtype`. Each _column_ of the array refers to all the items in the array with the label specified in the structured `dtype`. A NumPy Structured array is indexed using index notation with square brackets. A NumPy Record Array is an `ndarray` treats each column as an attribute, therefore uses dot-notation.\n",
"\n",
"## Pandas\n",
"Pandas dataframes are tabular records with column names and indices similar to a database or a spreadsheet\n",
"\n",
"## XArray\n",
"XArray is similar to NumPy except that it always has column labels"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example of tabular data\n",
"We'll using a TMY3 file as an example of tabular data."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"WindowsPath('C:/Users/mikm/Jupyter Notebooks')"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"import pvlib\n",
"import pandas as pd\n",
"import xarray\n",
"import pathlib\n",
"#import os\n",
"\n",
"dirname = %pwd # dirname = os.path.dirname(__file__)\n",
"dirname = pathlib.Path(dirname)\n",
"dirname"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'USAF': 722287, 'Name': '\"ANNISTON METROPOLITAN AP\"', 'State': 'AL', 'TZ': -6.0, 'latitude': 33.583, 'longitude': -85.85, 'altitude': 186.0}\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>GHI</th>\n",
" <th>DNI</th>\n",
" <th>DHI</th>\n",
" </tr>\n",
" <tr>\n",
" <th>datetime</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>1991-01-01 01:00:00-06:00</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 02:00:00-06:00</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 03:00:00-06:00</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 04:00:00-06:00</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 05:00:00-06:00</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 06:00:00-06:00</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 07:00:00-06:00</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 08:00:00-06:00</td>\n",
" <td>61</td>\n",
" <td>33</td>\n",
" <td>57</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 09:00:00-06:00</td>\n",
" <td>80</td>\n",
" <td>0</td>\n",
" <td>80</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 10:00:00-06:00</td>\n",
" <td>232</td>\n",
" <td>66</td>\n",
" <td>204</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 11:00:00-06:00</td>\n",
" <td>303</td>\n",
" <td>59</td>\n",
" <td>273</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 12:00:00-06:00</td>\n",
" <td>332</td>\n",
" <td>233</td>\n",
" <td>204</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 13:00:00-06:00</td>\n",
" <td>209</td>\n",
" <td>0</td>\n",
" <td>209</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 14:00:00-06:00</td>\n",
" <td>181</td>\n",
" <td>0</td>\n",
" <td>181</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1991-01-01 15:00:00-06:00</td>\n",
" <td>134</td>\n",
" <td>0</td>\n",
" <td>134</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" GHI DNI DHI\n",
"datetime \n",
"1991-01-01 01:00:00-06:00 0 0 0\n",
"1991-01-01 02:00:00-06:00 0 0 0\n",
"1991-01-01 03:00:00-06:00 0 0 0\n",
"1991-01-01 04:00:00-06:00 0 0 0\n",
"1991-01-01 05:00:00-06:00 0 0 0\n",
"1991-01-01 06:00:00-06:00 0 0 0\n",
"1991-01-01 07:00:00-06:00 0 1 0\n",
"1991-01-01 08:00:00-06:00 61 33 57\n",
"1991-01-01 09:00:00-06:00 80 0 80\n",
"1991-01-01 10:00:00-06:00 232 66 204\n",
"1991-01-01 11:00:00-06:00 303 59 273\n",
"1991-01-01 12:00:00-06:00 332 233 204\n",
"1991-01-01 13:00:00-06:00 209 0 209\n",
"1991-01-01 14:00:00-06:00 181 0 181\n",
"1991-01-01 15:00:00-06:00 134 0 134"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# import a TMY3 as an example of tabular data\n",
"anniston_al = dirname / '..' / 'Downloads' / '722287TYA.CSV'\n",
"data, headers = pvlib.iotools.read_tmy3(str(anniston_al))\n",
"print(headers)\n",
"weather = data[['GHI', 'DNI', 'DHI']]\n",
"weather.head(15) # the data is output as a Pandas DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# get the tabular data in each format\n",
"Pandas has easy methods to export a `DataFrame` into each of the tabular formats mentioned above."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['1991-01-01T07:00:00.000000000', '1991-01-01T08:00:00.000000000',\n",
" '1991-01-01T09:00:00.000000000', ...,\n",
" '2003-01-01T04:00:00.000000000', '2003-01-01T05:00:00.000000000',\n",
" '2003-01-01T06:00:00.000000000'], dtype='datetime64[ns]')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# get the indices, which are numpy datetime64[ns]\n",
"weather_idx = weather.index.values\n",
"weather_idx"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0, 0, 0],\n",
" [0, 0, 0],\n",
" [0, 0, 0],\n",
" ...,\n",
" [0, 0, 0],\n",
" [0, 0, 0],\n",
" [0, 0, 0]], dtype=int64)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# use pandas DataFrame.to_numpy() method to return the data as a\n",
"# rectangular ndarray of the same dtype therefore the indices are\n",
"# lost b/c they are a different dtype so make sure to get them\n",
"# first e.g.: weather_idx in the prev cell\n",
"weather_np = weather.to_numpy()\n",
"weather_np"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"rec.array([('1991-01-01T07:00:00.000000000', 0, 0, 0),\n",
" ('1991-01-01T08:00:00.000000000', 0, 0, 0),\n",
" ('1991-01-01T09:00:00.000000000', 0, 0, 0), ...,\n",
" ('2003-01-01T04:00:00.000000000', 0, 0, 0),\n",
" ('2003-01-01T05:00:00.000000000', 0, 0, 0),\n",
" ('2003-01-01T06:00:00.000000000', 0, 0, 0)],\n",
" dtype=[('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# use pandas DataFrame.to_records() method to get data as a\n",
"# recarray, this keeps the timestamps, labels, and dtypes\n",
"weather_rec = weather.to_records()\n",
"weather_rec"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype((numpy.record, [('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')]))"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# dtype\n",
"weather_dtype = weather_rec.dtype\n",
"weather_dtype"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([('1991-01-01T07:00:00.000000000', 0, 0, 0),\n",
" ('1991-01-01T08:00:00.000000000', 0, 0, 0),\n",
" ('1991-01-01T09:00:00.000000000', 0, 0, 0), ...,\n",
" ('2003-01-01T04:00:00.000000000', 0, 0, 0),\n",
" ('2003-01-01T05:00:00.000000000', 0, 0, 0),\n",
" ('2003-01-01T06:00:00.000000000', 0, 0, 0)],\n",
" dtype=[('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# use the numpy array and indices to create a structured ndarray\n",
"weather_strc = np.asarray(\n",
" [(dt, d[0], d[1], d[2]) for dt, d in zip(weather_idx, weather_np)],\n",
" dtype=np.dtype([('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')]))\n",
"weather_strc"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([('1991-01-01T07:00:00.000000000', 0, 0, 0),\n",
" ('1991-01-01T08:00:00.000000000', 0, 0, 0),\n",
" ('1991-01-01T09:00:00.000000000', 0, 0, 0), ...,\n",
" ('2003-01-01T04:00:00.000000000', 0, 0, 0),\n",
" ('2003-01-01T05:00:00.000000000', 0, 0, 0),\n",
" ('2003-01-01T06:00:00.000000000', 0, 0, 0)],\n",
" dtype=(numpy.record, [('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')]))"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# use the record array to convert the data as a structured ndarray\n",
"weather_strc_rec = np.asarray(weather_rec, dtype=weather_dtype)\n",
"weather_strc_rec"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.Dataset>\n",
"Dimensions: (datetime: 8760)\n",
"Coordinates:\n",
" * datetime (datetime) object 662713200000000000 ... 1041400800000000000\n",
"Data variables:\n",
" GHI (datetime) int64 0 0 0 0 0 0 0 61 80 232 ... 70 42 9 0 0 0 0 0 0 0\n",
" DNI (datetime) int64 0 0 0 0 0 0 1 33 0 66 59 ... 0 0 4 0 0 0 0 0 0 0\n",
" DHI (datetime) int64 0 0 0 0 0 0 0 57 80 204 ... 70 42 8 0 0 0 0 0 0 0"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# data as an xarray\n",
"weather_xray = weather.to_xarray()\n",
"weather_xray"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# slicing columns\n",
"This section shows examples of how do you get a column in each format."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"datetime\n",
"1991-01-01 01:00:00-06:00 0\n",
"1991-01-01 02:00:00-06:00 0\n",
"1991-01-01 03:00:00-06:00 0\n",
"1991-01-01 04:00:00-06:00 0\n",
"1991-01-01 05:00:00-06:00 0\n",
" ..\n",
"2002-12-31 20:00:00-06:00 0\n",
"2002-12-31 21:00:00-06:00 0\n",
"2002-12-31 22:00:00-06:00 0\n",
"2002-12-31 23:00:00-06:00 0\n",
"2003-01-01 00:00:00-06:00 0\n",
"Name: DNI, Length: 8760, dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather['DNI']"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"datetime\n",
"1991-01-01 01:00:00-06:00 0\n",
"1991-01-01 02:00:00-06:00 0\n",
"1991-01-01 03:00:00-06:00 0\n",
"1991-01-01 04:00:00-06:00 0\n",
"1991-01-01 05:00:00-06:00 0\n",
" ..\n",
"2002-12-31 20:00:00-06:00 0\n",
"2002-12-31 21:00:00-06:00 0\n",
"2002-12-31 22:00:00-06:00 0\n",
"2002-12-31 23:00:00-06:00 0\n",
"2003-01-01 00:00:00-06:00 0\n",
"Name: DNI, Length: 8760, dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather.DNI"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['1991-01-01T07:00:00.000000000', '1991-01-01T08:00:00.000000000',\n",
" '1991-01-01T09:00:00.000000000', ...,\n",
" '2003-01-01T04:00:00.000000000', '2003-01-01T05:00:00.000000000',\n",
" '2003-01-01T06:00:00.000000000'], dtype='datetime64[ns]')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_idx"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_np[:, 1] # recall {0: GHI, 1: DNI, 2: DHI}"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_npT = weather_np.T\n",
"weather_npT[1]"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_strc['DNI']"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_strc_rec['DNI']"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.allclose(weather_strc['DNI'], weather_npT[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`weather_strc.DNI` doesn't work, dot-notation only works for record arrays, but we can use dot-notation for both record arrays and xarrays. Structured arrays only use square brackets."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_rec['DNI']"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_rec.DNI"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'DNI' (datetime: 8760)>\n",
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)\n",
"Coordinates:\n",
" * datetime (datetime) object 662713200000000000 ... 1041400800000000000"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_xray['DNI']"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'DNI' (datetime: 8760)>\n",
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)\n",
"Coordinates:\n",
" * datetime (datetime) object 662713200000000000 ... 1041400800000000000"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_xray.DNI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# how fast to get a column\n",
"According to this very small sample only 8760, the structured array is fastest by a lot\n",
"\n",
"|idx |np |npT |strc|rec |rec.|xray |xray.|pd |pd. |\n",
"|----|----|----|----|----|----|-----|-----|----|----|\n",
"|36.7| 335| 301|175 |3820|4870|30700|84700|2370|6150|\n",
"|21.5| 352| 201|158 |2610|3400|26400|68100|1930|5040|"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"21.5 ns ± 3.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
]
}
],
"source": [
"%timeit weather_idx"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"352 ns ± 75.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
]
}
],
"source": [
"%timeit weather_np[:, 1]"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"201 ns ± 4.71 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
]
}
],
"source": [
"%timeit weather_npT[1]"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"158 ns ± 6.03 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
]
}
],
"source": [
"%timeit weather_strc['DNI']"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"158 ns ± 5.35 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
]
}
],
"source": [
"%timeit weather_strc_rec['DNI']"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.61 µs ± 201 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
]
}
],
"source": [
"%timeit weather_rec['DNI']"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.4 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
]
}
],
"source": [
"%timeit weather_rec.DNI"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"26.4 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"%timeit weather_xray['DNI']"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"68.1 µs ± 2.94 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"%timeit weather_xray.DNI"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.93 µs ± 306 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
]
}
],
"source": [
"%timeit weather['DNI']"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5.04 µs ± 418 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
]
}
],
"source": [
"%timeit weather.DNI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# slicing records\n",
"This section shows examples of how do you get a record in each format."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"GHI 0\n",
"DNI 0\n",
"DHI 0\n",
"Name: 1991-01-01 01:00:00-06:00, dtype: int64"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather.iloc[0] # pandas"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>GHI</th>\n",
" <th>DNI</th>\n",
" <th>DHI</th>\n",
" </tr>\n",
" <tr>\n",
" <th>datetime</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>1991-01-01 07:00:00-06:00</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" GHI DNI DHI\n",
"datetime \n",
"1991-01-01 07:00:00-06:00 0 1 0"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather.loc['1991-01-01T07:00:00.000000000'] # pandas"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"numpy.datetime64('1991-01-01T07:00:00.000000000')"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_idx[0] # numpy index"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0], dtype=int64)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_np[0] # numpy record"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0], dtype=int64)"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_npT[:, 0]"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('1991-01-01T07:00:00.000000000', 0, 0, 0)"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_rec[0] # record array"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('1991-01-01T07:00:00.000000000', 0, 0, 0)"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_strc[0] # structure array"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('1991-01-01T07:00:00.000000000', 0, 0, 0)"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_strc_rec[0] # structure array"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.Dataset>\n",
"Dimensions: ()\n",
"Coordinates:\n",
" datetime datetime64[ns] 1991-01-01T07:00:00\n",
"Data variables:\n",
" GHI int64 0\n",
" DNI int64 0\n",
" DHI int64 0"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_xray.isel(datetime=0) # xarray record"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.Dataset>\n",
"Dimensions: ()\n",
"Coordinates:\n",
" datetime datetime64[ns] 1991-01-01T07:00:00\n",
"Data variables:\n",
" GHI int64 0\n",
" DNI int64 0\n",
" DHI int64 0"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weather_xray[dict(datetime=0)] # xarray record using key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# how fast to get a record?\n",
"According to this very small sample only 8760, the structured array is still the fastest!\n",
"\n",
"|idx |np |npT |strc|rec |rec.|xray |xray.|pd |pd. |\n",
"|----|----|----|----|----|----|------|-----|------|----|\n",
"|172 | 204| -- |154 |861 | -- |213000| -- |300000| -- |\n",
"|138 | 195| -- |145 |964 | -- |222000| -- |133000| -- |"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"138 ns ± 8.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
]
}
],
"source": [
"%timeit weather_idx[0]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"195 ns ± 16.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
]
}
],
"source": [
"%timeit weather_np[0]"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"242 ns ± 32.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
]
}
],
"source": [
"%timeit weather_npT[:, 0]"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"145 ns ± 8.96 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
]
}
],
"source": [
"%timeit weather_strc[0]"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"163 ns ± 6.18 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
]
}
],
"source": [
"%timeit weather_strc_rec[0]"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"964 ns ± 62.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
]
}
],
"source": [
"%timeit weather_rec[0]"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"222 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
]
}
],
"source": [
"%timeit weather_xray.isel(datetime=0)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"236 µs ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
]
}
],
"source": [
"%timeit weather_xray[dict(datetime=0)]"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"133 µs ± 5.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"%timeit weather.iloc[0]"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"767 µs ± 17.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
]
}
],
"source": [
"%timeit weather.loc['1991-01-01T07:00:00.000000000']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@adriesse
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment