Created
September 2, 2020 22:42
-
-
Save mikofski/5fd38eda6f786c47a186c9d10b724eb9 to your computer and use it in GitHub Desktop.
Python tabular data formats face-off
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# different native tables in Python\n", | |
"This looks at 3 different ways to manipulate and use tabular data in Python:\n", | |
"\n", | |
"* [NumPy](https://docs.scipy.org/doc/numpy/index.html)\n", | |
"* [Pandas](https://pandas.pydata.org/)\n", | |
"* [XArray](https://xarray.pydata.org/en/stable/)\n", | |
"\n", | |
"## NumPy\n", | |
"NumPy [`ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html) is already tabular, but it lacks column names, and requires the entire array to be the same datatype or [`dtype`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html) in NumPy syntax. NumPy has 2 additional tabular formats that include labels:\n", | |
"\n", | |
"* [Record Arrays](https://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html)\n", | |
"* [Structured Arrays](https://docs.scipy.org/doc/numpy/user/basics.rec.html)\n", | |
"\n", | |
"A NumPy Structured Array is an `ndarray` that has a structured `dtype`. A structured `dtype` has labels and types for each column in the array. Each item in the array is a tuple corresponding to the structured `dtype`. Each _column_ of the array refers to all the items in the array with the label specified in the structured `dtype`. A NumPy Structured array is indexed using index notation with square brackets. A NumPy Record Array is an `ndarray` treats each column as an attribute, therefore uses dot-notation.\n", | |
"\n", | |
"## Pandas\n", | |
"Pandas dataframes are tabular records with column names and indices similar to a database or a spreadsheet\n", | |
"\n", | |
"## XArray\n", | |
"XArray is similar to NumPy except that it always has column labels" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Example of tabular data\n", | |
"We'll using a TMY3 file as an example of tabular data." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"WindowsPath('C:/Users/mikm/Jupyter Notebooks')" | |
] | |
}, | |
"execution_count": 1, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"import numpy as np\n", | |
"import pvlib\n", | |
"import pandas as pd\n", | |
"import xarray\n", | |
"import pathlib\n", | |
"#import os\n", | |
"\n", | |
"dirname = %pwd # dirname = os.path.dirname(__file__)\n", | |
"dirname = pathlib.Path(dirname)\n", | |
"dirname" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{'USAF': 722287, 'Name': '\"ANNISTON METROPOLITAN AP\"', 'State': 'AL', 'TZ': -6.0, 'latitude': 33.583, 'longitude': -85.85, 'altitude': 186.0}\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>GHI</th>\n", | |
" <th>DNI</th>\n", | |
" <th>DHI</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>datetime</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <td>1991-01-01 01:00:00-06:00</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 02:00:00-06:00</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 03:00:00-06:00</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 04:00:00-06:00</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 05:00:00-06:00</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 06:00:00-06:00</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 07:00:00-06:00</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 08:00:00-06:00</td>\n", | |
" <td>61</td>\n", | |
" <td>33</td>\n", | |
" <td>57</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 09:00:00-06:00</td>\n", | |
" <td>80</td>\n", | |
" <td>0</td>\n", | |
" <td>80</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 10:00:00-06:00</td>\n", | |
" <td>232</td>\n", | |
" <td>66</td>\n", | |
" <td>204</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 11:00:00-06:00</td>\n", | |
" <td>303</td>\n", | |
" <td>59</td>\n", | |
" <td>273</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 12:00:00-06:00</td>\n", | |
" <td>332</td>\n", | |
" <td>233</td>\n", | |
" <td>204</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 13:00:00-06:00</td>\n", | |
" <td>209</td>\n", | |
" <td>0</td>\n", | |
" <td>209</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 14:00:00-06:00</td>\n", | |
" <td>181</td>\n", | |
" <td>0</td>\n", | |
" <td>181</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1991-01-01 15:00:00-06:00</td>\n", | |
" <td>134</td>\n", | |
" <td>0</td>\n", | |
" <td>134</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" GHI DNI DHI\n", | |
"datetime \n", | |
"1991-01-01 01:00:00-06:00 0 0 0\n", | |
"1991-01-01 02:00:00-06:00 0 0 0\n", | |
"1991-01-01 03:00:00-06:00 0 0 0\n", | |
"1991-01-01 04:00:00-06:00 0 0 0\n", | |
"1991-01-01 05:00:00-06:00 0 0 0\n", | |
"1991-01-01 06:00:00-06:00 0 0 0\n", | |
"1991-01-01 07:00:00-06:00 0 1 0\n", | |
"1991-01-01 08:00:00-06:00 61 33 57\n", | |
"1991-01-01 09:00:00-06:00 80 0 80\n", | |
"1991-01-01 10:00:00-06:00 232 66 204\n", | |
"1991-01-01 11:00:00-06:00 303 59 273\n", | |
"1991-01-01 12:00:00-06:00 332 233 204\n", | |
"1991-01-01 13:00:00-06:00 209 0 209\n", | |
"1991-01-01 14:00:00-06:00 181 0 181\n", | |
"1991-01-01 15:00:00-06:00 134 0 134" | |
] | |
}, | |
"execution_count": 2, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# import a TMY3 as an example of tabular data\n", | |
"anniston_al = dirname / '..' / 'Downloads' / '722287TYA.CSV'\n", | |
"data, headers = pvlib.iotools.read_tmy3(str(anniston_al))\n", | |
"print(headers)\n", | |
"weather = data[['GHI', 'DNI', 'DHI']]\n", | |
"weather.head(15) # the data is output as a Pandas DataFrame" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# get the tabular data in each format\n", | |
"Pandas has easy methods to export a `DataFrame` into each of the tabular formats mentioned above." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array(['1991-01-01T07:00:00.000000000', '1991-01-01T08:00:00.000000000',\n", | |
" '1991-01-01T09:00:00.000000000', ...,\n", | |
" '2003-01-01T04:00:00.000000000', '2003-01-01T05:00:00.000000000',\n", | |
" '2003-01-01T06:00:00.000000000'], dtype='datetime64[ns]')" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# get the indices, which are numpy datetime64[ns]\n", | |
"weather_idx = weather.index.values\n", | |
"weather_idx" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([[0, 0, 0],\n", | |
" [0, 0, 0],\n", | |
" [0, 0, 0],\n", | |
" ...,\n", | |
" [0, 0, 0],\n", | |
" [0, 0, 0],\n", | |
" [0, 0, 0]], dtype=int64)" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# use pandas DataFrame.to_numpy() method to return the data as a\n", | |
"# rectangular ndarray of the same dtype therefore the indices are\n", | |
"# lost b/c they are a different dtype so make sure to get them\n", | |
"# first e.g.: weather_idx in the prev cell\n", | |
"weather_np = weather.to_numpy()\n", | |
"weather_np" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"rec.array([('1991-01-01T07:00:00.000000000', 0, 0, 0),\n", | |
" ('1991-01-01T08:00:00.000000000', 0, 0, 0),\n", | |
" ('1991-01-01T09:00:00.000000000', 0, 0, 0), ...,\n", | |
" ('2003-01-01T04:00:00.000000000', 0, 0, 0),\n", | |
" ('2003-01-01T05:00:00.000000000', 0, 0, 0),\n", | |
" ('2003-01-01T06:00:00.000000000', 0, 0, 0)],\n", | |
" dtype=[('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')])" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# use pandas DataFrame.to_records() method to get data as a\n", | |
"# recarray, this keeps the timestamps, labels, and dtypes\n", | |
"weather_rec = weather.to_records()\n", | |
"weather_rec" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"dtype((numpy.record, [('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')]))" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# dtype\n", | |
"weather_dtype = weather_rec.dtype\n", | |
"weather_dtype" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([('1991-01-01T07:00:00.000000000', 0, 0, 0),\n", | |
" ('1991-01-01T08:00:00.000000000', 0, 0, 0),\n", | |
" ('1991-01-01T09:00:00.000000000', 0, 0, 0), ...,\n", | |
" ('2003-01-01T04:00:00.000000000', 0, 0, 0),\n", | |
" ('2003-01-01T05:00:00.000000000', 0, 0, 0),\n", | |
" ('2003-01-01T06:00:00.000000000', 0, 0, 0)],\n", | |
" dtype=[('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')])" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# use the numpy array and indices to create a structured ndarray\n", | |
"weather_strc = np.asarray(\n", | |
" [(dt, d[0], d[1], d[2]) for dt, d in zip(weather_idx, weather_np)],\n", | |
" dtype=np.dtype([('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')]))\n", | |
"weather_strc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([('1991-01-01T07:00:00.000000000', 0, 0, 0),\n", | |
" ('1991-01-01T08:00:00.000000000', 0, 0, 0),\n", | |
" ('1991-01-01T09:00:00.000000000', 0, 0, 0), ...,\n", | |
" ('2003-01-01T04:00:00.000000000', 0, 0, 0),\n", | |
" ('2003-01-01T05:00:00.000000000', 0, 0, 0),\n", | |
" ('2003-01-01T06:00:00.000000000', 0, 0, 0)],\n", | |
" dtype=(numpy.record, [('datetime', '<M8[ns]'), ('GHI', '<i8'), ('DNI', '<i8'), ('DHI', '<i8')]))" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# use the record array to convert the data as a structured ndarray\n", | |
"weather_strc_rec = np.asarray(weather_rec, dtype=weather_dtype)\n", | |
"weather_strc_rec" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<xarray.Dataset>\n", | |
"Dimensions: (datetime: 8760)\n", | |
"Coordinates:\n", | |
" * datetime (datetime) object 662713200000000000 ... 1041400800000000000\n", | |
"Data variables:\n", | |
" GHI (datetime) int64 0 0 0 0 0 0 0 61 80 232 ... 70 42 9 0 0 0 0 0 0 0\n", | |
" DNI (datetime) int64 0 0 0 0 0 0 1 33 0 66 59 ... 0 0 4 0 0 0 0 0 0 0\n", | |
" DHI (datetime) int64 0 0 0 0 0 0 0 57 80 204 ... 70 42 8 0 0 0 0 0 0 0" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# data as an xarray\n", | |
"weather_xray = weather.to_xarray()\n", | |
"weather_xray" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# slicing columns\n", | |
"This section shows examples of how do you get a column in each format." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"datetime\n", | |
"1991-01-01 01:00:00-06:00 0\n", | |
"1991-01-01 02:00:00-06:00 0\n", | |
"1991-01-01 03:00:00-06:00 0\n", | |
"1991-01-01 04:00:00-06:00 0\n", | |
"1991-01-01 05:00:00-06:00 0\n", | |
" ..\n", | |
"2002-12-31 20:00:00-06:00 0\n", | |
"2002-12-31 21:00:00-06:00 0\n", | |
"2002-12-31 22:00:00-06:00 0\n", | |
"2002-12-31 23:00:00-06:00 0\n", | |
"2003-01-01 00:00:00-06:00 0\n", | |
"Name: DNI, Length: 8760, dtype: int64" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather['DNI']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"datetime\n", | |
"1991-01-01 01:00:00-06:00 0\n", | |
"1991-01-01 02:00:00-06:00 0\n", | |
"1991-01-01 03:00:00-06:00 0\n", | |
"1991-01-01 04:00:00-06:00 0\n", | |
"1991-01-01 05:00:00-06:00 0\n", | |
" ..\n", | |
"2002-12-31 20:00:00-06:00 0\n", | |
"2002-12-31 21:00:00-06:00 0\n", | |
"2002-12-31 22:00:00-06:00 0\n", | |
"2002-12-31 23:00:00-06:00 0\n", | |
"2003-01-01 00:00:00-06:00 0\n", | |
"Name: DNI, Length: 8760, dtype: int64" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather.DNI" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array(['1991-01-01T07:00:00.000000000', '1991-01-01T08:00:00.000000000',\n", | |
" '1991-01-01T09:00:00.000000000', ...,\n", | |
" '2003-01-01T04:00:00.000000000', '2003-01-01T05:00:00.000000000',\n", | |
" '2003-01-01T06:00:00.000000000'], dtype='datetime64[ns]')" | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_idx" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_np[:, 1] # recall {0: GHI, 1: DNI, 2: DHI}" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)" | |
] | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_npT = weather_np.T\n", | |
"weather_npT[1]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)" | |
] | |
}, | |
"execution_count": 15, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_strc['DNI']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)" | |
] | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_strc_rec['DNI']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"True" | |
] | |
}, | |
"execution_count": 17, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"np.allclose(weather_strc['DNI'], weather_npT[1])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`weather_strc.DNI` doesn't work, dot-notation only works for record arrays, but we can use dot-notation for both record arrays and xarrays. Structured arrays only use square brackets." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_rec['DNI']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)" | |
] | |
}, | |
"execution_count": 19, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_rec.DNI" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<xarray.DataArray 'DNI' (datetime: 8760)>\n", | |
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)\n", | |
"Coordinates:\n", | |
" * datetime (datetime) object 662713200000000000 ... 1041400800000000000" | |
] | |
}, | |
"execution_count": 20, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_xray['DNI']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<xarray.DataArray 'DNI' (datetime: 8760)>\n", | |
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)\n", | |
"Coordinates:\n", | |
" * datetime (datetime) object 662713200000000000 ... 1041400800000000000" | |
] | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_xray.DNI" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# how fast to get a column\n", | |
"According to this very small sample only 8760, the structured array is fastest by a lot\n", | |
"\n", | |
"|idx |np |npT |strc|rec |rec.|xray |xray.|pd |pd. |\n", | |
"|----|----|----|----|----|----|-----|-----|----|----|\n", | |
"|36.7| 335| 301|175 |3820|4870|30700|84700|2370|6150|\n", | |
"|21.5| 352| 201|158 |2610|3400|26400|68100|1930|5040|" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"21.5 ns ± 3.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_idx" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"352 ns ± 75.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_np[:, 1]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"201 ns ± 4.71 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_npT[1]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"158 ns ± 6.03 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_strc['DNI']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"158 ns ± 5.35 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_strc_rec['DNI']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"2.61 µs ± 201 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_rec['DNI']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"3.4 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_rec.DNI" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"26.4 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_xray['DNI']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"68.1 µs ± 2.94 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_xray.DNI" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"1.93 µs ± 306 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather['DNI']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"5.04 µs ± 418 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather.DNI" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# slicing records\n", | |
"This section shows examples of how do you get a record in each format." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"GHI 0\n", | |
"DNI 0\n", | |
"DHI 0\n", | |
"Name: 1991-01-01 01:00:00-06:00, dtype: int64" | |
] | |
}, | |
"execution_count": 33, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather.iloc[0] # pandas" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>GHI</th>\n", | |
" <th>DNI</th>\n", | |
" <th>DHI</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>datetime</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <td>1991-01-01 07:00:00-06:00</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" GHI DNI DHI\n", | |
"datetime \n", | |
"1991-01-01 07:00:00-06:00 0 1 0" | |
] | |
}, | |
"execution_count": 34, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather.loc['1991-01-01T07:00:00.000000000'] # pandas" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 35, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"numpy.datetime64('1991-01-01T07:00:00.000000000')" | |
] | |
}, | |
"execution_count": 35, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_idx[0] # numpy index" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 36, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([0, 0, 0], dtype=int64)" | |
] | |
}, | |
"execution_count": 36, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_np[0] # numpy record" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 52, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([0, 0, 0], dtype=int64)" | |
] | |
}, | |
"execution_count": 52, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_npT[:, 0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 37, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"('1991-01-01T07:00:00.000000000', 0, 0, 0)" | |
] | |
}, | |
"execution_count": 37, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_rec[0] # record array" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 38, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"('1991-01-01T07:00:00.000000000', 0, 0, 0)" | |
] | |
}, | |
"execution_count": 38, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_strc[0] # structure array" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 39, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"('1991-01-01T07:00:00.000000000', 0, 0, 0)" | |
] | |
}, | |
"execution_count": 39, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_strc_rec[0] # structure array" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 40, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<xarray.Dataset>\n", | |
"Dimensions: ()\n", | |
"Coordinates:\n", | |
" datetime datetime64[ns] 1991-01-01T07:00:00\n", | |
"Data variables:\n", | |
" GHI int64 0\n", | |
" DNI int64 0\n", | |
" DHI int64 0" | |
] | |
}, | |
"execution_count": 40, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_xray.isel(datetime=0) # xarray record" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 41, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<xarray.Dataset>\n", | |
"Dimensions: ()\n", | |
"Coordinates:\n", | |
" datetime datetime64[ns] 1991-01-01T07:00:00\n", | |
"Data variables:\n", | |
" GHI int64 0\n", | |
" DNI int64 0\n", | |
" DHI int64 0" | |
] | |
}, | |
"execution_count": 41, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weather_xray[dict(datetime=0)] # xarray record using key" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# how fast to get a record?\n", | |
"According to this very small sample only 8760, the structured array is still the fastest!\n", | |
"\n", | |
"|idx |np |npT |strc|rec |rec.|xray |xray.|pd |pd. |\n", | |
"|----|----|----|----|----|----|------|-----|------|----|\n", | |
"|172 | 204| -- |154 |861 | -- |213000| -- |300000| -- |\n", | |
"|138 | 195| -- |145 |964 | -- |222000| -- |133000| -- |" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 42, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"138 ns ± 8.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_idx[0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 43, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"195 ns ± 16.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_np[0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 51, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"242 ns ± 32.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_npT[:, 0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 44, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"145 ns ± 8.96 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_strc[0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 45, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"163 ns ± 6.18 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_strc_rec[0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 46, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"964 ns ± 62.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_rec[0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 47, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"222 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_xray.isel(datetime=0)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 48, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"236 µs ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather_xray[dict(datetime=0)]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 49, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"133 µs ± 5.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather.iloc[0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 50, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"767 µs ± 17.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit weather.loc['1991-01-01T07:00:00.000000000']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.4" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This may interest you:
https://stackoverflow.com/a/54295522/9043286