Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save AntoineDao/4ec33ad38e16971c9eeed1d19d7bb511 to your computer and use it in GitHub Desktop.
Save AntoineDao/4ec33ad38e16971c9eeed1d19d7bb511 to your computer and use it in GitHub Desktop.
Energy Plus Result Parsing Explorattion
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Energy Plus Results Parsing with Honeybee-Energy and Pandas\n",
"\n",
"## Overview\n",
"This notebook explores the ways in which we can extract data from the energyplus sqlite results database to produce CSV, Parquet, Avro etc... encodable files.\n",
"\n",
"There are three main steps carried out to generate such files from energyplus results:\n",
"\n",
"1. Query the SQLite database to retrieve the data requested\n",
"2. Manipulate the data to reshape it into a flatter/more useful format (this should probably be done using Pandas for speed + ease of use)\n",
"3. Dump the resulting dataframe as a CSV/parquet/avro/etc... file. This should also be done using Pandas as it has a [suite of writer function](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html)) \n",
"\n",
"\n",
"## Methods tested\n",
"I have tested two methods for extracting/manipulating data into a usable shape in a pandas dataframe:\n",
"\n",
"1. Leverage existing honeybee-energy query method to generate data collections that are then passed into a Dataframe using Ladybug-Pandas\n",
"2. Write SQL query that outputs its results directly in a Pandas DataFrame and manipulate it from there.\n",
"\n",
"\n",
"The short conclusion from this is that option 2. is far quicker to run that option 1. I think this makes for an acceptable solution if we are ok having to re-write a bit of logic to get metadata from the SQLite database or add more context to the resulting dataframe.\n",
"\n",
"I suspect most of the performance difference between 1 and 2 come from the following facts:\n",
"1. Honeybee-energy spends a lot of resource translating the SQL response from lists to data collections\n",
"2. Translating the data collections back into ladybug-pandas Dataframes takes some more time\n",
"3. LadybugPandas dataframes hold tons of unit properties which are helpful when exploring/converting the data but must be converted to a normal \"numeric\" array to perform group by operations (more inneficency)\n",
"\n",
"I will first demo the two methods with their respective functions briefly documented. The section after that will walk you through each method step by step."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Processing Function Comparison\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Honeybee-Energy + Ladybug-Pandas"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from honeybee_energy.result.sql import SQLiteResult\n",
"import ladybug_pandas as lbp\n",
"import pandas as pd\n",
"\n",
"def ladybug_native(sqlite_file='eplusout.sql', outputs=['Zone Mean Radiant Temperature']):\n",
" \n",
" # Get a SQL db instance\n",
" db = SQLiteResult('eplusout.sql')\n",
" db.available_outputs\n",
" \n",
" # Extract Data Collections from db\n",
" if len(outputs) > 1:\n",
" data_collections = db.data_collections_by_output_name(tuple(outputs))\n",
" else:\n",
" data_collections = db.data_collections_by_output_name(outputs[0])\n",
" \n",
" # Turn datacollections into a single Pandas DataFrame\n",
" df = lbp.DataFrame(\n",
" data_collections=data_collections,\n",
" name_key='type',\n",
" axis=0,\n",
" populate_metadata=True,\n",
" )\n",
"\n",
" # Add a datetime column\n",
" df['datetime'] = df.index\n",
"\n",
" # Turn all \"ladybug pandas columns\" into numeric columns for Group By operation\n",
" for output in outputs:\n",
" df[output] = pd.to_numeric(df[output])\n",
" \n",
" # \"Unpivot\" (or melt) the dataframe to turn the \"Zone\" and \"System\" columns into rows\n",
" id_vars = ['datetime']\n",
" for output in outputs:\n",
" id_vars.append(output)\n",
" \n",
" value_vars = []\n",
" \n",
" if 'System' in df.columns:\n",
" value_vars.append('System')\n",
" if 'Zone' in df.columns:\n",
" value_vars.append('Zone')\n",
" \n",
" melted = df.melt(\n",
" id_vars=id_vars,\n",
" value_vars=value_vars,\n",
" var_name='group',\n",
" value_name='element_name'\n",
" )\n",
" \n",
" # Remove null rows by grouping by group, element_name, and datetime\n",
" flattened_df = melted.groupby(by=['group', 'element_name', 'datetime']).mean().reset_index()\n",
"\n",
" # Split the flattened dataframe into 2 dataframes (split by \"group\" so \"System\" and \"Zone\")\n",
" group_dfs = [x.reset_index(drop=True).dropna(axis='columns', how='all') for _, x in flattened_df.groupby(by='group')]\n",
"\n",
" return group_dfs"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>group</th>\n",
" <th>element_name</th>\n",
" <th>datetime</th>\n",
" <th>Zone Mean Radiant Temperature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 00:00:00</td>\n",
" <td>11.233115</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 01:00:00</td>\n",
" <td>24.763651</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 02:00:00</td>\n",
" <td>13.552033</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 03:00:00</td>\n",
" <td>20.303841</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 04:00:00</td>\n",
" <td>12.699779</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" group element_name datetime \\\n",
"0 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 00:00:00 \n",
"1 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 01:00:00 \n",
"2 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 02:00:00 \n",
"3 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 03:00:00 \n",
"4 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 04:00:00 \n",
"\n",
" Zone Mean Radiant Temperature \n",
"0 11.233115 \n",
"1 24.763651 \n",
"2 13.552033 \n",
"3 20.303841 \n",
"4 12.699779 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dfs = ladybug_native()\n",
"\n",
"for group_df in dfs:\n",
" display(group_df.head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Native "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import sqlite3\n",
"import pandas as pd\n",
"\n",
"def pandas_native(sqlite_file='eplusout.sql', reporting_frequency='Hourly', outputs=['Surface Inside Face Temperature']):\n",
" \n",
" # Connect to the SQLite database file\n",
" conn = sqlite3.connect(sqlite_file)\n",
" \n",
" outputs = ','.join([f'\"{output}\"' for output in outputs])\n",
" \n",
" \n",
" # Query SQL database and return joined data from Time, ReportDataDictionary and ReportData in one table\n",
" sql_query = f\"\"\"\n",
" SELECT \n",
" Time.Year as year,\n",
" Time.Month as month,\n",
" Time.Day as day,\n",
" Time.Hour as hour,\n",
" Time.Minute as minute,\n",
" Time.Dst as second,\n",
" Time.DayType as day_of_week,\n",
" ReportDataDictionary.IndexGroup as 'group',\n",
" ReportDataDictionary.KeyValue as 'element_name',\n",
" ReportDataDictionary.Name as 'data_type',\n",
" ReportDataDictionary.ReportingFrequency as 'frequency',\n",
" ReportData.Value as 'value'\n",
" FROM ReportData\n",
" INNER JOIN ReportDataDictionary on ReportDataDictionary.ReportDataDictionaryIndex = ReportData.ReportDataIndex\n",
" INNER JOIN Time on ReportData.TimeIndex = Time.TimeIndex\n",
" WHERE \n",
" ReportDataDictionary.ReportingFrequency = '{reporting_frequency}'\n",
" AND\n",
" ReportDataDictionary.Name in ({outputs});\n",
" \"\"\"\n",
" \n",
" df = pd.read_sql_query(sql_query, conn)\n",
"\n",
" # Pivot the table (turn the \"data_type\" column values into column names)\n",
" pivoted = df.pivot(\n",
" columns='data_type',\n",
" values='value'\n",
" )\n",
"\n",
" # Merge the pivoted DF back with the original one to get grouping information (group, element name, frequency, datetime)\n",
" merged = pd.merge(df, pivoted, left_index=True, right_index=True).drop('value', axis=1)\n",
"\n",
" # remove null values in columns by grouping by the columns below\n",
" flattened_df = merged.groupby(by=['group', 'element_name', 'frequency', 'year', 'month', 'day', 'hour', 'minute', 'second', 'day_of_week']).mean().reset_index()\n",
"\n",
" # Split the dataframe into multiple smalle dataframes by 'group' (ie: ReportDataDictionary.IndexGroup in the SQLite db)\n",
" # We also perform some quick cleanup operation to remove columns that have only null values\n",
" group_dfs = [x.reset_index(drop=True).dropna(axis='columns', how='all') for _, x in flattened_df.groupby(by='group')]\n",
" \n",
" return group_dfs"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>group</th>\n",
" <th>element_name</th>\n",
" <th>frequency</th>\n",
" <th>year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" <th>hour</th>\n",
" <th>minute</th>\n",
" <th>second</th>\n",
" <th>day_of_week</th>\n",
" <th>Surface Inside Face Temperature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Zone</td>\n",
" <td>APERTURE_00092C84-5E5A-4CF7-95D6-B85504CACDDE</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>-2.407988</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Zone</td>\n",
" <td>APERTURE_002FE07C-D6CB-4225-8A57-E542AD8C41EB</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>7.598665</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Zone</td>\n",
" <td>APERTURE_003460E3-71EC-4ACA-8CBA-F63BDDC69C41</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>1.374292</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Zone</td>\n",
" <td>APERTURE_01220E99-168C-49B0-9541-3492DBB6821B</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>12.515119</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Zone</td>\n",
" <td>APERTURE_0253129F-D1C3-4C10-B5BD-543C37AD9BD2</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>10.102524</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" group element_name frequency year month \\\n",
"0 Zone APERTURE_00092C84-5E5A-4CF7-95D6-B85504CACDDE Hourly 2006 1 \n",
"1 Zone APERTURE_002FE07C-D6CB-4225-8A57-E542AD8C41EB Hourly 2006 1 \n",
"2 Zone APERTURE_003460E3-71EC-4ACA-8CBA-F63BDDC69C41 Hourly 2006 1 \n",
"3 Zone APERTURE_01220E99-168C-49B0-9541-3492DBB6821B Hourly 2006 1 \n",
"4 Zone APERTURE_0253129F-D1C3-4C10-B5BD-543C37AD9BD2 Hourly 2006 1 \n",
"\n",
" day hour minute second day_of_week Surface Inside Face Temperature \n",
"0 1 3 0 0 Sunday -2.407988 \n",
"1 1 2 0 0 Sunday 7.598665 \n",
"2 1 3 0 0 Sunday 1.374292 \n",
"3 1 3 0 0 Sunday 12.515119 \n",
"4 1 3 0 0 Sunday 10.102524 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dfs = pandas_native()\n",
"\n",
"for group_df in dfs:\n",
" display(group_df.head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Performance Comparison\n",
"\n",
"Using the `timeit` function from Jupyter we can compare the performance (in terms of speed) of both methods. In this case using the Native Pandas method runs faster than the Honeybee-Energy + Ladybug-Pandas method by a factor of 1000."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"22.6 s ± 776 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)\n"
]
}
],
"source": [
"# Honeybee-Energy + Ladybug Pandas\n",
"%timeit -r3 ladybug_native(outputs=['Zone Mean Radiant Temperature'])"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"27.3 ms ± 2.08 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)\n"
]
}
],
"source": [
"# Pandas Native\n",
"%timeit -r3 pandas_native(outputs=['Zone Mean Radiant Temperature'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Detailed Breakdown per Method\n",
"\n",
"### Honeybee-Energy + Ladybug Pandas"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('Surface Outside Face Temperature',\n",
" 'DistrictCooling:Facility',\n",
" 'Zone Air Temperature',\n",
" 'Surface Inside Face Temperature',\n",
" 'Electricity:Facility',\n",
" 'Zone Mean Radiant Temperature',\n",
" 'Zone Air Relative Humidity',\n",
" 'DistrictHeating:Facility',\n",
" 'Site Outdoor Air Drybulb Temperature')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from honeybee_energy.result.sql import SQLiteResult\n",
"import ladybug_pandas as lbp\n",
"\n",
"# Load SQL database\n",
"res = SQLiteResult('eplusout.sql')\n",
"\n",
"# Check available outputs\n",
"res.available_outputs"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 253 data_collections\n"
]
}
],
"source": [
"# Load data collections from database\n",
"outputs=['Zone Mean Radiant Temperature']\n",
"\n",
"data_collections = res.data_collections_by_output_name(outputs[0])\n",
"\n",
"print(f\"There are {len(data_collections)} data_collections\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Performance Note\n",
"\n",
"Loading data collections from the database is pretty slow for some reason. I suspect it has more to do with generating the data collections rather than the actual SQL query.\n",
"\n",
"As you can see from the outputs below it takes roughly 7 seconds to load one result type from the 2GB sqlite results database."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"6.75 s ± 14.8 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)\n"
]
}
],
"source": [
"%timeit -r3 res.data_collections_by_output_name(outputs[0])"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Zone Mean Radiant Temperature</th>\n",
" <th>type</th>\n",
" <th>Zone</th>\n",
" <th>datetime</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2017-01-01 00:00:00</th>\n",
" <td>11.233115</td>\n",
" <td>Zone Mean Radiant Temperature</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 00:00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2017-01-01 01:00:00</th>\n",
" <td>24.763651</td>\n",
" <td>Zone Mean Radiant Temperature</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 01:00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2017-01-01 02:00:00</th>\n",
" <td>13.552033</td>\n",
" <td>Zone Mean Radiant Temperature</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 02:00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2017-01-01 03:00:00</th>\n",
" <td>20.303841</td>\n",
" <td>Zone Mean Radiant Temperature</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 03:00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2017-01-01 04:00:00</th>\n",
" <td>12.699779</td>\n",
" <td>Zone Mean Radiant Temperature</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 04:00:00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Zone Mean Radiant Temperature \\\n",
"2017-01-01 00:00:00 11.233115 \n",
"2017-01-01 01:00:00 24.763651 \n",
"2017-01-01 02:00:00 13.552033 \n",
"2017-01-01 03:00:00 20.303841 \n",
"2017-01-01 04:00:00 12.699779 \n",
"\n",
" type \\\n",
"2017-01-01 00:00:00 Zone Mean Radiant Temperature \n",
"2017-01-01 01:00:00 Zone Mean Radiant Temperature \n",
"2017-01-01 02:00:00 Zone Mean Radiant Temperature \n",
"2017-01-01 03:00:00 Zone Mean Radiant Temperature \n",
"2017-01-01 04:00:00 Zone Mean Radiant Temperature \n",
"\n",
" Zone \\\n",
"2017-01-01 00:00:00 ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB \n",
"2017-01-01 01:00:00 ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB \n",
"2017-01-01 02:00:00 ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB \n",
"2017-01-01 03:00:00 ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB \n",
"2017-01-01 04:00:00 ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB \n",
"\n",
" datetime \n",
"2017-01-01 00:00:00 2017-01-01 00:00:00 \n",
"2017-01-01 01:00:00 2017-01-01 01:00:00 \n",
"2017-01-01 02:00:00 2017-01-01 02:00:00 \n",
"2017-01-01 03:00:00 2017-01-01 03:00:00 \n",
"2017-01-01 04:00:00 2017-01-01 04:00:00 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = lbp.DataFrame(\n",
" data_collections=data_collections,\n",
" name_key='type',\n",
" axis=0,\n",
" populate_metadata=True,\n",
")\n",
"\n",
"# Add a datetime column\n",
"df['datetime'] = df.index\n",
"\n",
"# Turn all \"ladybug pandas columns\" into numeric columns for Group By operation\n",
"for output in outputs:\n",
" df[output] = pd.to_numeric(df[output])\n",
" \n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>datetime</th>\n",
" <th>Zone Mean Radiant Temperature</th>\n",
" <th>group</th>\n",
" <th>element_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2017-01-01 00:00:00</td>\n",
" <td>11.233115</td>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2017-01-01 01:00:00</td>\n",
" <td>24.763651</td>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2017-01-01 02:00:00</td>\n",
" <td>13.552033</td>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2017-01-01 03:00:00</td>\n",
" <td>20.303841</td>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2017-01-01 04:00:00</td>\n",
" <td>12.699779</td>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" datetime Zone Mean Radiant Temperature group \\\n",
"0 2017-01-01 00:00:00 11.233115 Zone \n",
"1 2017-01-01 01:00:00 24.763651 Zone \n",
"2 2017-01-01 02:00:00 13.552033 Zone \n",
"3 2017-01-01 03:00:00 20.303841 Zone \n",
"4 2017-01-01 04:00:00 12.699779 Zone \n",
"\n",
" element_name \n",
"0 ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB \n",
"1 ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB \n",
"2 ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB \n",
"3 ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB \n",
"4 ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# \"Unpivot\" (or melt) the dataframe to turn the \"Zone\" and \"System\" columns into rows\n",
"id_vars = ['datetime']\n",
"for output in outputs:\n",
" id_vars.append(output)\n",
" \n",
"value_vars = []\n",
"if 'System' in df.columns:\n",
" value_vars.append('System')\n",
"if 'Zone' in df.columns:\n",
" value_vars.append('Zone')\n",
"\n",
"melted = df.melt(\n",
" id_vars=id_vars,\n",
" value_vars=value_vars,\n",
" var_name='group',\n",
" value_name='element_name'\n",
")\n",
"\n",
"melted.head()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>group</th>\n",
" <th>element_name</th>\n",
" <th>datetime</th>\n",
" <th>Zone Mean Radiant Temperature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 00:00:00</td>\n",
" <td>11.233115</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 01:00:00</td>\n",
" <td>24.763651</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 02:00:00</td>\n",
" <td>13.552033</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 03:00:00</td>\n",
" <td>20.303841</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 04:00:00</td>\n",
" <td>12.699779</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" group element_name datetime \\\n",
"0 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 00:00:00 \n",
"1 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 01:00:00 \n",
"2 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 02:00:00 \n",
"3 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 03:00:00 \n",
"4 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 04:00:00 \n",
"\n",
" Zone Mean Radiant Temperature \n",
"0 11.233115 \n",
"1 24.763651 \n",
"2 13.552033 \n",
"3 20.303841 \n",
"4 12.699779 "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Remove null rows by grouping by group, element_name, and datetime\n",
"flattened_df = melted.groupby(by=['group', 'element_name', 'datetime']).mean().reset_index()\n",
"\n",
"flattened_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>group</th>\n",
" <th>element_name</th>\n",
" <th>datetime</th>\n",
" <th>Zone Mean Radiant Temperature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 00:00:00</td>\n",
" <td>11.233115</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 01:00:00</td>\n",
" <td>24.763651</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 02:00:00</td>\n",
" <td>13.552033</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 03:00:00</td>\n",
" <td>20.303841</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Zone</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>2017-01-01 04:00:00</td>\n",
" <td>12.699779</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" group element_name datetime \\\n",
"0 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 00:00:00 \n",
"1 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 01:00:00 \n",
"2 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 02:00:00 \n",
"3 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 03:00:00 \n",
"4 Zone ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB 2017-01-01 04:00:00 \n",
"\n",
" Zone Mean Radiant Temperature \n",
"0 11.233115 \n",
"1 24.763651 \n",
"2 13.552033 \n",
"3 20.303841 \n",
"4 12.699779 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Split the flattened dataframe into 2 dataframes (split by \"group\" so \"System\" and \"Zone\")\n",
"group_dfs = [x.reset_index(drop=True).dropna(axis='columns', how='all') for _, x in flattened_df.groupby(by='group')]\n",
"\n",
"for group_df in group_dfs:\n",
" display(group_df.head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Native Pandas\n",
"\n",
"For fun and because performance allows it we will parse all the available outputs from the SQLite file that report \"Hourly\"."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Available Outputs: ('Surface Outside Face Temperature', 'DistrictCooling:Facility', 'Zone Air Temperature', 'Surface Inside Face Temperature', 'Electricity:Facility', 'Zone Mean Radiant Temperature', 'Zone Air Relative Humidity', 'DistrictHeating:Facility', 'Site Outdoor Air Drybulb Temperature')\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" <th>hour</th>\n",
" <th>minute</th>\n",
" <th>second</th>\n",
" <th>day_of_week</th>\n",
" <th>group</th>\n",
" <th>element_name</th>\n",
" <th>data_type</th>\n",
" <th>frequency</th>\n",
" <th>value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>Zone</td>\n",
" <td>FACE_2735DA50-F8DB-4F9E-A9C0-69F8D56A49C8</td>\n",
" <td>Surface Inside Face Temperature</td>\n",
" <td>Hourly</td>\n",
" <td>13.881118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>Zone</td>\n",
" <td>FACE_2735DA50-F8DB-4F9E-A9C0-69F8D56A49C8</td>\n",
" <td>Surface Outside Face Temperature</td>\n",
" <td>Hourly</td>\n",
" <td>14.994534</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>Zone</td>\n",
" <td>APERTURE_209F8034-C56E-4D96-B43F-E720FF9A0AC9</td>\n",
" <td>Surface Inside Face Temperature</td>\n",
" <td>Hourly</td>\n",
" <td>13.413171</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>Zone</td>\n",
" <td>APERTURE_209F8034-C56E-4D96-B43F-E720FF9A0AC9</td>\n",
" <td>Surface Outside Face Temperature</td>\n",
" <td>Hourly</td>\n",
" <td>12.627358</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>Zone</td>\n",
" <td>APERTURE_323F7070-F593-478D-85E7-61879AE7F789</td>\n",
" <td>Surface Inside Face Temperature</td>\n",
" <td>Hourly</td>\n",
" <td>11.204045</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year month day hour minute second day_of_week group \\\n",
"0 2006 1 1 2 0 0 Sunday Zone \n",
"1 2006 1 1 2 0 0 Sunday Zone \n",
"2 2006 1 1 2 0 0 Sunday Zone \n",
"3 2006 1 1 2 0 0 Sunday Zone \n",
"4 2006 1 1 2 0 0 Sunday Zone \n",
"\n",
" element_name \\\n",
"0 FACE_2735DA50-F8DB-4F9E-A9C0-69F8D56A49C8 \n",
"1 FACE_2735DA50-F8DB-4F9E-A9C0-69F8D56A49C8 \n",
"2 APERTURE_209F8034-C56E-4D96-B43F-E720FF9A0AC9 \n",
"3 APERTURE_209F8034-C56E-4D96-B43F-E720FF9A0AC9 \n",
"4 APERTURE_323F7070-F593-478D-85E7-61879AE7F789 \n",
"\n",
" data_type frequency value \n",
"0 Surface Inside Face Temperature Hourly 13.881118 \n",
"1 Surface Outside Face Temperature Hourly 14.994534 \n",
"2 Surface Inside Face Temperature Hourly 13.413171 \n",
"3 Surface Outside Face Temperature Hourly 12.627358 \n",
"4 Surface Inside Face Temperature Hourly 11.204045 "
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import sqlite3\n",
"conn = sqlite3.connect(\"eplusout.sql\")\n",
"\n",
"reporting_frequency='Hourly'\n",
"\n",
"# For fun we will load all the outputs available\n",
"# outputs=['Surface Inside Face Temperature']\n",
"outputs = res.available_outputs\n",
"\n",
"print(f\"Available Outputs: {res.available_outputs}\")\n",
"\n",
"# Query SQL database and return joined data from Time, ReportDataDictionary and ReportData in one table\n",
"sql_query = f\"\"\"\n",
"SELECT \n",
" Time.Year as year,\n",
" Time.Month as month,\n",
" Time.Day as day,\n",
" Time.Hour as hour,\n",
" Time.Minute as minute,\n",
" Time.Dst as second,\n",
" Time.DayType as day_of_week,\n",
" ReportDataDictionary.IndexGroup as 'group',\n",
" ReportDataDictionary.KeyValue as 'element_name',\n",
" ReportDataDictionary.Name as 'data_type',\n",
" ReportDataDictionary.ReportingFrequency as 'frequency',\n",
" ReportData.Value as 'value'\n",
"FROM ReportData\n",
"INNER JOIN ReportDataDictionary on ReportDataDictionary.ReportDataDictionaryIndex = ReportData.ReportDataIndex\n",
"INNER JOIN Time on ReportData.TimeIndex = Time.TimeIndex\n",
"WHERE \n",
" ReportDataDictionary.ReportingFrequency = '{reporting_frequency}'\n",
" AND\n",
" ReportDataDictionary.Name in {outputs}\n",
";\n",
"\"\"\"\n",
" \n",
"\n",
"df = pd.read_sql_query(sql_query, conn)\n",
"\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>data_type</th>\n",
" <th>Surface Inside Face Temperature</th>\n",
" <th>Surface Outside Face Temperature</th>\n",
" <th>Zone Air Relative Humidity</th>\n",
" <th>Zone Air Temperature</th>\n",
" <th>Zone Mean Radiant Temperature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>13.881118</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" <td>14.994534</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>13.413171</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>NaN</td>\n",
" <td>12.627358</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>11.204045</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"data_type Surface Inside Face Temperature Surface Outside Face Temperature \\\n",
"0 13.881118 NaN \n",
"1 NaN 14.994534 \n",
"2 13.413171 NaN \n",
"3 NaN 12.627358 \n",
"4 11.204045 NaN \n",
"\n",
"data_type Zone Air Relative Humidity Zone Air Temperature \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 NaN NaN \n",
"4 NaN NaN \n",
"\n",
"data_type Zone Mean Radiant Temperature \n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN "
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pivot the table (turn the \"data_type\" column values into column names)\n",
"pivoted = df.pivot(\n",
" columns='data_type',\n",
" values='value'\n",
")\n",
"\n",
"pivoted.head()"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" <th>hour</th>\n",
" <th>minute</th>\n",
" <th>second</th>\n",
" <th>day_of_week</th>\n",
" <th>group</th>\n",
" <th>element_name</th>\n",
" <th>data_type</th>\n",
" <th>frequency</th>\n",
" <th>Surface Inside Face Temperature</th>\n",
" <th>Surface Outside Face Temperature</th>\n",
" <th>Zone Air Relative Humidity</th>\n",
" <th>Zone Air Temperature</th>\n",
" <th>Zone Mean Radiant Temperature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>Zone</td>\n",
" <td>FACE_2735DA50-F8DB-4F9E-A9C0-69F8D56A49C8</td>\n",
" <td>Surface Inside Face Temperature</td>\n",
" <td>Hourly</td>\n",
" <td>13.881118</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>Zone</td>\n",
" <td>FACE_2735DA50-F8DB-4F9E-A9C0-69F8D56A49C8</td>\n",
" <td>Surface Outside Face Temperature</td>\n",
" <td>Hourly</td>\n",
" <td>NaN</td>\n",
" <td>14.994534</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>Zone</td>\n",
" <td>APERTURE_209F8034-C56E-4D96-B43F-E720FF9A0AC9</td>\n",
" <td>Surface Inside Face Temperature</td>\n",
" <td>Hourly</td>\n",
" <td>13.413171</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>Zone</td>\n",
" <td>APERTURE_209F8034-C56E-4D96-B43F-E720FF9A0AC9</td>\n",
" <td>Surface Outside Face Temperature</td>\n",
" <td>Hourly</td>\n",
" <td>NaN</td>\n",
" <td>12.627358</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>Zone</td>\n",
" <td>APERTURE_323F7070-F593-478D-85E7-61879AE7F789</td>\n",
" <td>Surface Inside Face Temperature</td>\n",
" <td>Hourly</td>\n",
" <td>11.204045</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year month day hour minute second day_of_week group \\\n",
"0 2006 1 1 2 0 0 Sunday Zone \n",
"1 2006 1 1 2 0 0 Sunday Zone \n",
"2 2006 1 1 2 0 0 Sunday Zone \n",
"3 2006 1 1 2 0 0 Sunday Zone \n",
"4 2006 1 1 2 0 0 Sunday Zone \n",
"\n",
" element_name \\\n",
"0 FACE_2735DA50-F8DB-4F9E-A9C0-69F8D56A49C8 \n",
"1 FACE_2735DA50-F8DB-4F9E-A9C0-69F8D56A49C8 \n",
"2 APERTURE_209F8034-C56E-4D96-B43F-E720FF9A0AC9 \n",
"3 APERTURE_209F8034-C56E-4D96-B43F-E720FF9A0AC9 \n",
"4 APERTURE_323F7070-F593-478D-85E7-61879AE7F789 \n",
"\n",
" data_type frequency \\\n",
"0 Surface Inside Face Temperature Hourly \n",
"1 Surface Outside Face Temperature Hourly \n",
"2 Surface Inside Face Temperature Hourly \n",
"3 Surface Outside Face Temperature Hourly \n",
"4 Surface Inside Face Temperature Hourly \n",
"\n",
" Surface Inside Face Temperature Surface Outside Face Temperature \\\n",
"0 13.881118 NaN \n",
"1 NaN 14.994534 \n",
"2 13.413171 NaN \n",
"3 NaN 12.627358 \n",
"4 11.204045 NaN \n",
"\n",
" Zone Air Relative Humidity Zone Air Temperature \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 NaN NaN \n",
"4 NaN NaN \n",
"\n",
" Zone Mean Radiant Temperature \n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN "
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Merge the pivoted DF back with the original one to get grouping information (group, element name, frequency, datetime)\n",
"merged = pd.merge(df, pivoted, left_index=True, right_index=True).drop('value', axis=1)\n",
"\n",
"merged.head()"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>group</th>\n",
" <th>element_name</th>\n",
" <th>frequency</th>\n",
" <th>year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" <th>hour</th>\n",
" <th>minute</th>\n",
" <th>second</th>\n",
" <th>day_of_week</th>\n",
" <th>Surface Inside Face Temperature</th>\n",
" <th>Surface Outside Face Temperature</th>\n",
" <th>Zone Air Relative Humidity</th>\n",
" <th>Zone Air Temperature</th>\n",
" <th>Zone Mean Radiant Temperature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>System</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>11.861472</td>\n",
" <td>13.678985</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>System</td>\n",
" <td>ROOM_03A1C368-7B54-45E8-9A4A-4671255B6D11</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1.793131</td>\n",
" <td>7.828025</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>System</td>\n",
" <td>ROOM_03EDD150-A74F-4314-B302-0D28E89163D1</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>14.982818</td>\n",
" <td>14.733151</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>System</td>\n",
" <td>ROOM_04283503-ABFF-455F-8521-B54BBA54DE7D</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>14.919206</td>\n",
" <td>14.596326</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>System</td>\n",
" <td>ROOM_04733899-6755-40BB-AB62-0B640A1BE24B</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>15.396306</td>\n",
" <td>15.219029</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" group element_name frequency year month \\\n",
"0 System ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB Hourly 2006 1 \n",
"1 System ROOM_03A1C368-7B54-45E8-9A4A-4671255B6D11 Hourly 2006 1 \n",
"2 System ROOM_03EDD150-A74F-4314-B302-0D28E89163D1 Hourly 2006 1 \n",
"3 System ROOM_04283503-ABFF-455F-8521-B54BBA54DE7D Hourly 2006 1 \n",
"4 System ROOM_04733899-6755-40BB-AB62-0B640A1BE24B Hourly 2006 1 \n",
"\n",
" day hour minute second day_of_week Surface Inside Face Temperature \\\n",
"0 1 3 0 0 Sunday NaN \n",
"1 1 3 0 0 Sunday NaN \n",
"2 1 3 0 0 Sunday NaN \n",
"3 1 3 0 0 Sunday NaN \n",
"4 1 3 0 0 Sunday NaN \n",
"\n",
" Surface Outside Face Temperature Zone Air Relative Humidity \\\n",
"0 NaN 11.861472 \n",
"1 NaN 1.793131 \n",
"2 NaN 14.982818 \n",
"3 NaN 14.919206 \n",
"4 NaN 15.396306 \n",
"\n",
" Zone Air Temperature Zone Mean Radiant Temperature \n",
"0 13.678985 NaN \n",
"1 7.828025 NaN \n",
"2 14.733151 NaN \n",
"3 14.596326 NaN \n",
"4 15.219029 NaN "
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# remove null values in columns by grouping by the columns below\n",
"flattened_df = merged.groupby(by=['group', 'element_name', 'frequency', 'year', 'month', 'day', 'hour', 'minute', 'second', 'day_of_week']).mean().reset_index()\n",
"\n",
"flattened_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>group</th>\n",
" <th>element_name</th>\n",
" <th>frequency</th>\n",
" <th>year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" <th>hour</th>\n",
" <th>minute</th>\n",
" <th>second</th>\n",
" <th>day_of_week</th>\n",
" <th>Zone Air Relative Humidity</th>\n",
" <th>Zone Air Temperature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>System</td>\n",
" <td>ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>11.861472</td>\n",
" <td>13.678985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>System</td>\n",
" <td>ROOM_03A1C368-7B54-45E8-9A4A-4671255B6D11</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>1.793131</td>\n",
" <td>7.828025</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>System</td>\n",
" <td>ROOM_03EDD150-A74F-4314-B302-0D28E89163D1</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>14.982818</td>\n",
" <td>14.733151</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>System</td>\n",
" <td>ROOM_04283503-ABFF-455F-8521-B54BBA54DE7D</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>14.919206</td>\n",
" <td>14.596326</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>System</td>\n",
" <td>ROOM_04733899-6755-40BB-AB62-0B640A1BE24B</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>15.396306</td>\n",
" <td>15.219029</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" group element_name frequency year month \\\n",
"0 System ROOM_03859664-D3A1-4BA2-AFE9-5A81A0381ABB Hourly 2006 1 \n",
"1 System ROOM_03A1C368-7B54-45E8-9A4A-4671255B6D11 Hourly 2006 1 \n",
"2 System ROOM_03EDD150-A74F-4314-B302-0D28E89163D1 Hourly 2006 1 \n",
"3 System ROOM_04283503-ABFF-455F-8521-B54BBA54DE7D Hourly 2006 1 \n",
"4 System ROOM_04733899-6755-40BB-AB62-0B640A1BE24B Hourly 2006 1 \n",
"\n",
" day hour minute second day_of_week Zone Air Relative Humidity \\\n",
"0 1 3 0 0 Sunday 11.861472 \n",
"1 1 3 0 0 Sunday 1.793131 \n",
"2 1 3 0 0 Sunday 14.982818 \n",
"3 1 3 0 0 Sunday 14.919206 \n",
"4 1 3 0 0 Sunday 15.396306 \n",
"\n",
" Zone Air Temperature \n",
"0 13.678985 \n",
"1 7.828025 \n",
"2 14.733151 \n",
"3 14.596326 \n",
"4 15.219029 "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>group</th>\n",
" <th>element_name</th>\n",
" <th>frequency</th>\n",
" <th>year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" <th>hour</th>\n",
" <th>minute</th>\n",
" <th>second</th>\n",
" <th>day_of_week</th>\n",
" <th>Surface Inside Face Temperature</th>\n",
" <th>Surface Outside Face Temperature</th>\n",
" <th>Zone Mean Radiant Temperature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Zone</td>\n",
" <td>APERTURE_00092C84-5E5A-4CF7-95D6-B85504CACDDE</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>-2.407988</td>\n",
" <td>8.710937</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Zone</td>\n",
" <td>APERTURE_002FE07C-D6CB-4225-8A57-E542AD8C41EB</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>7.598665</td>\n",
" <td>1.830790</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Zone</td>\n",
" <td>APERTURE_003460E3-71EC-4ACA-8CBA-F63BDDC69C41</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>1.374292</td>\n",
" <td>5.687736</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Zone</td>\n",
" <td>APERTURE_01220E99-168C-49B0-9541-3492DBB6821B</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>12.515119</td>\n",
" <td>15.140696</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Zone</td>\n",
" <td>APERTURE_0253129F-D1C3-4C10-B5BD-543C37AD9BD2</td>\n",
" <td>Hourly</td>\n",
" <td>2006</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Sunday</td>\n",
" <td>10.102524</td>\n",
" <td>5.704591</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" group element_name frequency year month \\\n",
"0 Zone APERTURE_00092C84-5E5A-4CF7-95D6-B85504CACDDE Hourly 2006 1 \n",
"1 Zone APERTURE_002FE07C-D6CB-4225-8A57-E542AD8C41EB Hourly 2006 1 \n",
"2 Zone APERTURE_003460E3-71EC-4ACA-8CBA-F63BDDC69C41 Hourly 2006 1 \n",
"3 Zone APERTURE_01220E99-168C-49B0-9541-3492DBB6821B Hourly 2006 1 \n",
"4 Zone APERTURE_0253129F-D1C3-4C10-B5BD-543C37AD9BD2 Hourly 2006 1 \n",
"\n",
" day hour minute second day_of_week Surface Inside Face Temperature \\\n",
"0 1 3 0 0 Sunday -2.407988 \n",
"1 1 2 0 0 Sunday 7.598665 \n",
"2 1 3 0 0 Sunday 1.374292 \n",
"3 1 3 0 0 Sunday 12.515119 \n",
"4 1 3 0 0 Sunday 10.102524 \n",
"\n",
" Surface Outside Face Temperature Zone Mean Radiant Temperature \n",
"0 8.710937 NaN \n",
"1 1.830790 NaN \n",
"2 5.687736 NaN \n",
"3 15.140696 NaN \n",
"4 5.704591 NaN "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Split the dataframe into multiple smalle dataframes by 'group' (ie: ReportDataDictionary.IndexGroup in the SQLite db)\n",
"# We also perform some quick cleanup operation to remove columns that have only null values\n",
"group_dfs = [x.reset_index(drop=True).dropna(axis='columns', how='all') for _, x in flattened_df.groupby(by='group')]\n",
"\n",
"for group_df in group_dfs:\n",
" display(group_df.head())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "3.6.6",
"language": "python",
"name": "3.6.6"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@chriswmackey
Copy link

chriswmackey commented Jul 13, 2020

I just wanted to make a note of the final table structure that we settled on:

month day hour minute object_identifier surface_inside_face_temperature surface_outside_face_temperature ...

@chriswmackey
Copy link

Also the transformer command will have the following required inputs:

  • output_names
  • run_period_name

And the following options:

  • normalize
  • units_system
  • divide_by_multipliers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment