vaclavdekanovsky · October 3, 2020 20:10
diff --git a/Pandas_Upsample_Grouping.ipynb b/Pandas_Upsample_Grouping.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Upsampling with average\n",
    "groups and indices paramters and how to use them to upsample to average"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import io"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "data=\"\"\"\n",
    "CAT|DATE|VALUE\n",
    "abc|0101|10\n",
    "abc|0103|20\n",
    "abc|0106|15\"\"\"\n",
    "df = pd.read_csv(io.StringIO(data), skiprows=1, sep=\"|\", dtype={\"DATE\":\"str\"})\n",
    "df[\"DATE\"] = pd.to_datetime(df[\"DATE\"], format=\"%m%d\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "expected=\"\"\"\n",
    "CAT|DATE|VALUE\n",
    "abc|0102|10\n",
    "abc|0103|10\n",
    "abc|0104|5\n",
    "abc|0105|5\n",
    "abc|0106|5\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The resample method has two attributes:\n",
    "    \n",
    "* `indices` - showing where the input data appeared\n",
    "* `groups` - indexing each datatime value with its group index.\n",
    "\n",
    "https://pandas.pydata.org/pandas-docs/stable/reference/resampling.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "defaultdict(list,\n",
       "            {Timestamp('1900-01-01 00:00:00', freq='D'): [0],\n",
       "             Timestamp('1900-01-03 00:00:00', freq='D'): [1],\n",
       "             Timestamp('1900-01-06 00:00:00', freq='D'): [2]})"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# the data came on first, third and sixth of Jan\n",
    "df.set_index(\"DATE\")\\\n",
    ".resample(\"D\").indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{Timestamp('1900-01-01 00:00:00', freq='D'): 1,\n",
       " Timestamp('1900-01-02 00:00:00', freq='D'): 1,\n",
       " Timestamp('1900-01-03 00:00:00', freq='D'): 2,\n",
       " Timestamp('1900-01-04 00:00:00', freq='D'): 2,\n",
       " Timestamp('1900-01-05 00:00:00', freq='D'): 2,\n",
       " Timestamp('1900-01-06 00:00:00', freq='D'): 3}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# the groups are \n",
    "\"\"\" \n",
    "    Jan-1 Group 1\n",
    "    Jan-2 Group 1\n",
    "    Jan-3 Group 2\n",
    "    Jan-4 Group 2\n",
    "    Jan-5 Group 2\n",
    "    Jan-6 Group 3\n",
    "\"\"\"\n",
    "df.set_index(\"DATE\")\\\n",
    ".resample(\"D\").groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In reality our data from Jan-1 cover unknown period of time, the value from Jan-3 covers the third and second and the last value covers fourth, fifth and sixth of Jan. This is achived by `label` parameter of the resample method.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{Timestamp('1900-01-02 00:00:00', freq='D'): 1,\n",
       " Timestamp('1900-01-03 00:00:00', freq='D'): 1,\n",
       " Timestamp('1900-01-04 00:00:00', freq='D'): 2,\n",
       " Timestamp('1900-01-05 00:00:00', freq='D'): 2,\n",
       " Timestamp('1900-01-06 00:00:00', freq='D'): 2,\n",
       " Timestamp('1900-01-07 00:00:00', freq='D'): 3}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.set_index(\"DATE\").resample(\"D\", label=\"right\").groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, we can turn the dictionary returned by the `.groups` parameter into a dataframe and shift the group ids by 1 (row) so that data match our scenario."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1900-01-01</th>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900-01-02</th>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900-01-03</th>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900-01-04</th>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900-01-05</th>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900-01-06</th>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            group\n",
       "1900-01-01    NaN\n",
       "1900-01-02    1.0\n",
       "1900-01-03    1.0\n",
       "1900-01-04    2.0\n",
       "1900-01-05    2.0\n",
       "1900-01-06    2.0"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Calculate the number of hours in each group\n",
    "resampled_groups = df.set_index(\"DATE\").resample(\"D\").groups\n",
    "df_groups = pd.DataFrame(resampled_groups, index=[\"group\"]).T.shift(1)\n",
    "df_groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Knowing which dates belong to each group, let us calculate the number of occurences (days) between each measurement. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "group\n",
       "1.0    2\n",
       "2.0    3\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = df_groups.groupby(\"group\").size()\n",
    "s.name = \"count\"\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By joining the counts back to the groups we will see how many days were covered each date so that we can calculate the average - measured value at the end of the period/period lenght.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1900-01-01    NaN\n",
       "1900-01-02    2.0\n",
       "1900-01-03    2.0\n",
       "1900-01-04    3.0\n",
       "1900-01-05    3.0\n",
       "1900-01-06    3.0\n",
       "Name: count, dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s_counts = df_groups.join(s, on=\"group\")[\"count\"]\n",
    "s_counts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mering this back to the original dataframe, with backfilled sum allow us to calculate the average. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CAT</th>\n",
       "      <th>VALUE</th>\n",
       "      <th>count</th>\n",
       "      <th>average</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DATE</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1900-01-01</th>\n",
       "      <td>abc</td>\n",
       "      <td>10</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900-01-02</th>\n",
       "      <td>abc</td>\n",
       "      <td>20</td>\n",
       "      <td>2.0</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900-01-03</th>\n",
       "      <td>abc</td>\n",
       "      <td>20</td>\n",
       "      <td>2.0</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900-01-04</th>\n",
       "      <td>abc</td>\n",
       "      <td>15</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900-01-05</th>\n",
       "      <td>abc</td>\n",
       "      <td>15</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900-01-06</th>\n",
       "      <td>abc</td>\n",
       "      <td>15</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            CAT  VALUE  count  average\n",
       "DATE                                  \n",
       "1900-01-01  abc     10    NaN      NaN\n",
       "1900-01-02  abc     20    2.0     10.0\n",
       "1900-01-03  abc     20    2.0     10.0\n",
       "1900-01-04  abc     15    3.0      5.0\n",
       "1900-01-05  abc     15    3.0      5.0\n",
       "1900-01-06  abc     15    3.0      5.0"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res = df.set_index(\"DATE\").resample(\"D\").bfill()\n",
    "res = res.join(s_counts)\n",
    "res[\"average\"] = res[\"VALUE\"]/res[\"count\"]\n",
    "res"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# Upsampling with average\n",
	"groups and indices paramters and how to use them to upsample to average"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {},
	"outputs": [],
	"source": [
	"import pandas as pd\n",
	"import io"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {},
	"outputs": [],
	"source": [
	"data=\"\"\"\n",
	"CAT\|DATE\|VALUE\n",
	"abc\|0101\|10\n",
	"abc\|0103\|20\n",
	"abc\|0106\|15\"\"\"\n",
	"df = pd.read_csv(io.StringIO(data), skiprows=1, sep=\"\|\", dtype={\"DATE\":\"str\"})\n",
	"df[\"DATE\"] = pd.to_datetime(df[\"DATE\"], format=\"%m%d\")"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {},
	"outputs": [],
	"source": [
	"expected=\"\"\"\n",
	"CAT\|DATE\|VALUE\n",
	"abc\|0102\|10\n",
	"abc\|0103\|10\n",
	"abc\|0104\|5\n",
	"abc\|0105\|5\n",
	"abc\|0106\|5\"\"\""
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"The resample method has two attributes:\n",
	" \n",
	"* `indices` - showing where the input data appeared\n",
	"* `groups` - indexing each datatime value with its group index.\n",
	"\n",
	"https://pandas.pydata.org/pandas-docs/stable/reference/resampling.html"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"defaultdict(list,\n",
	" {Timestamp('1900-01-01 00:00:00', freq='D'): [0],\n",
	" Timestamp('1900-01-03 00:00:00', freq='D'): [1],\n",
	" Timestamp('1900-01-06 00:00:00', freq='D'): [2]})"
	]
	},
	"execution_count": 4,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# the data came on first, third and sixth of Jan\n",
	"df.set_index(\"DATE\")\\\n",
	".resample(\"D\").indices"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{Timestamp('1900-01-01 00:00:00', freq='D'): 1,\n",
	" Timestamp('1900-01-02 00:00:00', freq='D'): 1,\n",
	" Timestamp('1900-01-03 00:00:00', freq='D'): 2,\n",
	" Timestamp('1900-01-04 00:00:00', freq='D'): 2,\n",
	" Timestamp('1900-01-05 00:00:00', freq='D'): 2,\n",
	" Timestamp('1900-01-06 00:00:00', freq='D'): 3}"
	]
	},
	"execution_count": 5,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# the groups are \n",
	"\"\"\" \n",
	" Jan-1 Group 1\n",
	" Jan-2 Group 1\n",
	" Jan-3 Group 2\n",
	" Jan-4 Group 2\n",
	" Jan-5 Group 2\n",
	" Jan-6 Group 3\n",
	"\"\"\"\n",
	"df.set_index(\"DATE\")\\\n",
	".resample(\"D\").groups"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"In reality our data from Jan-1 cover unknown period of time, the value from Jan-3 covers the third and second and the last value covers fourth, fifth and sixth of Jan. This is achived by `label` parameter of the resample method.\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{Timestamp('1900-01-02 00:00:00', freq='D'): 1,\n",
	" Timestamp('1900-01-03 00:00:00', freq='D'): 1,\n",
	" Timestamp('1900-01-04 00:00:00', freq='D'): 2,\n",
	" Timestamp('1900-01-05 00:00:00', freq='D'): 2,\n",
	" Timestamp('1900-01-06 00:00:00', freq='D'): 2,\n",
	" Timestamp('1900-01-07 00:00:00', freq='D'): 3}"
	]
	},
	"execution_count": 6,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.set_index(\"DATE\").resample(\"D\", label=\"right\").groups"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Alternatively, we can turn the dictionary returned by the `.groups` parameter into a dataframe and shift the group ids by 1 (row) so that data match our scenario."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 7,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<style scoped>\n",
	" .dataframe tbody tr th:only-of-type {\n",
	" vertical-align: middle;\n",
	" }\n",
	"\n",
	" .dataframe tbody tr th {\n",
	" vertical-align: top;\n",
	" }\n",
	"\n",
	" .dataframe thead th {\n",
	" text-align: right;\n",
	" }\n",
	"</style>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>group</th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>1900-01-01</th>\n",
	" <td>NaN</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>1900-01-02</th>\n",
	" <td>1.0</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>1900-01-03</th>\n",
	" <td>1.0</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>1900-01-04</th>\n",
	" <td>2.0</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>1900-01-05</th>\n",
	" <td>2.0</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>1900-01-06</th>\n",
	" <td>2.0</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" group\n",
	"1900-01-01 NaN\n",
	"1900-01-02 1.0\n",
	"1900-01-03 1.0\n",
	"1900-01-04 2.0\n",
	"1900-01-05 2.0\n",
	"1900-01-06 2.0"
	]
	},
	"execution_count": 7,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Calculate the number of hours in each group\n",
	"resampled_groups = df.set_index(\"DATE\").resample(\"D\").groups\n",
	"df_groups = pd.DataFrame(resampled_groups, index=[\"group\"]).T.shift(1)\n",
	"df_groups"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Knowing which dates belong to each group, let us calculate the number of occurences (days) between each measurement. "
	]
	},
	{
	"cell_type": "code",
	"execution_count": 8,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"group\n",
	"1.0 2\n",
	"2.0 3\n",
	"Name: count, dtype: int64"
	]
	},
	"execution_count": 8,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"s = df_groups.groupby(\"group\").size()\n",
	"s.name = \"count\"\n",
	"s"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"By joining the counts back to the groups we will see how many days were covered each date so that we can calculate the average - measured value at the end of the period/period lenght. "
	]
	},
	{
	"cell_type": "code",
	"execution_count": 9,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"1900-01-01 NaN\n",
	"1900-01-02 2.0\n",
	"1900-01-03 2.0\n",
	"1900-01-04 3.0\n",
	"1900-01-05 3.0\n",
	"1900-01-06 3.0\n",
	"Name: count, dtype: float64"
	]
	},
	"execution_count": 9,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"s_counts = df_groups.join(s, on=\"group\")[\"count\"]\n",
	"s_counts"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Mering this back to the original dataframe, with backfilled sum allow us to calculate the average. "
	]
	},
	{
	"cell_type": "code",
	"execution_count": 10,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<style scoped>\n",
	" .dataframe tbody tr th:only-of-type {\n",
	" vertical-align: middle;\n",
	" }\n",
	"\n",
	" .dataframe tbody tr th {\n",
	" vertical-align: top;\n",
	" }\n",
	"\n",
	" .dataframe thead th {\n",
	" text-align: right;\n",
	" }\n",
	"</style>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>CAT</th>\n",
	" <th>VALUE</th>\n",
	" <th>count</th>\n",
	" <th>average</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>DATE</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>1900-01-01</th>\n",
	" <td>abc</td>\n",
	" <td>10</td>\n",
	" <td>NaN</td>\n",
	" <td>NaN</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>1900-01-02</th>\n",
	" <td>abc</td>\n",
	" <td>20</td>\n",
	" <td>2.0</td>\n",
	" <td>10.0</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>1900-01-03</th>\n",
	" <td>abc</td>\n",
	" <td>20</td>\n",
	" <td>2.0</td>\n",
	" <td>10.0</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>1900-01-04</th>\n",
	" <td>abc</td>\n",
	" <td>15</td>\n",
	" <td>3.0</td>\n",
	" <td>5.0</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>1900-01-05</th>\n",
	" <td>abc</td>\n",
	" <td>15</td>\n",
	" <td>3.0</td>\n",
	" <td>5.0</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>1900-01-06</th>\n",
	" <td>abc</td>\n",
	" <td>15</td>\n",
	" <td>3.0</td>\n",
	" <td>5.0</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" CAT VALUE count average\n",
	"DATE \n",
	"1900-01-01 abc 10 NaN NaN\n",
	"1900-01-02 abc 20 2.0 10.0\n",
	"1900-01-03 abc 20 2.0 10.0\n",
	"1900-01-04 abc 15 3.0 5.0\n",
	"1900-01-05 abc 15 3.0 5.0\n",
	"1900-01-06 abc 15 3.0 5.0"
	]
	},
	"execution_count": 10,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"res = df.set_index(\"DATE\").resample(\"D\").bfill()\n",
	"res = res.join(s_counts)\n",
	"res[\"average\"] = res[\"VALUE\"]/res[\"count\"]\n",
	"res"
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.8.3"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 4
	}
No results found