Last active
September 27, 2017 18:21
20170927_pandas_timeseries_rolling_large-window_unique-count.ipynb
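The notebook below asks for a faster rolling unique count and why its single-pass `windowed_nunique` disagrees with `pandas` by 1. As a minimal sketch, not the author's method, the single-pass idea can be written in plain Python with a dictionary of per-ID counts, which avoids sizing a count array by the largest ID. The names `rolling_nunique` and `brute_force_nunique` are hypothetical. The sketch assumes integer day numbers sorted in ascending order and a window covering `(day - window, day]`, which is how a right-closed `pandas` offset window such as `'365D'` treats its endpoints.

```python
from collections import defaultdict


def rolling_nunique(days, ids, window):
    """Count distinct ids among rows 0..i whose day lies in (days[i] - window, days[i]].

    Assumes `days` holds integer day numbers sorted ascending and that
    `ids` holds hashable identifiers. Both names are illustrative only.
    """
    if len(days) != len(ids):
        raise ValueError("days and ids must have the same length")
    if window <= 0:
        raise ValueError("window must be positive")
    counts = defaultdict(int)  # id -> number of rows currently inside the window
    unique = 0                 # number of ids with a nonzero count
    left = 0                   # index of the oldest row still inside the window
    result = []
    for day, pid in zip(days, ids):
        # Add the current row to the window.
        counts[pid] += 1
        if counts[pid] == 1:
            unique += 1
        # Evict rows that are `window` or more days older than the current row.
        while day - days[left] >= window:
            old = ids[left]
            counts[old] -= 1
            if counts[old] == 0:
                unique -= 1
            left += 1
        result.append(unique)
    return result


def brute_force_nunique(days, ids, window):
    """Reference implementation: rescan the whole window for every row."""
    return [
        len({ids[j] for j in range(i + 1) if days[i] - days[j] < window})
        for i in range(len(days))
    ]


days = [0, 0, 1, 2, 2, 3, 5]
ids = [7, 9, 7, 3, 9, 7, 3]
# The single-pass result should agree with the brute-force rescan.
assert rolling_nunique(days, ids, window=2) == brute_force_nunique(days, ids, window=2)
print(rolling_nunique(days, ids, window=2))
```

Each row is added once and evicted at most once, so the running time grows linearly with the number of rows and does not depend on the window width.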
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# How to efficiently compute a rolling unique count in a `pandas` time series?\n", | |
"\n", | |
"I have a time series of people visiting a building. Each person has a unique ID. I want to know the number of unique people visiting the building in the last 365 days.\n", | |
"\n", | |
"`pandas` does not seem to have a built-in method for this calculation. The calculation becomes computationally intensive when there are a large number of unique visitors and/or a large window. (The actual data is larger than this example.)\n", | |
"\n", | |
"Is there a better way to calculate than what I've done below? I'm not sure why the method I made `windowed_nunique` (under \"Speed test 3\") is off by 1.\n", | |
"\n", | |
"Related links:\n", | |
"* https://github.com/pandas-dev/pandas/issues/14336" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Initialization" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Import libraries.\n", | |
"import pandas as pd\n", | |
"import numba\n", | |
"import numpy as np" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Created data of people visiting a building:\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style>\n", | |
" .dataframe thead tr:only-child th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Date</th>\n", | |
" <th>PersonId</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>2010-01-01</td>\n", | |
" <td>76</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2010-01-01</td>\n", | |
" <td>63</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>2010-01-01</td>\n", | |
" <td>89</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>2010-01-01</td>\n", | |
" <td>81</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>2010-01-01</td>\n", | |
" <td>7</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>2010-01-02</td>\n", | |
" <td>22</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>2010-01-02</td>\n", | |
" <td>83</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>2010-01-02</td>\n", | |
" <td>78</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>2010-01-02</td>\n", | |
" <td>47</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>2010-01-02</td>\n", | |
" <td>68</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>2010-01-02</td>\n", | |
" <td>72</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>2010-01-03</td>\n", | |
" <td>89</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>2010-01-03</td>\n", | |
" <td>94</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>2010-01-03</td>\n", | |
" <td>44</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>2010-01-04</td>\n", | |
" <td>67</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>2010-01-04</td>\n", | |
" <td>88</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>2010-01-04</td>\n", | |
" <td>90</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>2010-01-05</td>\n", | |
" <td>30</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>2010-01-05</td>\n", | |
" <td>90</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>2010-01-05</td>\n", | |
" <td>70</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>2010-01-06</td>\n", | |
" <td>10</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>2010-01-06</td>\n", | |
" <td>77</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>2010-01-07</td>\n", | |
" <td>15</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>2010-01-08</td>\n", | |
" <td>78</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>24</th>\n", | |
" <td>2010-01-08</td>\n", | |
" <td>81</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25</th>\n", | |
" <td>2010-01-08</td>\n", | |
" <td>49</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>26</th>\n", | |
" <td>2010-01-08</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>27</th>\n", | |
" <td>2010-01-08</td>\n", | |
" <td>92</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>28</th>\n", | |
" <td>2010-01-09</td>\n", | |
" <td>35</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>29</th>\n", | |
" <td>2010-01-09</td>\n", | |
" <td>69</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>...</th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9151</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>89</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9152</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>54</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9153</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9154</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>76</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9155</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>95</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9156</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>32</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9157</th>\n", | |
" <td>2014-12-27</td>\n", | |
" <td>90</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9158</th>\n", | |
" <td>2014-12-27</td>\n", | |
" <td>73</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9159</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>90</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9160</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>55</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9161</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>88</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9162</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>49</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9163</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>93</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9164</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>51</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9165</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>63</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9166</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>27</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9167</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>92</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9168</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>53</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9169</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>66</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9170</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>92</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9171</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>94</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9172</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>75</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9173</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>27</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9174</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>99</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9175</th>\n", | |
" <td>2014-12-31</td>\n", | |
" <td>83</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9176</th>\n", | |
" <td>2014-12-31</td>\n", | |
" <td>42</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9177</th>\n", | |
" <td>2014-12-31</td>\n", | |
" <td>44</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9178</th>\n", | |
" <td>2015-01-01</td>\n", | |
" <td>93</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9179</th>\n", | |
" <td>2015-01-01</td>\n", | |
" <td>30</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9180</th>\n", | |
" <td>2015-01-01</td>\n", | |
" <td>80</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>9181 rows × 2 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Date PersonId\n", | |
"0 2010-01-01 76\n", | |
"1 2010-01-01 63\n", | |
"2 2010-01-01 89\n", | |
"3 2010-01-01 81\n", | |
"4 2010-01-01 7\n", | |
"5 2010-01-02 22\n", | |
"6 2010-01-02 83\n", | |
"7 2010-01-02 78\n", | |
"8 2010-01-02 47\n", | |
"9 2010-01-02 68\n", | |
"10 2010-01-02 72\n", | |
"11 2010-01-03 89\n", | |
"12 2010-01-03 94\n", | |
"13 2010-01-03 44\n", | |
"14 2010-01-04 67\n", | |
"15 2010-01-04 88\n", | |
"16 2010-01-04 90\n", | |
"17 2010-01-05 30\n", | |
"18 2010-01-05 90\n", | |
"19 2010-01-05 70\n", | |
"20 2010-01-06 10\n", | |
"21 2010-01-06 77\n", | |
"22 2010-01-07 15\n", | |
"23 2010-01-08 78\n", | |
"24 2010-01-08 81\n", | |
"25 2010-01-08 49\n", | |
"26 2010-01-08 96\n", | |
"27 2010-01-08 92\n", | |
"28 2010-01-09 35\n", | |
"29 2010-01-09 69\n", | |
"... ... ...\n", | |
"9151 2014-12-26 89\n", | |
"9152 2014-12-26 54\n", | |
"9153 2014-12-26 56\n", | |
"9154 2014-12-26 76\n", | |
"9155 2014-12-26 95\n", | |
"9156 2014-12-26 32\n", | |
"9157 2014-12-27 90\n", | |
"9158 2014-12-27 73\n", | |
"9159 2014-12-28 90\n", | |
"9160 2014-12-28 55\n", | |
"9161 2014-12-28 88\n", | |
"9162 2014-12-28 49\n", | |
"9163 2014-12-28 93\n", | |
"9164 2014-12-29 51\n", | |
"9165 2014-12-29 63\n", | |
"9166 2014-12-29 27\n", | |
"9167 2014-12-29 92\n", | |
"9168 2014-12-29 53\n", | |
"9169 2014-12-30 66\n", | |
"9170 2014-12-30 92\n", | |
"9171 2014-12-30 94\n", | |
"9172 2014-12-30 75\n", | |
"9173 2014-12-30 27\n", | |
"9174 2014-12-30 99\n", | |
"9175 2014-12-31 83\n", | |
"9176 2014-12-31 42\n", | |
"9177 2014-12-31 44\n", | |
"9178 2015-01-01 93\n", | |
"9179 2015-01-01 30\n", | |
"9180 2015-01-01 80\n", | |
"\n", | |
"[9181 rows x 2 columns]" | |
] | |
}, | |
"execution_count": 2, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Create data of people visiting a building.\n", | |
"\n", | |
"np.random.seed(seed=0)\n", | |
"dates = pd.date_range(start='2010-01-01', end='2015-01-01', freq='D')\n", | |
"window = 365 # days\n", | |
"num_pids = 100\n", | |
"probs = np.linspace(start=0.001, stop=0.1, num=num_pids)\n", | |
"\n", | |
"df = pd.\\\n", | |
" DataFrame(\n", | |
" data=[(date, pid)\n", | |
" for (pid, prob) in zip(range(num_pids), probs)\n", | |
" for date in np.compress(np.random.binomial(n=1, p=prob, size=len(dates)), dates)],\n", | |
" columns=['Date', 'PersonId'])\\\n", | |
" .sort_values(by='Date')\\\n", | |
" .reset_index(drop=True)\n", | |
"\n", | |
"print(\"Created data of people visiting a building:\")\n", | |
"df" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Speed reference" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"3.41 ms ± 233 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%%timeit\n", | |
"# This counts the number of people visiting the building, not the number of unique people.\n", | |
"# Provided as a speed reference.\n", | |
"df.rolling(window='{:d}D'.format(window), on='Date').count()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Speed test 1" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"2.25 s ± 160 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%%timeit\n", | |
"df.rolling(window='{:d}D'.format(window), on='Date').apply(lambda arr: pd.Series(arr).nunique())" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Save results as a reference to check calculation accuracy.\n", | |
"ref = df.rolling(window='{:d}D'.format(window), on='Date').apply(lambda arr: pd.Series(arr).nunique())['PersonId'].values" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Speed test 2" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Define a custom function and implement a just-in-time compiler.\n", | |
"@numba.jit(nopython=True)\n", | |
"def nunique(arr):\n", | |
" return len(set(arr))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"444 ms ± 54.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%%timeit\n", | |
"df.rolling(window='{:d}D'.format(window), on='Date').apply(nunique)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Check accuracy of results.\n", | |
"test = df.rolling(window='{:d}D'.format(window), on='Date').apply(nunique)['PersonId'].values\n", | |
"assert all((ref == test))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Speed test 3" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Define a custom function and implement a just-in-time compiler.\n", | |
"@numba.jit(nopython=True)\n", | |
"def windowed_nunique(dates, pids, window):\n", | |
" r\"\"\"Track number of unique persons in window,\n", | |
" reading through arrays only once.\n", | |
" \n", | |
" Args:\n", | |
" dates (numpy.ndarray): Array of dates as number of days since epoch.\n", | |
" pids (numpy.ndarray): Array of integer person identifiers.\n", | |
" \n", | |
" window (int): Width of window in units of difference of `dates`.\n", | |
" \n", | |
" Returns:\n", | |
" ucts (numpy.ndarray): Array of unique counts\n", | |
" \n", | |
" Raises:\n", | |
" AssertionError: Raised if `len(dates) != len(pids)`\n", | |
" \n", | |
" Notes:\n", | |
" * May be off by 1 compared to `pandas.core.window.Rolling`\n", | |
" with a time series alias offset.\n", | |
" \n", | |
" \"\"\"\n", | |
"\n", | |
" # Check arguments.\n", | |
" assert dates.shape == pids.shape\n", | |
" \n", | |
" # Initialize counters.\n", | |
" idx_min = 0\n", | |
" idx_max = dates.shape[0]\n", | |
" date_min = dates[idx_min]\n", | |
" pid_min = pids[idx_min]\n", | |
" pid_max = np.max(pids)\n", | |
" pid_cts = np.zeros(pid_max, dtype=np.int64)\n", | |
" pid_cts[pid_min] = 1\n", | |
" uct = 1\n", | |
" ucts = np.zeros(idx_max, dtype=np.int64)\n", | |
" ucts[idx_min] = uct\n", | |
" idx = 1\n", | |
" \n", | |
" # For each (date, person)...\n", | |
" while idx < idx_max:\n", | |
" \n", | |
" # If person count went from 0 to 1, increment unique person count.\n", | |
" date = dates[idx]\n", | |
" pid = pids[idx]\n", | |
" pid_cts[pid] += 1\n", | |
" if pid_cts[pid] == 1:\n", | |
" uct += 1\n", | |
" \n", | |
" # For past dates outside of window...\n", | |
" while (date - date_min) > window:\n", | |
" \n", | |
" # If person count went from 1 to 0, decrement unique person count.\n", | |
" pid_cts[pid_min] -= 1\n", | |
" if pid_cts[pid_min] == 0:\n", | |
" uct -= 1\n", | |
" idx_min += 1\n", | |
" date_min = dates[idx_min]\n", | |
" pid_min = pids[idx_min]\n", | |
" \n", | |
" # Record unique person count.\n", | |
" ucts[idx] = uct\n", | |
" idx += 1\n", | |
" \n", | |
" return ucts" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Format dates and person IDs.\n", | |
"df['DateEpoch'] = (df['Date'] - pd.to_datetime('1970-01-01'))/pd.to_timedelta(1, unit='D')\n", | |
"df['DateEpoch'] = df['DateEpoch'].astype(int)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"232 µs ± 110 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%%timeit\n", | |
"windowed_nunique(\n", | |
" dates=df['DateEpoch'].astype(int).values,\n", | |
" pids=df['PersonId'].astype(int).values,\n", | |
" window=window)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Check accuracy of results.\n", | |
"test = windowed_nunique(\n", | |
" dates=df['DateEpoch'].values,\n", | |
" pids=df['PersonId'].values,\n", | |
" window=window)\n", | |
"# Note: Method may be off by 1.\n", | |
"assert all(np.isclose(ref, np.asarray(test), atol=1))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Where reference ('ref') calculation of number of unique people doesn't match 'test':\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style>\n", | |
" .dataframe thead tr:only-child th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Date</th>\n", | |
" <th>PersonId</th>\n", | |
" <th>DateEpoch</th>\n", | |
" <th>ref</th>\n", | |
" <th>test</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>78</th>\n", | |
" <td>2010-01-19</td>\n", | |
" <td>99</td>\n", | |
" <td>14628</td>\n", | |
" <td>56.0</td>\n", | |
" <td>55</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>79</th>\n", | |
" <td>2010-01-19</td>\n", | |
" <td>96</td>\n", | |
" <td>14628</td>\n", | |
" <td>56.0</td>\n", | |
" <td>55</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>80</th>\n", | |
" <td>2010-01-19</td>\n", | |
" <td>88</td>\n", | |
" <td>14628</td>\n", | |
" <td>56.0</td>\n", | |
" <td>55</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>81</th>\n", | |
" <td>2010-01-20</td>\n", | |
" <td>94</td>\n", | |
" <td>14629</td>\n", | |
" <td>56.0</td>\n", | |
" <td>55</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>82</th>\n", | |
" <td>2010-01-20</td>\n", | |
" <td>48</td>\n", | |
" <td>14629</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>83</th>\n", | |
" <td>2010-01-20</td>\n", | |
" <td>74</td>\n", | |
" <td>14629</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>84</th>\n", | |
" <td>2010-01-20</td>\n", | |
" <td>95</td>\n", | |
" <td>14629</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>85</th>\n", | |
" <td>2010-01-20</td>\n", | |
" <td>70</td>\n", | |
" <td>14629</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>86</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>71</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>87</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>62</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>88</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>77</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>89</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>65</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>90</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>63</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>91</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>74</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>92</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>54</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>93</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>86</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>94</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>32</td>\n", | |
" <td>14631</td>\n", | |
" <td>58.0</td>\n", | |
" <td>57</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>95</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>85</td>\n", | |
" <td>14631</td>\n", | |
" <td>58.0</td>\n", | |
" <td>57</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>96</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>80</td>\n", | |
" <td>14631</td>\n", | |
" <td>59.0</td>\n", | |
" <td>58</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>97</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>72</td>\n", | |
" <td>14631</td>\n", | |
" <td>59.0</td>\n", | |
" <td>58</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>98</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>97</td>\n", | |
" <td>14631</td>\n", | |
" <td>59.0</td>\n", | |
" <td>58</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>99</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>57</td>\n", | |
" <td>14631</td>\n", | |
" <td>59.0</td>\n", | |
" <td>58</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>100</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>50</td>\n", | |
" <td>14632</td>\n", | |
" <td>60.0</td>\n", | |
" <td>59</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>101</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>96</td>\n", | |
" <td>14632</td>\n", | |
" <td>60.0</td>\n", | |
" <td>59</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>102</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>57</td>\n", | |
" <td>14632</td>\n", | |
" <td>60.0</td>\n", | |
" <td>59</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>103</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>30</td>\n", | |
" <td>14632</td>\n", | |
" <td>60.0</td>\n", | |
" <td>59</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>104</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>92</td>\n", | |
" <td>14632</td>\n", | |
" <td>60.0</td>\n", | |
" <td>59</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>105</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>61</td>\n", | |
" <td>14632</td>\n", | |
" <td>61.0</td>\n", | |
" <td>60</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>106</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>52</td>\n", | |
" <td>14632</td>\n", | |
" <td>61.0</td>\n", | |
" <td>60</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>107</th>\n", | |
" <td>2010-01-24</td>\n", | |
" <td>67</td>\n", | |
" <td>14633</td>\n", | |
" <td>61.0</td>\n", | |
" <td>60</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>...</th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9151</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>89</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9152</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>54</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9153</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>56</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9154</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>76</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9155</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>95</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9156</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>32</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9157</th>\n", | |
" <td>2014-12-27</td>\n", | |
" <td>90</td>\n", | |
" <td>16431</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9158</th>\n", | |
" <td>2014-12-27</td>\n", | |
" <td>73</td>\n", | |
" <td>16431</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9159</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>90</td>\n", | |
" <td>16432</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9160</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>55</td>\n", | |
" <td>16432</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9161</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>88</td>\n", | |
" <td>16432</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9162</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>49</td>\n", | |
" <td>16432</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9163</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>93</td>\n", | |
" <td>16432</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9164</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>51</td>\n", | |
" <td>16433</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9165</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>63</td>\n", | |
" <td>16433</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9166</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>27</td>\n", | |
" <td>16433</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9167</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>92</td>\n", | |
" <td>16433</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9168</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>53</td>\n", | |
" <td>16433</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9169</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>66</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9170</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>92</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9171</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>94</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9172</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>75</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9173</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>27</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9174</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>99</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9175</th>\n", | |
" <td>2014-12-31</td>\n", | |
" <td>83</td>\n", | |
" <td>16435</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9176</th>\n", | |
" <td>2014-12-31</td>\n", | |
" <td>42</td>\n", | |
" <td>16435</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9177</th>\n", | |
" <td>2014-12-31</td>\n", | |
" <td>44</td>\n", | |
" <td>16435</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9178</th>\n", | |
" <td>2015-01-01</td>\n", | |
" <td>93</td>\n", | |
" <td>16436</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9179</th>\n", | |
" <td>2015-01-01</td>\n", | |
" <td>30</td>\n", | |
" <td>16436</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9180</th>\n", | |
" <td>2015-01-01</td>\n", | |
" <td>80</td>\n", | |
" <td>16436</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>9044 rows × 5 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Date PersonId DateEpoch ref test\n", | |
"78 2010-01-19 99 14628 56.0 55\n", | |
"79 2010-01-19 96 14628 56.0 55\n", | |
"80 2010-01-19 88 14628 56.0 55\n", | |
"81 2010-01-20 94 14629 56.0 55\n", | |
"82 2010-01-20 48 14629 57.0 56\n", | |
"83 2010-01-20 74 14629 57.0 56\n", | |
"84 2010-01-20 95 14629 57.0 56\n", | |
"85 2010-01-20 70 14629 57.0 56\n", | |
"86 2010-01-21 71 14630 57.0 56\n", | |
"87 2010-01-21 62 14630 57.0 56\n", | |
"88 2010-01-21 77 14630 57.0 56\n", | |
"89 2010-01-21 65 14630 57.0 56\n", | |
"90 2010-01-21 63 14630 57.0 56\n", | |
"91 2010-01-21 74 14630 57.0 56\n", | |
"92 2010-01-21 54 14630 57.0 56\n", | |
"93 2010-01-21 86 14630 57.0 56\n", | |
"94 2010-01-22 32 14631 58.0 57\n", | |
"95 2010-01-22 85 14631 58.0 57\n", | |
"96 2010-01-22 80 14631 59.0 58\n", | |
"97 2010-01-22 72 14631 59.0 58\n", | |
"98 2010-01-22 97 14631 59.0 58\n", | |
"99 2010-01-22 57 14631 59.0 58\n", | |
"100 2010-01-23 50 14632 60.0 59\n", | |
"101 2010-01-23 96 14632 60.0 59\n", | |
"102 2010-01-23 57 14632 60.0 59\n", | |
"103 2010-01-23 30 14632 60.0 59\n", | |
"104 2010-01-23 92 14632 60.0 59\n", | |
"105 2010-01-23 61 14632 61.0 60\n", | |
"106 2010-01-23 52 14632 61.0 60\n", | |
"107 2010-01-24 67 14633 61.0 60\n", | |
"... ... ... ... ... ...\n", | |
"9151 2014-12-26 89 16430 97.0 96\n", | |
"9152 2014-12-26 54 16430 97.0 96\n", | |
"9153 2014-12-26 56 16430 97.0 96\n", | |
"9154 2014-12-26 76 16430 97.0 96\n", | |
"9155 2014-12-26 95 16430 97.0 96\n", | |
"9156 2014-12-26 32 16430 97.0 96\n", | |
"9157 2014-12-27 90 16431 97.0 96\n", | |
"9158 2014-12-27 73 16431 97.0 96\n", | |
"9159 2014-12-28 90 16432 97.0 96\n", | |
"9160 2014-12-28 55 16432 97.0 96\n", | |
"9161 2014-12-28 88 16432 97.0 96\n", | |
"9162 2014-12-28 49 16432 97.0 96\n", | |
"9163 2014-12-28 93 16432 97.0 96\n", | |
"9164 2014-12-29 51 16433 97.0 96\n", | |
"9165 2014-12-29 63 16433 97.0 96\n", | |
"9166 2014-12-29 27 16433 97.0 96\n", | |
"9167 2014-12-29 92 16433 97.0 96\n", | |
"9168 2014-12-29 53 16433 97.0 96\n", | |
"9169 2014-12-30 66 16434 97.0 96\n", | |
"9170 2014-12-30 92 16434 97.0 96\n", | |
"9171 2014-12-30 94 16434 97.0 96\n", | |
"9172 2014-12-30 75 16434 97.0 96\n", | |
"9173 2014-12-30 27 16434 97.0 96\n", | |
"9174 2014-12-30 99 16434 97.0 96\n", | |
"9175 2014-12-31 83 16435 97.0 96\n", | |
"9176 2014-12-31 42 16435 97.0 96\n", | |
"9177 2014-12-31 44 16435 97.0 96\n", | |
"9178 2015-01-01 93 16436 97.0 96\n", | |
"9179 2015-01-01 30 16436 97.0 96\n", | |
"9180 2015-01-01 80 16436 97.0 96\n", | |
"\n", | |
"[9044 rows x 5 columns]" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Show where the calculation doesn't match.\n", | |
"print(\"Where reference ('ref') calculation of number of unique people doesn't match 'test':\")\n", | |
"df['ref'] = ref\n", | |
"df['test'] = test\n", | |
"df.loc[df['ref'] != df['test']]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Speed test 4" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Define a custom function and compile it with numba's just-in-time compiler.\n", | |
"@numba.jit(nopython=True)\n", | |
"def windowed_nunique2(dates, pids, window):\n", | |
"    r\"\"\"Track number of unique persons in window,\n", | |
"    reading through arrays only once.\n", | |
"    \n", | |
"    Args:\n", | |
"        dates (numpy.ndarray): Array of dates as number of days since epoch,\n", | |
"            sorted in ascending order.\n", | |
"        pids (numpy.ndarray): Array of non-negative integer person identifiers.\n", | |
"        window (int): Width of window in units of difference of `dates`.\n", | |
"    \n", | |
"    Returns:\n", | |
"        ucts (numpy.ndarray): Array of unique counts\n", | |
"    \n", | |
"    Raises:\n", | |
"        AssertionError: Raised if `dates.shape != pids.shape`\n", | |
"    \n", | |
"    Notes:\n", | |
"        * May be off by 1 compared to `pandas.core.window.Rolling`\n", | |
"            with a time series alias offset.\n", | |
"        * Decrements when `date - date_min >= window`\n", | |
"    \n", | |
"    \"\"\"\n", | |
"\n", | |
"    # Check arguments.\n", | |
"    assert dates.shape == pids.shape\n", | |
"    \n", | |
"    # Initialize counters.\n", | |
"    idx_min = 0\n", | |
"    idx_max = dates.shape[0]\n", | |
"    date_min = dates[idx_min]\n", | |
"    pid_min = pids[idx_min]\n", | |
"    pid_max = np.max(pids)\n", | |
"    # One slot per ID in 0..pid_max. A size of `pid_max` would put\n", | |
"    # `pid_cts[pid_max]` out of bounds, which nopython mode does not check.\n", | |
"    pid_cts = np.zeros(pid_max + 1, dtype=np.int64)\n", | |
"    pid_cts[pid_min] = 1\n", | |
"    uct = 1\n", | |
"    ucts = np.zeros(idx_max, dtype=np.int64)\n", | |
"    ucts[idx_min] = uct\n", | |
"    idx = 1\n", | |
"    \n", | |
"    # For each (date, person)...\n", | |
"    while idx < idx_max:\n", | |
"        \n", | |
"        # If person count went from 0 to 1, increment unique person count.\n", | |
"        date = dates[idx]\n", | |
"        pid = pids[idx]\n", | |
"        pid_cts[pid] += 1\n", | |
"        if pid_cts[pid] == 1:\n", | |
"            uct += 1\n", | |
"        \n", | |
"        # For past dates outside of window...\n", | |
"        while (date - date_min) >= window:\n", | |
"            \n", | |
"            # If person count went from 1 to 0, decrement unique person count.\n", | |
"            pid_cts[pid_min] -= 1\n", | |
"            if pid_cts[pid_min] == 0:\n", | |
"                uct -= 1\n", | |
"            idx_min += 1\n", | |
"            date_min = dates[idx_min]\n", | |
"            pid_min = pids[idx_min]\n", | |
"        \n", | |
"        # Record unique person count.\n", | |
"        ucts[idx] = uct\n", | |
"        idx += 1\n", | |
"    \n", | |
"    return ucts" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Convert dates to an integer number of days since the Unix epoch (1970-01-01).\n", | |
"df['DateEpoch'] = (df['Date'] - pd.to_datetime('1970-01-01'))/pd.to_timedelta(1, unit='D')\n", | |
"df['DateEpoch'] = df['DateEpoch'].astype(int)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"183 µs ± 12.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" | |
] | |
} | |
], | |
"source": [ | |
"%%timeit\n", | |
"windowed_nunique2(\n", | |
" dates=df['DateEpoch'].astype(int).values,\n", | |
" pids=df['PersonId'].astype(int).values,\n", | |
" window=window)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Check accuracy of results.\n", | |
"test = windowed_nunique2(\n", | |
" dates=df['DateEpoch'].values,\n", | |
" pids=df['PersonId'].values,\n", | |
" window=window)\n", | |
"# Note: Method may be off by 1.\n", | |
"assert all(np.isclose(ref, np.asarray(test), atol=1))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Where reference ('ref') calculation of number of unique people doesn't match 'test':\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style>\n", | |
" .dataframe thead tr:only-child th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Date</th>\n", | |
" <th>PersonId</th>\n", | |
" <th>DateEpoch</th>\n", | |
" <th>ref</th>\n", | |
" <th>test</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>78</th>\n", | |
" <td>2010-01-19</td>\n", | |
" <td>99</td>\n", | |
" <td>14628</td>\n", | |
" <td>56.0</td>\n", | |
" <td>55</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>79</th>\n", | |
" <td>2010-01-19</td>\n", | |
" <td>96</td>\n", | |
" <td>14628</td>\n", | |
" <td>56.0</td>\n", | |
" <td>55</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>80</th>\n", | |
" <td>2010-01-19</td>\n", | |
" <td>88</td>\n", | |
" <td>14628</td>\n", | |
" <td>56.0</td>\n", | |
" <td>55</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>81</th>\n", | |
" <td>2010-01-20</td>\n", | |
" <td>94</td>\n", | |
" <td>14629</td>\n", | |
" <td>56.0</td>\n", | |
" <td>55</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>82</th>\n", | |
" <td>2010-01-20</td>\n", | |
" <td>48</td>\n", | |
" <td>14629</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>83</th>\n", | |
" <td>2010-01-20</td>\n", | |
" <td>74</td>\n", | |
" <td>14629</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>84</th>\n", | |
" <td>2010-01-20</td>\n", | |
" <td>95</td>\n", | |
" <td>14629</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>85</th>\n", | |
" <td>2010-01-20</td>\n", | |
" <td>70</td>\n", | |
" <td>14629</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>86</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>71</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>87</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>62</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>88</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>77</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>89</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>65</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>90</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>63</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>91</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>74</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>92</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>54</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>93</th>\n", | |
" <td>2010-01-21</td>\n", | |
" <td>86</td>\n", | |
" <td>14630</td>\n", | |
" <td>57.0</td>\n", | |
" <td>56</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>94</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>32</td>\n", | |
" <td>14631</td>\n", | |
" <td>58.0</td>\n", | |
" <td>57</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>95</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>85</td>\n", | |
" <td>14631</td>\n", | |
" <td>58.0</td>\n", | |
" <td>57</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>96</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>80</td>\n", | |
" <td>14631</td>\n", | |
" <td>59.0</td>\n", | |
" <td>58</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>97</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>72</td>\n", | |
" <td>14631</td>\n", | |
" <td>59.0</td>\n", | |
" <td>58</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>98</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>97</td>\n", | |
" <td>14631</td>\n", | |
" <td>59.0</td>\n", | |
" <td>58</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>99</th>\n", | |
" <td>2010-01-22</td>\n", | |
" <td>57</td>\n", | |
" <td>14631</td>\n", | |
" <td>59.0</td>\n", | |
" <td>58</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>100</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>50</td>\n", | |
" <td>14632</td>\n", | |
" <td>60.0</td>\n", | |
" <td>59</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>101</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>96</td>\n", | |
" <td>14632</td>\n", | |
" <td>60.0</td>\n", | |
" <td>59</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>102</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>57</td>\n", | |
" <td>14632</td>\n", | |
" <td>60.0</td>\n", | |
" <td>59</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>103</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>30</td>\n", | |
" <td>14632</td>\n", | |
" <td>60.0</td>\n", | |
" <td>59</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>104</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>92</td>\n", | |
" <td>14632</td>\n", | |
" <td>60.0</td>\n", | |
" <td>59</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>105</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>61</td>\n", | |
" <td>14632</td>\n", | |
" <td>61.0</td>\n", | |
" <td>60</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>106</th>\n", | |
" <td>2010-01-23</td>\n", | |
" <td>52</td>\n", | |
" <td>14632</td>\n", | |
" <td>61.0</td>\n", | |
" <td>60</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>107</th>\n", | |
" <td>2010-01-24</td>\n", | |
" <td>67</td>\n", | |
" <td>14633</td>\n", | |
" <td>61.0</td>\n", | |
" <td>60</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>...</th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9151</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>89</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9152</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>54</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9153</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>56</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9154</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>76</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9155</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>95</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9156</th>\n", | |
" <td>2014-12-26</td>\n", | |
" <td>32</td>\n", | |
" <td>16430</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9157</th>\n", | |
" <td>2014-12-27</td>\n", | |
" <td>90</td>\n", | |
" <td>16431</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9158</th>\n", | |
" <td>2014-12-27</td>\n", | |
" <td>73</td>\n", | |
" <td>16431</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9159</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>90</td>\n", | |
" <td>16432</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9160</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>55</td>\n", | |
" <td>16432</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9161</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>88</td>\n", | |
" <td>16432</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9162</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>49</td>\n", | |
" <td>16432</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9163</th>\n", | |
" <td>2014-12-28</td>\n", | |
" <td>93</td>\n", | |
" <td>16432</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9164</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>51</td>\n", | |
" <td>16433</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9165</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>63</td>\n", | |
" <td>16433</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9166</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>27</td>\n", | |
" <td>16433</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9167</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>92</td>\n", | |
" <td>16433</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9168</th>\n", | |
" <td>2014-12-29</td>\n", | |
" <td>53</td>\n", | |
" <td>16433</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9169</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>66</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9170</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>92</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9171</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>94</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9172</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>75</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9173</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>27</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9174</th>\n", | |
" <td>2014-12-30</td>\n", | |
" <td>99</td>\n", | |
" <td>16434</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9175</th>\n", | |
" <td>2014-12-31</td>\n", | |
" <td>83</td>\n", | |
" <td>16435</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9176</th>\n", | |
" <td>2014-12-31</td>\n", | |
" <td>42</td>\n", | |
" <td>16435</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9177</th>\n", | |
" <td>2014-12-31</td>\n", | |
" <td>44</td>\n", | |
" <td>16435</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9178</th>\n", | |
" <td>2015-01-01</td>\n", | |
" <td>93</td>\n", | |
" <td>16436</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9179</th>\n", | |
" <td>2015-01-01</td>\n", | |
" <td>30</td>\n", | |
" <td>16436</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9180</th>\n", | |
" <td>2015-01-01</td>\n", | |
" <td>80</td>\n", | |
" <td>16436</td>\n", | |
" <td>97.0</td>\n", | |
" <td>96</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>9103 rows × 5 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Date PersonId DateEpoch ref test\n", | |
"78 2010-01-19 99 14628 56.0 55\n", | |
"79 2010-01-19 96 14628 56.0 55\n", | |
"80 2010-01-19 88 14628 56.0 55\n", | |
"81 2010-01-20 94 14629 56.0 55\n", | |
"82 2010-01-20 48 14629 57.0 56\n", | |
"83 2010-01-20 74 14629 57.0 56\n", | |
"84 2010-01-20 95 14629 57.0 56\n", | |
"85 2010-01-20 70 14629 57.0 56\n", | |
"86 2010-01-21 71 14630 57.0 56\n", | |
"87 2010-01-21 62 14630 57.0 56\n", | |
"88 2010-01-21 77 14630 57.0 56\n", | |
"89 2010-01-21 65 14630 57.0 56\n", | |
"90 2010-01-21 63 14630 57.0 56\n", | |
"91 2010-01-21 74 14630 57.0 56\n", | |
"92 2010-01-21 54 14630 57.0 56\n", | |
"93 2010-01-21 86 14630 57.0 56\n", | |
"94 2010-01-22 32 14631 58.0 57\n", | |
"95 2010-01-22 85 14631 58.0 57\n", | |
"96 2010-01-22 80 14631 59.0 58\n", | |
"97 2010-01-22 72 14631 59.0 58\n", | |
"98 2010-01-22 97 14631 59.0 58\n", | |
"99 2010-01-22 57 14631 59.0 58\n", | |
"100 2010-01-23 50 14632 60.0 59\n", | |
"101 2010-01-23 96 14632 60.0 59\n", | |
"102 2010-01-23 57 14632 60.0 59\n", | |
"103 2010-01-23 30 14632 60.0 59\n", | |
"104 2010-01-23 92 14632 60.0 59\n", | |
"105 2010-01-23 61 14632 61.0 60\n", | |
"106 2010-01-23 52 14632 61.0 60\n", | |
"107 2010-01-24 67 14633 61.0 60\n", | |
"... ... ... ... ... ...\n", | |
"9151 2014-12-26 89 16430 97.0 96\n", | |
"9152 2014-12-26 54 16430 97.0 96\n", | |
"9153 2014-12-26 56 16430 97.0 96\n", | |
"9154 2014-12-26 76 16430 97.0 96\n", | |
"9155 2014-12-26 95 16430 97.0 96\n", | |
"9156 2014-12-26 32 16430 97.0 96\n", | |
"9157 2014-12-27 90 16431 97.0 96\n", | |
"9158 2014-12-27 73 16431 97.0 96\n", | |
"9159 2014-12-28 90 16432 97.0 96\n", | |
"9160 2014-12-28 55 16432 97.0 96\n", | |
"9161 2014-12-28 88 16432 97.0 96\n", | |
"9162 2014-12-28 49 16432 97.0 96\n", | |
"9163 2014-12-28 93 16432 97.0 96\n", | |
"9164 2014-12-29 51 16433 97.0 96\n", | |
"9165 2014-12-29 63 16433 97.0 96\n", | |
"9166 2014-12-29 27 16433 97.0 96\n", | |
"9167 2014-12-29 92 16433 97.0 96\n", | |
"9168 2014-12-29 53 16433 97.0 96\n", | |
"9169 2014-12-30 66 16434 97.0 96\n", | |
"9170 2014-12-30 92 16434 97.0 96\n", | |
"9171 2014-12-30 94 16434 97.0 96\n", | |
"9172 2014-12-30 75 16434 97.0 96\n", | |
"9173 2014-12-30 27 16434 97.0 96\n", | |
"9174 2014-12-30 99 16434 97.0 96\n", | |
"9175 2014-12-31 83 16435 97.0 96\n", | |
"9176 2014-12-31 42 16435 97.0 96\n", | |
"9177 2014-12-31 44 16435 97.0 96\n", | |
"9178 2015-01-01 93 16436 97.0 96\n", | |
"9179 2015-01-01 30 16436 97.0 96\n", | |
"9180 2015-01-01 80 16436 97.0 96\n", | |
"\n", | |
"[9103 rows x 5 columns]" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Show where the calculation doesn't match.\n", | |
"print(\"Where reference ('ref') calculation of number of unique people doesn't match 'test':\")\n", | |
"df['ref'] = ref\n", | |
"df['test'] = test\n", | |
"df.loc[df['ref'] != df['test']]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.0" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
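One detail worth checking in a counting approach like `windowed_nunique2`: the per-person count array is indexed by person ID, so it needs `max(pids) + 1` slots, not `max(pids)`. In the mismatch table of Speed test 4, the first listed row (78) has `PersonId` 99; if 99 is the largest ID in the data, an array one slot short would be consistent with a constant off-by-1, although that is an assumption and has not been checked against the full data. The sketch below is a plain-Python version of the same sliding-window idea, with a brute-force check. The names `rolling_nunique` and `brute_force_nunique` are hypothetical, and it assumes sorted dates, non-negative integer IDs, and a positive window.

```python
def rolling_nunique(dates, pids, window):
    """For each row i, count distinct ids among rows j <= i
    with dates[i] - dates[j] < window.

    Assumes `dates` is sorted ascending, `pids` holds non-negative
    integers, and `window` is positive.
    """
    if not dates:
        return []
    # One slot per id in 0..max(pids); a list of size max(pids)
    # would raise IndexError for the largest id.
    counts = [0] * (max(pids) + 1)
    lo = 0    # index of the oldest row still inside the window
    uct = 0   # number of ids with a nonzero count
    out = []
    for hi in range(len(dates)):
        # Add the current row to the window.
        counts[pids[hi]] += 1
        if counts[pids[hi]] == 1:
            uct += 1
        # Evict rows that are `window` or more days older than the current row.
        while dates[hi] - dates[lo] >= window:
            counts[pids[lo]] -= 1
            if counts[pids[lo]] == 0:
                uct -= 1
            lo += 1
        out.append(uct)
    return out


def brute_force_nunique(dates, pids, window):
    """Quadratic reference implementation of the same window definition."""
    out = []
    for i in range(len(dates)):
        seen = {pids[j] for j in range(i + 1) if dates[i] - dates[j] < window}
        out.append(len(seen))
    return out


dates = [0, 0, 1, 2, 5, 5, 9]
pids = [3, 7, 3, 9, 7, 7, 3]
print(rolling_nunique(dates, pids, 3))  # prints [1, 2, 2, 3, 1, 1, 1]
```

Each row is added once and evicted at most once, so the pass is linear in the number of rows regardless of window width. Comparing against the brute-force version on a small sample is a cheap way to separate an indexing bug from a genuine difference in window definition.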