Created
June 5, 2015 15:12
-
-
Save Mengyuz/ca91fa56ca0d0169958e to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| import pandas | |
| def get_hourly_exits(df): | |
| ''' | |
| The data in the MTA Subway Turnstile data reports on the cumulative | |
| number of entries and exits per row. Assume that you have a dataframe | |
| called df that contains only the rows for a particular turnstile machine | |
| (i.e., unique SCP, C/A, and UNIT). This function should change | |
| these cumulative exit numbers to a count of exits since the last reading | |
| (i.e., exits since the last row in the dataframe). | |
| More specifically, you want to do two things: | |
| 1) Create a new column called EXITSn_hourly | |
| 2) Assign to the column the difference between EXITSn of the current row | |
| and the previous row. If there is any NaN, fill/replace it with 0. | |
| You may find the pandas functions shift() and fillna() to be helpful in this exercise. | |
| Example dataframe below: | |
| Unnamed: 0 C/A UNIT SCP DATEn TIMEn DESCn ENTRIESn EXITSn ENTRIESn_hourly EXITSn_hourly | |
| 0 0 A002 R051 02-00-00 05-01-11 00:00:00 REGULAR 3144312 1088151 0 0 | |
| 1 1 A002 R051 02-00-00 05-01-11 04:00:00 REGULAR 3144335 1088159 23 8 | |
| 2 2 A002 R051 02-00-00 05-01-11 08:00:00 REGULAR 3144353 1088177 18 18 | |
| 3 3 A002 R051 02-00-00 05-01-11 12:00:00 REGULAR 3144424 1088231 71 54 | |
| 4 4 A002 R051 02-00-00 05-01-11 16:00:00 REGULAR 3144594 1088275 170 44 | |
| 5 5 A002 R051 02-00-00 05-01-11 20:00:00 REGULAR 3144808 1088317 214 42 | |
| 6 6 A002 R051 02-00-00 05-02-11 00:00:00 REGULAR 3144895 1088328 87 11 | |
| 7 7 A002 R051 02-00-00 05-02-11 04:00:00 REGULAR 3144905 1088331 10 3 | |
| 8 8 A002 R051 02-00-00 05-02-11 08:00:00 REGULAR 3144941 1088420 36 89 | |
| 9 9 A002 R051 02-00-00 05-02-11 12:00:00 REGULAR 3145094 1088753 153 333 | |
| ''' | |
| #your code here | |
| df['EXITSn_hourly'] = df['EXITSn']-df['EXITSn'].shift(1) | |
| return df.fillna(0) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment