Created
June 5, 2015 15:11
import pandas


def get_hourly_entries(df):
    '''
    The MTA Subway Turnstile data reports the cumulative number of entries
    and exits per row. Assume that you have a dataframe called df that
    contains only the rows for a particular turnstile machine
    (i.e., unique SCP, C/A, and UNIT). This function should change
    these cumulative entry numbers to a count of entries since the last reading
    (i.e., entries since the previous row in the dataframe).

    More specifically, you want to do two things:
       1) Create a new column called ENTRIESn_hourly
       2) Assign to the column the difference between ENTRIESn of the current row
          and the previous row. If there is any NaN, fill/replace it with 1.

    You may find the pandas functions shift() and fillna() to be helpful in this exercise.

    Examples of what your dataframe should look like at the end of this exercise:

         C/A  UNIT       SCP     DATEn     TIMEn    DESCn  ENTRIESn   EXITSn  ENTRIESn_hourly
    0   A002  R051  02-00-00  05-01-11  00:00:00  REGULAR   3144312  1088151                1
    1   A002  R051  02-00-00  05-01-11  04:00:00  REGULAR   3144335  1088159               23
    2   A002  R051  02-00-00  05-01-11  08:00:00  REGULAR   3144353  1088177               18
    3   A002  R051  02-00-00  05-01-11  12:00:00  REGULAR   3144424  1088231               71
    4   A002  R051  02-00-00  05-01-11  16:00:00  REGULAR   3144594  1088275              170
    5   A002  R051  02-00-00  05-01-11  20:00:00  REGULAR   3144808  1088317              214
    6   A002  R051  02-00-00  05-02-11  00:00:00  REGULAR   3144895  1088328               87
    7   A002  R051  02-00-00  05-02-11  04:00:00  REGULAR   3144905  1088331               10
    8   A002  R051  02-00-00  05-02-11  08:00:00  REGULAR   3144941  1088420               36
    9   A002  R051  02-00-00  05-02-11  12:00:00  REGULAR   3145094  1088753              153
    10  A002  R051  02-00-00  05-02-11  16:00:00  REGULAR   3145337  1088823              243
    ...
    ...
    '''
    # Difference between each cumulative reading and the previous row's reading.
    df['ENTRIESn_hourly'] = df['ENTRIESn'] - df['ENTRIESn'].shift(1)
    # The first row has no previous reading, so its difference is NaN; replace it with 1.
    # Only the new column is filled, so NaN values in other columns are left untouched.
    df['ENTRIESn_hourly'] = df['ENTRIESn_hourly'].fillna(1)
    return df
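
The shift-and-subtract step described in the docstring can be checked on a few rows. The following is only a minimal sketch: the helper name `hourly_entries_sketch` and the four-row `sample` frame are hypothetical, with the ENTRIESn values copied from the first rows of the docstring example. Unlike the function above, the sketch works on a copy, which is a choice made here and not something the exercise requires.

```python
import pandas as pd


def hourly_entries_sketch(df):
    # Hypothetical helper for illustration; works on a copy so the
    # caller's dataframe is left unchanged.
    out = df.copy()
    # shift(1) moves every reading down one row, so each row sees the
    # previous row's cumulative count (NaN for the first row).
    previous = out['ENTRIESn'].shift(1)
    # Subtract to get entries since the previous reading, then replace
    # the first row's NaN with 1, as the exercise asks.
    out['ENTRIESn_hourly'] = (out['ENTRIESn'] - previous).fillna(1)
    return out


# First four readings from the docstring example.
sample = pd.DataFrame({
    'TIMEn': ['00:00:00', '04:00:00', '08:00:00', '12:00:00'],
    'ENTRIESn': [3144312, 3144335, 3144353, 3144424],
})

result = hourly_entries_sketch(sample)
print(result['ENTRIESn_hourly'].tolist())  # [1.0, 23.0, 18.0, 71.0]
```

The values come out as floats because the intermediate NaN forces the column to a floating-point dtype; casting with `astype(int)` after `fillna(1)` would restore integers if that matters for later steps.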