import pandas

def get_hourly_entries(df):
    '''
    The data in the MTA Subway Turnstile data reports on the cumulative
    number of entries and exits per row. Assume that you have a dataframe
    called df that contains only the rows for a particular turnstile machine
    (i.e., unique SCP, C/A, and UNIT). This function should change
    these cumulative entry numbers to a count of entries since the last reading
    (i.e., entries since the last row in the dataframe).
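The conversion described in the docstring above can be sketched with `Series.diff()`. This is a minimal sketch, not the original solution: the column names `ENTRIESn` and `ENTRIESn_hourly`, the function name, and the choice to fill the first row with 0 are all assumptions, since the preview is cut off before naming them.

```python
import pandas as pd

def hourly_entries_sketch(df, cumulative_col="ENTRIESn", hourly_col="ENTRIESn_hourly"):
    """Return a copy of df with a per-reading entry count added.

    Column names are hypothetical defaults, not taken from the source.
    """
    out = df.copy()
    # diff() subtracts the previous row's cumulative value from the current one.
    # The first row has no predecessor, so diff() yields NaN there;
    # filling it with 0 is an assumption.
    out[hourly_col] = out[cumulative_col].diff().fillna(0)
    return out

# Made-up cumulative readings for one turnstile.
sample = pd.DataFrame({"ENTRIESn": [3144312, 3144335, 3144353, 3144424]})
result = hourly_entries_sketch(sample)
```

Working on a copy keeps the caller's dataframe unchanged, which makes the function easier to test.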
import pandas

def get_hourly_exits(df):
    '''
    The data in the MTA Subway Turnstile data reports on the cumulative
    number of entries and exits per row. Assume that you have a dataframe
    called df that contains only the rows for a particular turnstile machine
    (i.e., unique SCP, C/A, and UNIT). This function should change
    these cumulative exit numbers to a count of exits since the last reading
    (i.e., exits since the last row in the dataframe).
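For the exit counts, one possible approach is to subtract a shifted copy of the cumulative column from itself. The sketch below assumes hypothetical column names `EXITSn` and `EXITSn_hourly` and assumes the first reading should be filled with 0; neither detail appears in the truncated preview.

```python
import pandas as pd

def hourly_exits_sketch(df, cumulative_col="EXITSn", hourly_col="EXITSn_hourly"):
    """Return a copy of df with exits since the previous reading.

    Column names are hypothetical defaults, not taken from the source.
    """
    out = df.copy()
    # shift(1) moves every value down one row, so each row lines up
    # with the cumulative count from the reading before it.
    previous = out[cumulative_col].shift(1)
    # The first row has no previous reading (NaN); using 0 there is an assumption.
    out[hourly_col] = (out[cumulative_col] - previous).fillna(0)
    return out

# Made-up cumulative readings for one turnstile.
sample = pd.DataFrame({"EXITSn": [1088151, 1088159, 1088177, 1088231]})
result = hourly_exits_sketch(sample)
```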
import pandas

def time_to_hour(time):
    '''
    Given an input variable time that represents time in the format of:
    "00:00:00" (hour:minutes:seconds)
    Write a function to extract the hour part from the input variable time
    and return it as an integer. For example:
    1) if hour is 00, your code should return 0
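A minimal sketch of the hour extraction described above, assuming the input is always a string of the form `"HH:MM:SS"`. The function name is hypothetical.

```python
def time_to_hour_sketch(time):
    """Extract the hour from an "HH:MM:SS" string and return it as an int."""
    # Everything before the first colon is the hour field.
    hour_text = time.split(":")[0]
    # int() drops any leading zero, so "00" becomes 0 and "08" becomes 8.
    return int(hour_text)
```

Splitting on the colon rather than slicing the first two characters also tolerates an hour written without a leading zero.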
import datetime

def reformat_subway_dates(date):
    '''
    The dates in our subway data are formatted month-day-year.
    The dates in our Weather Underground data are formatted year-month-day.
    In order to join these two data sets together, we'll want the dates formatted
    the same way. Write a function that takes as its input a date in the MTA Subway
    data format, and returns a date in the Weather Underground format.
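The reformatting can be sketched with `datetime.strptime` and `strftime`. The docstring gives only the field order, so the exact patterns here are assumptions: a two-digit year on input (`%m-%d-%y`) and a four-digit year on output (`%Y-%m-%d`). Both are parameters so they can be adjusted to the real data.

```python
from datetime import datetime

def reformat_subway_dates_sketch(date, in_format="%m-%d-%y", out_format="%Y-%m-%d"):
    """Convert a month-day-year string to a year-month-day string.

    The default format strings are assumptions about the two data sets.
    """
    # Parse with the input pattern, then render with the output pattern.
    parsed = datetime.strptime(date, in_format)
    return parsed.strftime(out_format)
```

Parsing into a `datetime` first, instead of rearranging substrings, also rejects impossible dates such as month 13.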
import numpy as np
import pandas
import matplotlib.pyplot as plt

def entries_histogram(turnstile_weather):
    '''
    Before we perform any analysis, it might be useful to take a
    look at the data we're hoping to analyze. More specifically, let's
    examine the hourly entries in our NYC subway data and determine what
    distribution the data follows. This data is stored in a dataframe
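One way to look at the distribution of hourly entries is a plain matplotlib histogram. This is a sketch under stated assumptions: the column name `ENTRIESn_hourly` is hypothetical (the preview is cut off before naming it), the sample data is synthetic, and the `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def entries_histogram_sketch(turnstile_weather, column="ENTRIESn_hourly", bins=20):
    """Plot a histogram of one column and return the figure, counts, and bin edges."""
    fig, ax = plt.subplots()
    # ax.hist returns the per-bin counts, the bin edges, and the drawn patches.
    counts, edges, _ = ax.hist(turnstile_weather[column], bins=bins)
    ax.set_xlabel(column)
    ax.set_ylabel("Frequency")
    return fig, counts, edges

# Synthetic, right-skewed data standing in for the real hourly entries.
rng = np.random.default_rng(0)
sample = pd.DataFrame({"ENTRIESn_hourly": rng.exponential(scale=1000, size=500)})
fig, counts, edges = entries_histogram_sketch(sample)
```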
No
No. The rain and no-rain samples are not the same size.
import numpy as np
import scipy
import scipy.stats
import pandas

def mann_whitney_plus_means(turnstile_weather):
    '''
    This function will consume the turnstile_weather dataframe containing
    our final turnstile weather data.
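A possible shape for this function, using `scipy.stats.mannwhitneyu`. The column names `rain` and `ENTRIESn_hourly`, the two-sided alternative, and the return order are assumptions, because the docstring is truncated before it specifies them.

```python
import pandas as pd
import scipy.stats

def mann_whitney_plus_means_sketch(turnstile_weather,
                                   rain_col="rain",
                                   entries_col="ENTRIESn_hourly"):
    """Return (mean with rain, mean without rain, U statistic, p-value).

    Column names are hypothetical defaults, not taken from the source.
    """
    # Split the hourly entries into rainy and non-rainy samples.
    is_rainy = turnstile_weather[rain_col] == 1
    with_rain = turnstile_weather.loc[is_rainy, entries_col]
    without_rain = turnstile_weather.loc[~is_rainy, entries_col]
    # The Mann-Whitney U test does not assume normality or equal sample sizes.
    # A two-sided alternative is an assumption here.
    U, p = scipy.stats.mannwhitneyu(with_rain, without_rain, alternative="two-sided")
    return with_rain.mean(), without_rain.mean(), U, p

# Tiny made-up data set, for illustration only.
sample = pd.DataFrame({
    "rain": [1, 1, 1, 0, 0, 0, 0],
    "ENTRIESn_hourly": [10, 20, 30, 1, 2, 3, 4],
})
with_rain_mean, without_rain_mean, U, p = mann_whitney_plus_means_sketch(sample)
```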
Yes
The results in step 3 show that the means of with_rain and without_rain are quite close, and the p-value from scipy's Mann-Whitney implementation is small (less than 5%).
import numpy as np
import pandas
import statsmodels.api as sm

"""
In this question, you need to:
1) implement the linear_regression() procedure
2) Select features (in the predictions procedure) and make predictions.
"""
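The two steps listed above can be sketched as follows. Note the swap: this sketch fits ordinary least squares with `numpy.linalg.lstsq` instead of statsmodels, so that it depends only on NumPy. The function names, the return convention, and the sample data are all assumptions, not the original solution.

```python
import numpy as np

def linear_regression_sketch(features, values):
    """Fit ordinary least squares and return (intercept, params)."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    coefficients, *_ = np.linalg.lstsq(design, values, rcond=None)
    return coefficients[0], coefficients[1:]

def predictions_sketch(features, values):
    """Fit the model on the given features and return its predictions."""
    intercept, params = linear_regression_sketch(features, values)
    return intercept + np.dot(features, params)

# Made-up features with an exact linear relationship: y = 5 + 2*x0 - 1*x1.
features = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0]])
values = 5.0 + 2.0 * features[:, 0] - 1.0 * features[:, 1]
intercept, params = linear_regression_sketch(features, values)
predicted = predictions_sketch(features, values)
```

Fitting with statsmodels should give the same coefficients on this data, since both approaches solve the same least-squares problem.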
import numpy as np
import scipy
import matplotlib.pyplot as plt

def plot_residuals(turnstile_weather, predictions):
    '''
    Using the same methods that we used to plot a histogram of entries
    per hour for our data, why don't you make a histogram of the residuals
    (that is, the difference between the original hourly entry data and the predicted values).
    Try different binwidths for your histogram.
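A residual histogram with an adjustable bin width might look like the sketch below. The column name `ENTRIESn_hourly`, the `binwidth` parameter, and the sample numbers are assumptions. The `Agg` backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_residuals_sketch(turnstile_weather, predictions,
                          column="ENTRIESn_hourly", binwidth=25):
    """Plot a histogram of (observed - predicted) with fixed-width bins."""
    residuals = turnstile_weather[column] - predictions
    # Build bin edges of width `binwidth` that cover every residual.
    low = np.floor(residuals.min() / binwidth) * binwidth
    n_bins = max(1, int(np.ceil((residuals.max() - low) / binwidth)))
    edges = low + binwidth * np.arange(n_bins + 1)
    fig, ax = plt.subplots()
    counts, _, _ = ax.hist(residuals, bins=edges)
    ax.set_xlabel("Residual (observed - predicted)")
    ax.set_ylabel("Frequency")
    return fig, residuals, counts, edges

# Made-up observations and predictions, for illustration only.
sample = pd.DataFrame({"ENTRIESn_hourly": [100, 220, 310, 90]})
predicted = np.array([120, 200, 300, 150])
fig, residuals, counts, edges = plot_residuals_sketch(sample, predicted)
```

Passing explicit edges instead of a bin count makes it easy to try different values of `binwidth` and compare the resulting shapes.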