from pandas import *
from ggplot import *
def plot_weather_data(turnstile_weather):
    '''
    plot_weather_data is passed a dataframe called turnstile_weather.
    Use turnstile_weather along with ggplot to make another data visualization
    focused on the MTA and weather data we used in Project 3.
    Make a type of visualization different than what you did in the previous exercise.
from pandas import *
from ggplot import *
def plot_weather_data(turnstile_weather):
    '''
    You are passed in a dataframe called turnstile_weather.
    Use turnstile_weather along with ggplot to make a data visualization
    focused on the MTA and weather data we used in assignment #3.
    You should feel free to implement something that we discussed in class
    (e.g., scatterplots, line plots, or histograms) or attempt to implement
import numpy as np
import scipy
import matplotlib.pyplot as plt
import sys
def compute_r_squared(data, predictions):
    '''
    In exercise 5, we calculated the R^2 value for you. But why don't you try
    and calculate the R^2 value yourself.
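
The R^2 calculation this exercise asks for can be sketched as below. This is a minimal sketch, not the gist author's solution (the preview is truncated before the function body). It assumes the standard definition R^2 = 1 - SS_res / SS_tot and that `data` and `predictions` are equal-length numeric sequences.

```python
import numpy as np

def compute_r_squared(data, predictions):
    # Coerce to float arrays so lists, NumPy arrays and pandas Series all work.
    data = np.asarray(data, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    # Residual sum of squares: squared distance between observations and predictions.
    ss_res = np.sum((data - predictions) ** 2)
    # Total sum of squares: squared distance between observations and their mean.
    ss_tot = np.sum((data - np.mean(data)) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative values only; they do not come from the subway data set.
print(compute_r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```

Under this definition, a perfect prediction gives 1 and always predicting the mean gives 0.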
import numpy as np
import scipy
import matplotlib.pyplot as plt
def plot_residuals(turnstile_weather, predictions):
    '''
    Using the same methods that we used to plot a histogram of entries
    per hour for our data, why don't you make a histogram of the residuals
    (that is, the difference between the original hourly entry data and the predicted values).
    Try different binwidths for your histogram.
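
A rough illustration of the residual histogram described above, computed with `numpy.histogram` so it runs without a plotting backend. The helper name `residual_histogram` and the sample numbers are hypothetical, and the observed and predicted values are assumed to be plain numeric sequences rather than a dataframe column.

```python
import numpy as np

def residual_histogram(observed, predictions, bins=10):
    # Residual = original value minus predicted value, element by element.
    residuals = np.asarray(observed, dtype=float) - np.asarray(predictions, dtype=float)
    # counts[i] is the number of residuals falling between edges[i] and edges[i + 1].
    counts, edges = np.histogram(residuals, bins=bins)
    return residuals, counts, edges

# Made-up hourly entries and predictions, for illustration only.
observed = [10, 12, 9, 15, 11, 30]
predicted = [11, 11, 10, 13, 12, 20]

# Try a coarse and a finer binning to see how the shape changes.
for bins in (2, 5):
    _, counts, edges = residual_histogram(observed, predicted, bins=bins)
    print(bins, counts.tolist(), np.round(edges, 2).tolist())
```

To draw the figure instead of printing counts, the same `residuals` array can be passed to `plt.hist(residuals, bins=bins)`.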
import numpy as np
import pandas
import statsmodels.api as sm
"""
In this question, you need to:
1) implement the linear_regression() procedure
2) Select features (in the predictions procedure) and make predictions.
"""
Yes
From the results in step 3, we can see that the means of with_rain and without_rain are quite close, and the p-value from scipy's Mann-Whitney implementation is small (less than 5%).
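
For reference, a comparison of this kind could be sketched as follows. The helper name `compare_samples` and the sample values are hypothetical, and the two inputs are assumed to be the hourly entries already split into rainy and non-rainy groups. The `alternative` argument is passed explicitly because, as far as I know, older SciPy releases reported a one-sided p-value by default.

```python
import numpy as np
from scipy import stats

def compare_samples(with_rain, without_rain):
    # Hypothetical helper: returns both sample means plus the Mann-Whitney U
    # statistic and its two-sided p-value.
    with_rain = np.asarray(with_rain, dtype=float)
    without_rain = np.asarray(without_rain, dtype=float)
    u_stat, p_value = stats.mannwhitneyu(
        with_rain, without_rain, alternative="two-sided"
    )
    return with_rain.mean(), without_rain.mean(), u_stat, p_value

# Made-up samples that clearly differ; not the subway data.
mean_rain, mean_dry, u_stat, p_value = compare_samples(
    list(range(0, 20)), list(range(30, 50))
)
print(mean_rain, mean_dry, u_stat, p_value)
```

A p-value below 0.05 would lead to rejecting the null hypothesis that the two samples come from the same distribution at the 5% level.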
import numpy as np
import scipy
import scipy.stats
import pandas
def mann_whitney_plus_means(turnstile_weather):
    '''
    This function will consume the turnstile_weather dataframe containing
    our final turnstile weather data.
No
No, because the sample sizes of the rain and no-rain data are not the same.
import numpy as np
import pandas
import matplotlib.pyplot as plt
def entries_histogram(turnstile_weather):
    '''
    Before we perform any analysis, it might be useful to take a
    look at the data we're hoping to analyze. More specifically, let's
    examine the hourly entries in our NYC subway data and determine what
    distribution the data follows. This data is stored in a dataframe
import datetime
def reformat_subway_dates(date):
    '''
    The dates in our subway data are formatted in the format month-day-year.
    The dates in our weather underground data are formatted year-month-day.
    In order to join these two data sets together, we'll want the dates formatted
    the same way. Write a function that takes as its input a date in the MTA Subway
    data format, and returns a date in the weather underground format.
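
One possible way to do the conversion described above, using only the standard library. The preview does not show the exact field widths, so this sketch assumes a two-digit year on input (`%m-%d-%y`) and a four-digit year on output (`%Y-%m-%d`). Adjust the format strings if the real data differs.

```python
import datetime

def reformat_subway_dates(date):
    # Assumption: MTA dates look like "05-21-11" (month-day-two-digit-year).
    parsed = datetime.datetime.strptime(date, "%m-%d-%y")
    # Assumption: Weather Underground dates look like "2011-05-21".
    return parsed.strftime("%Y-%m-%d")

print(reformat_subway_dates("05-21-11"))
```

Going through `strptime` also validates the input: a string that does not match the assumed format raises `ValueError` instead of being silently rearranged.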