@tim-fan
Created March 4, 2018 07:25
Project report for Coursera course 'Regression Models', March 2018
---
title: 'Regression Models Project: mtcars'
author: "Tim F"
date: "4 March 2018"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Executive summary
This analysis seeks to answer the following two questions:

1. Is an automatic or manual transmission better for fuel economy (measured in miles per gallon)?
2. What is the difference in miles per gallon between automatic and manual transmissions?

The questions are addressed by applying linear regression to the mtcars[^1] dataset.
The analysis concludes that, after accounting for the weight of the car, the dataset does not show any significant
influence of transmission type on fuel economy. Hence the answers to the above questions, with reference to the fitted model, are:

1. Neither transmission type is seen to be significantly better for fuel economy in the mtcars dataset.
2. The estimated miles-per-gallon difference between automatic and manual transmissions is not distinguishable from zero (the null hypothesis of no difference is not rejected).

The following sections outline the analysis conducted to reach these conclusions.
## Exploratory Analysis
To gain an appreciation for the variables present in the mtcars dataset, their pairwise relations are plotted as follows:
```{r explore, eval=FALSE}
library(GGally)
ggpairs(mtcars)
```
See appendix 1 for the resultant figure.
From the generated pairwise plots, we see that transmission (am) is correlated with fuel economy (mpg) in this dataset.
Of all variables, weight (wt) shows the strongest correlation (in magnitude) with fuel economy.
As described in [^1], it is expected from physical principles that weight should be proportional to
gallons-per-mile, so inversely proportional to miles-per-gallon. This is supported by the strong
negative correlation between mpg and wt.
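The correlations read off the pairwise plot can also be tabulated directly. The chunk below is a minimal sketch rather than part of the original analysis; the name `mpg_cor` is introduced here purely for illustration. It ranks the other variables by the absolute value of their correlation with mpg, then checks weight against gallons-per-mile (the reciprocal of mpg).

```{r mpgCor}
# Correlation of every other mtcars variable with mpg
mpg_cor <- cor(mtcars)["mpg", ]
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]

# Order by absolute strength so negative correlations rank correctly
round(mpg_cor[order(abs(mpg_cor), decreasing = TRUE)], 2)

# Weight against gallons-per-mile, i.e. 1 / mpg
cor(mtcars$wt, 1 / mtcars$mpg)
```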
[^1]: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
## Naive Model
As a first step, the following simple model is fitted between the two variables of interest:
```{r naive}
fit1 <- lm(mpg ~ am, mtcars)
summary(fit1)$coefficients
```
In mtcars, am = 0 denotes an automatic transmission and am = 1 a manual one, so the intercept is the automatic-group mean.
The resulting model shows automatic cars have average fuel economy of `r round(coef(fit1)[1],2)` mpg
while manual cars have average fuel economy of `r round(coef(fit1)[1] + coef(fit1)[2],2)` mpg.
This suggests manual cars travel `r round(coef(fit1)[2],2)` miles further per gallon of fuel.
This difference in fuel economy is statistically significant (p = 0.0003). A full summary of the model is provided in appendix 2.
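Because am is a 0/1 indicator, it is easy to mix up which group the intercept describes. The following sketch, which is an illustration and not part of the original analysis, relabels am as a factor and confirms that the simple regression reproduces the two group means. The names `mt`, `trans`, `group_means` and `fit_naive` are introduced here for the example only.

```{r naiveCheck}
# am is coded 0 = automatic, 1 = manual (see ?mtcars)
mt <- transform(
  mtcars,
  trans = factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
)

# Mean mpg within each transmission group
group_means <- tapply(mt$mpg, mt$trans, mean)
group_means

# Intercept = automatic mean; slope = manual mean minus automatic mean
fit_naive <- lm(mpg ~ trans, data = mt)
coef(fit_naive)

# A pooled-variance two-sample t-test gives the same p-value as the slope
t.test(mpg ~ trans, data = mt, var.equal = TRUE)$p.value
```

Labelling the factor levels makes the coefficient name (`transmanual`) state the direction of the comparison explicitly.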
## Accounting for Weight
From the coursework we know that omitting a variable which is correlated with the included variables leads
to bias in the fitted model.
From the exploratory analysis we know that fuel economy is strongly correlated with weight, and that weight
is correlated with transmission type. Has the omission of weight in the simple model led to a bias in the estimated coefficients?
To address this possibility, a model is fitted which includes weight as a regressor:
```{r weightMod}
fit2 <- lm(mpg ~ wt + am, mtcars)
summary(fit2)$coefficients
```
The model including weight appears to fit the data much better (R-squared = `r round(summary(fit2)$r.squared,2)` vs `r round(summary(fit1)$r.squared,2)` for the am-only model).
Furthermore, now that weight is accounted for, the effect of transmission on fuel economy seems to have disappeared.
The am variable is given a small negative coefficient in the model (`r round(fit2$coefficients[3],2)`), which is easily explained by the
null hypothesis that transmission type has no effect on fuel economy (p = `r round(summary(fit2)$coefficients[3,4],2)`).
Hence the analysis suggests that, once weight is accounted for, transmission type does not affect fuel economy.
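The size of the bias in the am-only model can be checked with the standard omitted-variable identity for least squares: the am coefficient in the short model equals the am coefficient in the long model plus the wt coefficient multiplied by the slope from regressing wt on am. The chunk below is a sketch of that check and not part of the original analysis; the names `short`, `long`, `aux` and `implied` are introduced for illustration.

```{r ovbCheck}
short <- lm(mpg ~ am, data = mtcars)       # weight omitted
long  <- lm(mpg ~ wt + am, data = mtcars)  # weight included
aux   <- lm(wt ~ am, data = mtcars)        # how weight differs by transmission

# am coefficient implied for the short model by the long and auxiliary fits
implied <- coef(long)["am"] + coef(long)["wt"] * coef(aux)["am"]

# The two values agree up to floating-point error
c(short = unname(coef(short)["am"]), implied = unname(implied))
```

Read this way, the apparent transmission effect in the am-only model comes almost entirely from the weight difference between the two groups of cars.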
## Model Validation
To investigate the validity of the fitted model, the residuals are plotted:
```{r residuals, eval=FALSE}
par(mfrow = c(2, 2))
plot(fit2)
```
See appendix 4 for the resultant figure.
There do not appear to be any major issues in the residual plots: there are no clear
systematic patterns in the residuals vs. fitted plot, and the Q-Q plot shows the residuals to be approximately
normally distributed.
The plots do identify a few outlier points which are not well fitted by the model. The Chrysler Imperial is of particular
concern due to relatively high leverage. Future work could focus on understanding and/or mitigating the effect of
these outliers.
Overall, the residual plots do not suggest any major issues with the chosen model.
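The points flagged in the plots can also be inspected numerically. The following is a sketch and not part of the original analysis: it tabulates leverage, Cook's distance and standardised residuals for the weight-plus-transmission model, using the illustrative names `fit_wt` and `diag_tbl`.

```{r influence}
fit_wt <- lm(mpg ~ wt + am, data = mtcars)

# Per-car influence diagnostics; row names carry over from mtcars
diag_tbl <- data.frame(
  leverage  = hatvalues(fit_wt),
  cooks_d   = cooks.distance(fit_wt),
  std_resid = rstandard(fit_wt)
)

# Five most influential cars by Cook's distance
head(diag_tbl[order(diag_tbl$cooks_d, decreasing = TRUE), ], 5)
```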
## Conclusion
The above analysis of the mtcars dataset suggests that after accounting for vehicle weight, transmission type shows no significant effect on fuel economy.
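One way to express the second question's answer as a range rather than a single test result is a confidence interval for the am coefficient. The chunk below is a minimal sketch and not part of the original analysis; `fit_wt_am` and `am_ci` are illustrative names, and the 95% level is an assumption chosen for the example.

```{r amInterval}
fit_wt_am <- lm(mpg ~ wt + am, data = mtcars)

# 95% confidence interval for the manual-vs-automatic difference in mpg,
# holding weight fixed
am_ci <- confint(fit_wt_am, level = 0.95)["am", ]
am_ci
```

An interval that spans zero is consistent with the conclusion above, while its width indicates how large a transmission effect the data cannot rule out.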
## Appendix
### Appendix 1: mtcars pairwise relations
```{r apx1, echo=FALSE, cache=TRUE}
library(GGally)
ggpairs(mtcars)
```
### Appendix 2: summary of model fit1
```{r apx2, echo=FALSE}
summary(fit1)
```
### Appendix 3: summary of model fit2
```{r apx3, echo=FALSE}
summary(fit2)
```
### Appendix 4: fit2 residual plots
```{r apx4, echo=FALSE}
par(mfrow = c(2, 2))
plot(fit2)
```