Last active
December 23, 2015 17:09
# Homework 3

Install dependencies

```{r}
#install.packages("plyr", dependencies = TRUE)
library(plyr)
#install.packages("xtable", dependencies = TRUE)
library(xtable)
```

Load the [data](http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt)

```{r}
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.table(gdURL, header = TRUE, sep = '\t', quote = "\"")
```

Check the data is cleaned and ready to roll

```{r}
str(gDat)
tail(gDat)
```

## Rich vs. Poor

Let's look quantitatively at the wealth of nations.
We simply break the data set up by continent, and get the max and min GDP per capita and their ratio

```{r results='asis'}
gdpByContinent <- ddply(gDat, ~continent, summarize,
  minGdpPercap = min(gdpPercap),
  maxGdpPercap = max(gdpPercap),
  richVsPoor = max(gdpPercap) / min(gdpPercap)
) # min and max are taken over every country and year within the continent
gdpByContinent <- arrange(gdpByContinent, richVsPoor)
gdpByContinent <- xtable(gdpByContinent, digits=0) # digits=0 rounds the displayed values to whole numbers
print(gdpByContinent, type = "html", include.rownames = FALSE)
```

Here I take "wealth distribution" to be the fold difference between the max and min gdpPercap <br>
Yes, it is true, your eyes don't deceive you, the "richest" country is *that much richer* than the "poorest" country <br>
Asia has the largest "wealth distribution", the gap between rich and poor
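
To make the "fold difference" idea concrete, here is a minimal base R sketch on a made-up data frame. The `toy` values and the `foldDiff` name are hypothetical and are not taken from the Gapminder file; the point is only that the ratio is the largest value in a group divided by the smallest.

```{r}
# Toy data (hypothetical values, not taken from Gapminder)
toy <- data.frame(
  continent = c("A", "A", "A", "B", "B", "B"),
  gdpPercap = c(500, 2000, 50000, 8000, 16000, 24000)
)

# Fold difference = largest value divided by smallest value within each group
foldDiff <- sapply(split(toy$gdpPercap, toy$continent),
                   function(x) max(x) / min(x))
foldDiff
```

Group A spans a 100-fold gap and group B a 3-fold gap, even though the top value in B is far from poor. A ratio like this says nothing about how the values in between are distributed.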

## Life Expectancy Spread

Let's look at the spread of the life expectancy <br>
There are various metrics

* standard deviation
* [median absolute deviation](http://en.wikipedia.org/wiki/Median_absolute_deviation)
* [interquartile range aka middle fifty](http://en.wikipedia.org/wiki/Interquartile_range)

```{r}
roundDec <- 1
lifeExpByCont <- ddply(gDat, ~continent, summarize,
  sdLifeExp = round(sd(lifeExp), roundDec),
  madLifeExp = round(mad(lifeExp), roundDec),
  IQRLifeExp = round(IQR(lifeExp), roundDec)
)
arrange(lifeExpByCont, sdLifeExp)
arrange(lifeExpByCont, madLifeExp)
arrange(lifeExpByCont, IQRLifeExp)
```

As you can see, the results depend on the metric used <br>
Asia always has the highest spread, but sometimes the lowest spread is Europe, sometimes Oceania <br>
Take home lesson - *always mention* what you mean by "spread" <br>
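
Why can the ranking flip between metrics? A small sketch with two made-up vectors (`x` and `y` are hypothetical, not Gapminder values) suggests one reason: the standard deviation reacts strongly to a single extreme value, while the MAD and the IQR mostly ignore it.

```{r}
x <- c(60, 61, 62, 63, 64, 65, 66)  # evenly spread, no outlier
y <- c(30, 62, 63, 63, 64, 64, 65)  # tightly packed, plus one low outlier

# One row per vector, one column per spread metric
spread <- rbind(
  x = c(sd = sd(x), mad = mad(x), IQR = IQR(x)),
  y = c(sd = sd(y), mad = mad(y), IQR = IQR(y))
)
round(spread, 1)
```

By standard deviation `y` looks more spread out than `x`, but by MAD and IQR it looks less spread out. Note that `mad()` in R scales the raw median absolute deviation by a constant of 1.4826 by default.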

## Are people living longer and longer?

We compute the average life expectancy for each year over the whole data set <br>
But we drop the highest 5% and the lowest 5% of the values in each year first, since the mean is sensitive to outliers

```{r results='asis'}
trimFrac <- 0.05 # this is about 7 maxs and 7 mins lopped off in each year
lifeExpByYear <- ddply(gDat, ~year, summarize,
  avLifeExp = mean(lifeExp, trim=trimFrac)
)
lifeExpByYear <- xtable(lifeExpByYear, digits=1)
print(lifeExpByYear, type = "html", include.rownames = FALSE)
```
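
Here is a rough sketch of what `trim` does under the hood, on a made-up vector (`x`, `nDrop` and `manual` are names invented for this illustration). `mean()` sorts the values and drops `floor(n * trim)` of them from each end before averaging.

```{r}
x <- c(1, 50:58, 200)   # 11 values: one low outlier, one high outlier
trimFrac <- 0.1

# Number of values dropped from EACH end
nDrop <- floor(length(x) * trimFrac)

# Do the trimming by hand: sort, drop nDrop values from both ends, average the rest
xSorted <- sort(x)
manual <- mean(xSorted[(nDrop + 1):(length(x) - nDrop)])

c(plain = mean(x), trimmed = mean(x, trim = trimFrac), manual = manual)

# Same arithmetic for the chunk above: 142 countries per year, trim of 0.05
floor(142 * 0.05)
```

The trimmed mean and the manual version agree, and both sit well below the plain mean, which the value 200 drags upward.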

## Middle Age

Imagine not making it to "middle age" (taken to be 40 years) <br>
How many countries are there in each continent that have a life expectancy less than 40? <br>
We feed in a subset of the data with our middle age cutoff applied right at the start <br>
Let's keep the table ordered by continent and year

```{r results='asis'}
middleAge <- 40
middleAgeCount <- ddply(subset(gDat, subset = lifeExp < middleAge),
  ~continent + year, summarize,
  countryCount = length(unique(country))
)
middleAgeCount <- xtable(middleAgeCount)
print(middleAgeCount, type = "html", include.rownames = FALSE)
```
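
One thing to keep in mind when filtering before counting: any continent and year with no country under the cutoff simply vanishes from the table, rather than showing up with a count of zero. A minimal base R sketch on made-up data (the `toy` frame and the `cutoff`, `below` and `counts` names are hypothetical, and `aggregate()` stands in for `ddply()` here):

```{r}
toy <- data.frame(
  continent = c("A", "A", "A", "A", "B", "B"),
  year      = c(1952, 1952, 1957, 1957, 1952, 1957),
  country   = c("a1", "a2", "a1", "a2", "b1", "b1"),
  lifeExp   = c(35, 38, 39, 42, 45, 47)
)
cutoff <- 40

# Filter first, then count distinct countries per continent and year
below <- toy[toy$lifeExp < cutoff, ]
counts <- aggregate(country ~ continent + year, data = below,
                    FUN = function(x) length(unique(x)))
counts
```

Continent B never drops below the cutoff, so it has no rows at all in `counts`.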

## Life Expectancy Extrema

Now let's look at who has the most extreme life expectancy in a given year <br>
We first write a function that gives you the answer to "what country has this life expectancy?", looking only at the rows for the year at hand

```{r results='asis'}
# Return the countries whose life expectancy equals lifeExpVal, searching only
# the lifeExp and country vectors passed in (one year's rows), not all of gDat
getCountryWithLE <- function(lifeExpVal, lifeExp, country) {
  as.character(country[lifeExp == lifeExpVal])
}
lifeExpByYear <- ddply(gDat, ~year, summarize,
  minLifeExp = min(lifeExp),
  minCountry = getCountryWithLE(minLifeExp, lifeExp, country)[1],
  maxLifeExp = max(lifeExp),
  maxCountry = getCountryWithLE(maxLifeExp, lifeExp, country)[1]
) # several countries can tie, so [1] keeps only the first match; other tied countries are not shown
lifeExpByYear <- xtable(lifeExpByYear, digits=0) # digits=0 displays life expectancy rounded to whole years
print(lifeExpByYear, type = "html", include.rownames = FALSE)
```
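
Taking `[1]` keeps only the first of any tied countries. Here is a small sketch of the difference between keeping the first match and keeping all of them, on made-up data (the `toy` frame and the `firstMin` and `allMin` names are hypothetical):

```{r}
toy <- data.frame(
  country = c("a", "b", "c", "d"),
  lifeExp = c(40, 75, 40, 75),
  stringsAsFactors = FALSE
)

# which.min() reports only the position of the first minimum
firstMin <- toy$country[which.min(toy$lifeExp)]

# Comparing against min() keeps every tied country
allMin <- toy$country[toy$lifeExp == min(toy$lifeExp)]

firstMin
allMin
```

If ties matter for your question, something like `paste(allMin, collapse = ", ")` would put all of them in a single table cell.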