@tomhopper
Created October 29, 2017 21:45
SOCR Data - 25,000 Records of Human Heights (in) and Weights (lbs)
## Height and Weight of 18 year olds
## from Hong Kong 1993 Growth Survey data,
## simulated by SOCR from reported summary statistics
## Heights in inches
## Weights in pounds
## Explanation \url{http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights}
## Data \url{http://socr.ucla.edu/docs/resources/SOCR_Data/SOCR_Data_Dinov_020108_HeightsWeights.html}
## Libraries ####
library(rvest) # Web scraping
library(magrittr) # Data wrangling
library(dplyr) # Data wrangling
library(ggplot2) # Plotting distribution
library(nortest) # used to test for normality
## Scrape the data from SOCR's website ####
url <- "http://socr.ucla.edu/docs/resources/SOCR_Data/SOCR_Data_Dinov_020108_HeightsWeights.html"
the_data <- read_html(url) %>%
  html_table(header = TRUE) %>%
  extract2(1) %>%
  setNames(make.names(names = colnames(.))) %>%
  mutate(Index = NULL)
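## Why the columns are called Height.Inches. and Weight.Pounds. (illustrative sketch) ####
## Assumption: the scraped table headers resemble the hypothetical example_headers
## below. make.names() replaces characters that are not valid in R names with ".",
## which is where the trailing dots come from.
example_headers <- c("Index", "Height(Inches)", "Weight(Pounds)")
example_names <- make.names(names = example_headers)
print(example_names)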
## Histograms ####
## for height
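the_data %>%
  ggplot(aes(x = Height.Inches.)) +
  # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
  geom_histogram(aes(y = after_stat(density))) +
  # Overlay a normal curve with the sample mean and standard deviation
  stat_function(fun = dnorm,
                args = list(mean = mean(the_data$Height.Inches.),
                            sd = sd(the_data$Height.Inches.)),
                color = "blue") +
  theme_minimal()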
## for weight
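the_data %>%
  ggplot(aes(x = Weight.Pounds.)) +
  # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
  geom_histogram(aes(y = after_stat(density))) +
  # Overlay a normal curve with the sample mean and standard deviation
  stat_function(fun = dnorm,
                args = list(mean = mean(the_data$Weight.Pounds.),
                            sd = sd(the_data$Weight.Pounds.)),
                color = "blue") +
  theme_minimal()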
## Summary of data ####
summary(the_data$Height.Inches.)
summary(the_data$Weight.Pounds.)
# Note: these results look inconsistent with a population described as "age 6 to 18"
## Demonstrating the data is simulated from a normal distribution; not actual measured data ####
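# Anderson-Darling test
ad.test(the_data$Height.Inches.)
ad.test(the_data$Weight.Pounds.)
# Shapiro-Francia test; sf.test() accepts at most 5000 values, so test a random subsample of 1000
sf.test(sample(x = the_data$Height.Inches., size = 1000, replace = FALSE))
sf.test(sample(x = the_data$Weight.Pounds., size = 1000, replace = FALSE))
# Lilliefors (Kolmogorov-Smirnov) test
lillie.test(the_data$Height.Inches.)
lillie.test(the_data$Weight.Pounds.)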
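## Sanity check on how to read the p-values (a sketch, not part of the original analysis) ####
## Assumptions: base R's shapiro.test() stands in for the nortest functions so this
## runs without extra packages; sim_normal and sim_skewed are hypothetical simulated
## vectors, and mean = 68, sd = 2 are illustrative values, not taken from the data.
set.seed(1)
sim_normal <- rnorm(n = 1000, mean = 68, sd = 2)  # normal by construction
sim_skewed <- rexp(n = 1000, rate = 1)            # strongly right-skewed
p_normal <- shapiro.test(sim_normal)$p.value
p_skewed <- shapiro.test(sim_skewed)$p.value
print(c(normal = p_normal, skewed = p_skewed))
## The skewed sample is rejected decisively. A large p-value only means the test
## found no evidence against normality; it does not prove the data are normal.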