@mGalarnyk
Last active January 2, 2017 21:26
corr.R This file is used for the Johns Hopkins Data Science Specialization (R Programming). It accompanies the blog post reviewing the specialization: https://medium.com/@GalarnykMichael/in-progress-review-course-2-r-programming-jhu-coursera-ad27086d8438#.38n89lga5
library(data.table)

corr <- function(directory, threshold = 0) {
  # Read every CSV file in the directory and combine them into one data.table
  files <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  dt <- rbindlist(lapply(files, fread))

  # Only keep completely observed cases
  dt <- dt[complete.cases(dt), ]

  # Correlate sulfate and nitrate per monitor, then apply the threshold
  dt <- dt[, .(nobs = .N, corr = cor(x = sulfate, y = nitrate)), by = ID][nobs > threshold]
  return(dt[, corr])
}
# Example Usage
corr(directory = '~/Desktop/specdata', threshold = 150)