@neilkod
Created April 1, 2012 01:13
Goal: a data frame (all_team_summary) with one row per team, holding the team name, the mean time in minutes for men, and the mean time in minutes for women.
Instead of doing this:
library(plyr)  # provides ddply
raw_data <- read.csv('/Users/nkodner/Dropbox/development/python/2012_mb_corporate_run/data/results_2012.tsv', header=FALSE, sep='\t', stringsAsFactors=FALSE)
names(raw_data) <- c('overall_position','gender_position','bib','name','time','seconds','minutes','gender','team')
male_runners <- raw_data[raw_data$gender == "M",]
female_runners <- raw_data[raw_data$gender == "F",]
male_team_stats <- ddply(male_runners,"team",function(dat) c(nrow(dat), median(dat$minutes),mean(dat$minutes)))
names(male_team_stats) <- c('team','count','median_time_in_minutes','mean_time_in_minutes')
female_team_stats <- ddply(female_runners,"team",function(dat) c(nrow(dat), median(dat$minutes),mean(dat$minutes)))
names(female_team_stats) <- c('team','count','median_time_in_minutes','mean_time_in_minutes')
# join the male and female summaries together so we can compare male and female mean times
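# suffixes label the male and female columns (the default would be .x and .y)
all_team_summary <- merge(male_team_stats, female_team_stats, by="team", suffixes=c('_male','_female'))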
Can this be done with a single ddply call, instead of creating intermediate data frames for male and female runners, summarizing each, and then joining them?
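One possible approach, sketched here on made-up data rather than the real results file: pass plyr's summarise to ddply and subset minutes by gender inside each summary expression. The sample rows and the male_/female_ column names below are assumptions for illustration only.

```r
library(plyr)

# Hypothetical sample data standing in for results_2012.tsv
raw_data <- data.frame(
  team    = c('A', 'A', 'A', 'B', 'B', 'B'),
  gender  = c('M', 'M', 'F', 'M', 'F', 'F'),
  minutes = c(20, 30, 28, 25, 32, 36),
  stringsAsFactors = FALSE
)

# One ddply call: split by team, then compute per-gender stats within each team
all_team_summary <- ddply(raw_data, "team", summarise,
  male_count                  = sum(gender == "M"),
  male_mean_time_in_minutes   = mean(minutes[gender == "M"]),
  female_count                = sum(gender == "F"),
  female_mean_time_in_minutes = mean(minutes[gender == "F"])
)

print(all_team_summary)
```

One difference from the merge version: a team with no runners of one gender is kept here, with NaN as that gender's mean, whereas merge(..., by="team") drops any team missing from either summary.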