Skip to content

Instantly share code, notes, and snippets.

@klmr
Last active August 4, 2016 10:49
Show Gist options
  • Save klmr/b7fd96c113e24c178d2630b66bb73a15 to your computer and use it in GitHub Desktop.
Save klmr/b7fd96c113e24c178d2630b66bb73a15 to your computer and use it in GitHub Desktop.
Count dates, disregarding NULLs

Your test data:

text = 'created_date  start_date
2014-12-11    2014-12-10
2014-12-11    2014-12-11
2014-12-12    2014-12-13
2014-12-13    NULL
2014-12-13    2014-12-13
2014-12-13    2014-12-13
2014-12-23    NULL
2014-12-23    NULL'

Read the data into a data.frame. NULL doesn’t actually make sense in the table above! (In fact, this must be an error in your data: individual table cells cannot be NULL — the could be 'NULL' of course, i.e. strings.) So we treat them as NAs instead.

(x = read.table(text = text, header = TRUE, stringsAsFactors = FALSE, na.strings = 'NULL'))
created_date start_date
2014-12-11 2014-12-10
2014-12-11 2014-12-11
2014-12-12 2014-12-13
2014-12-13 NA
2014-12-13 2014-12-13
2014-12-13 2014-12-13
2014-12-23 NA
2014-12-23 NA

Next, we will count the entries, and put them into data_frames for convenience.

library(dplyr)
(all_counts = as_data_frame(table(x$created_date)))
Var1 n
2014-12-11 2
2014-12-12 1
2014-12-13 3
2014-12-23 2
(na_counts = as_data_frame(table(x[is.na(x$start_date), ]$created_date)))
Var1 n
2014-12-13 1
2014-12-23 2

Finally, we subtract the na_counts from the full_counts. To do this, we first need to join these two tables. Joining will introduce NAs, which we will replace by 0s.

full_join(all_counts, na_counts, by = 'Var1') %>%
    mutate(n.y = ifelse(is.na(n.y), 0, n.y)) %>%
    mutate(count = n.x - n.y) %>% # And finally, subtract the counts.
    select(created_date = Var1, count)
created_date count
2014-12-11 2
2014-12-12 1
2014-12-13 2
2014-12-23 0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment