Count dates, disregarding NULLs

Your test data:

text = 'created_date  start_date
2014-12-11    2014-12-10
2014-12-11    2014-12-11
2014-12-12    2014-12-13
2014-12-13    NULL
2014-12-13    2014-12-13
2014-12-13    2014-12-13
2014-12-23    NULL
2014-12-23    NULL'

Read the data into a data.frame. NULL doesn’t actually make sense in the table above! (In fact, this must be an error in your data: individual table cells cannot be NULL — the could be 'NULL' of course, i.e. strings.) So we treat them as NAs instead.

(x = read.table(text = text, header = TRUE, stringsAsFactors = FALSE, na.strings = 'NULL'))

created_date	start_date
2014-12-11	2014-12-10
2014-12-11	2014-12-11
2014-12-12	2014-12-13
2014-12-13	NA
2014-12-13	2014-12-13
2014-12-13	2014-12-13
2014-12-23	NA
2014-12-23	NA

Next, we will count the entries, and put them into data_frames for convenience.

library(dplyr)
(all_counts = as_data_frame(table(x$created_date)))

Var1	n
2014-12-11	2
2014-12-12	1
2014-12-13	3
2014-12-23	2

(na_counts = as_data_frame(table(x[is.na(x$start_date), ]$created_date)))

Var1	n
2014-12-13	1
2014-12-23	2

Finally, we subtract the na_counts from the full_counts. To do this, we first need to join these two tables. Joining will introduce NAs, which we will replace by 0s.

full_join(all_counts, na_counts, by = 'Var1') %>%
    mutate(n.y = ifelse(is.na(n.y), 0, n.y)) %>%
    mutate(count = n.x - n.y) %>% # And finally, subtract the counts.
    select(created_date = Var1, count)

created_date	count
2014-12-11	2
2014-12-12	1
2014-12-13	2
2014-12-23	0

klmr/table.md