Your test data:
text = 'created_date start_date
2014-12-11 2014-12-10
2014-12-11 2014-12-11
2014-12-12 2014-12-13
2014-12-13 NULL
2014-12-13 2014-12-13
2014-12-13 2014-12-13
2014-12-23 NULL
2014-12-23 NULL'Read the data into a data.frame. NULL doesn’t actually make sense in the
table above! (In fact, this must be an error in your data: individual table
cells cannot be NULL — the could be 'NULL' of course, i.e. strings.) So
we treat them as NAs instead.
(x = read.table(text = text, header = TRUE, stringsAsFactors = FALSE, na.strings = 'NULL'))| created_date | start_date |
|---|---|
| 2014-12-11 | 2014-12-10 |
| 2014-12-11 | 2014-12-11 |
| 2014-12-12 | 2014-12-13 |
| 2014-12-13 | NA |
| 2014-12-13 | 2014-12-13 |
| 2014-12-13 | 2014-12-13 |
| 2014-12-23 | NA |
| 2014-12-23 | NA |
Next, we will count the entries, and put them into data_frames for
convenience.
library(dplyr)
(all_counts = as_data_frame(table(x$created_date)))| Var1 | n |
|---|---|
| 2014-12-11 | 2 |
| 2014-12-12 | 1 |
| 2014-12-13 | 3 |
| 2014-12-23 | 2 |
(na_counts = as_data_frame(table(x[is.na(x$start_date), ]$created_date)))| Var1 | n |
|---|---|
| 2014-12-13 | 1 |
| 2014-12-23 | 2 |
Finally, we subtract the na_counts from the full_counts. To do this, we
first need to join these two tables. Joining will introduce NAs, which we
will replace by 0s.
full_join(all_counts, na_counts, by = 'Var1') %>%
mutate(n.y = ifelse(is.na(n.y), 0, n.y)) %>%
mutate(count = n.x - n.y) %>% # And finally, subtract the counts.
select(created_date = Var1, count)| created_date | count |
|---|---|
| 2014-12-11 | 2 |
| 2014-12-12 | 1 |
| 2014-12-13 | 2 |
| 2014-12-23 | 0 |