@benmarwick
Created March 2, 2018 23:45
Use a custom dictionary of words to count, for each row of text in a data frame, how often each dictionary word occurs. This works like a dictionary lookup, but with a very specific custom dictionary (here, simple food names).
library(usdanutrients)
library(tidyverse)
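# Build a dictionary of simple food names: keep only the first
# comma-separated term of each USDA food description, lowercased
# and stripped of punctuation
foods_simple <-
  food %>%
  separate(food, into = str_glue("V{1:11}"), sep = ",",
           fill = "right", extra = "drop") %>%
  mutate(V1 = tolower(V1)) %>%
  mutate(V1 = str_replace_all(V1, "[[:punct:]]", "")) %>%
  # de-duplicate after cleaning so the join below cannot double-count
  distinct(V1) %>%
  rename(food = V1)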
# Two rows of example text to search for food words
words_about_foods <-
  tibble(item = 1:2,
         words = c("You should eat chocolate, candies, and cake infrequently",
                   "You should eat spinach, potatoes, and broccoli often"))
# Split each row into words, clean them, keep only dictionary words,
# then count occurrences of each word per row (one column per item)
xx <-
  words_about_foods %>%
  mutate(word = str_split(words, " ")) %>%
  unnest(word) %>%
  mutate(word = tolower(str_replace_all(word, "[[:punct:]]", ""))) %>%
  inner_join(foods_simple, by = c("word" = "food")) %>%
  count(item, word) %>%
  pivot_wider(names_from = item, values_from = n) %>%
  arrange(word)

xx
#> # A tibble: 6 x 3
#>   word        `1`   `2`
#>   <chr>     <int> <int>
#> 1 broccoli     NA     1
#> 2 cake          1    NA
#> 3 candies       1    NA
#> 4 chocolate     1    NA
#> 5 potatoes     NA     1
#> 6 spinach      NA     1
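The script above builds its dictionary from the usdanutrients `food` table. If the dictionary is simply a character vector, the join can be replaced by `%in%`. The following is a minimal sketch of that variant, assuming tidyr >= 1.1 (for `pivot_wider()` with a scalar `values_fill`); `my_dictionary` and `counts` are hypothetical names, and the vector is a hand-written stand-in for the food list, not data from usdanutrients. It also fills absent words with 0 instead of NA.

```r
library(tidyverse)

# Hypothetical custom dictionary: a plain character vector standing in
# for the food names derived from usdanutrients
my_dictionary <- c("broccoli", "cake", "candies", "chocolate",
                   "potatoes", "spinach")

words_about_foods <- tibble(
  item = 1:2,
  words = c("You should eat chocolate, candies, and cake infrequently",
            "You should eat spinach, potatoes, and broccoli often")
)

counts <-
  words_about_foods %>%
  # one row per word, lowercased and stripped of punctuation
  mutate(word = str_split(words, " ")) %>%
  unnest(word) %>%
  mutate(word = tolower(str_replace_all(word, "[[:punct:]]", ""))) %>%
  # keep only words that appear in the dictionary vector
  filter(word %in% my_dictionary) %>%
  # count each dictionary word per row of text
  count(item, word) %>%
  # one column per row of text; absent words become 0 instead of NA
  pivot_wider(names_from = item, values_from = n, values_fill = 0L) %>%
  arrange(word)

counts
```

Using `%in%` keeps the dictionary a plain vector, so it never introduces duplicate rows the way a join against a non-unique lookup table can.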