Created
May 11, 2021 07:40
-
-
Save djnavarro/be6106c81ec00f6167fa42014d58f541 to your computer and use it in GitHub Desktop.
aggregate a tibble rowwise, grouped by prefix
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
library(tidyverse) | |
# make toy data ----------------------------------------------------------- | |
# column of n randomly sampled responses | |
likert_col <- function(n = 10) { | |
sample(7, size = 10, replace = TRUE) | |
} | |
# toy data | |
dat <- tibble( | |
cat_1 = likert_col(), | |
cat_2 = likert_col(), | |
cat_3 = likert_col(), | |
dog_1 = likert_col(), | |
dog_2 = likert_col() | |
) | |
# an ugly solution -------------------------------------------------------- | |
# sums all columns that begin with prefix, | |
# returns a tibble with one column | |
prefix_aggregate <- function(prefix, dat) { | |
dat %>% | |
rowwise() %>% | |
transmute(!!prefix := sum(c_across(starts_with(prefix)))) %>% | |
ungroup() | |
} | |
# find relevant prefixes | |
prefixes <- names(dat) %>% | |
str_remove_all("_[0-9]*$") %>% | |
unique() | |
# apply the aggregator for each prefix then bind | |
agg <- prefixes %>% | |
map(prefix_aggregate, dat = dat) %>% | |
bind_cols() | |
# print ------------------------------------------------------------------- | |
print(dat) | |
print(agg) |
I think this would be a nice syntax to solve similar problems. I don't like the name though. Maybe there is something better than fold
.
# `fold` does not exist yet
dat %>%
transmute(fold(starts_with("cat"),
list(sum = ~ rowSums(.x),
mean = ~ rowMeans(.x))))
# A tibble: 10 x 2
cat_sum cat_mean
<dbl> <dbl>
1 11 3.67
2 10 3.33
3 6 2
4 4 1.33
5 10 3.33
6 7 2.33
7 12 4
8 12 4
9 17 5.67
10 13 4.33
# `fold_over` does not exist yet
dat %>%
transmute(fold_over(cut_names("_[0-9]*$"),
~ starts_with(.x),
~ rowSums(.x)))
# A tibble: 10 x 2
cat dog
<dbl> <dbl>
1 11 11
2 10 10
3 6 6
4 4 4
5 10 10
6 7 7
7 12 12
8 12 12
9 17 17
10 13 13
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I have a package on Github which handles similar problems. For this specific problem it has no optimal solution:
I wonder why
across
is not working correctly:I'm thinking about how a function should look like to handles this problem in an optimal way.