Skip to content

Instantly share code, notes, and snippets.

@djnavarro
Created May 11, 2021 07:40
Show Gist options
  • Save djnavarro/be6106c81ec00f6167fa42014d58f541 to your computer and use it in GitHub Desktop.
Save djnavarro/be6106c81ec00f6167fa42014d58f541 to your computer and use it in GitHub Desktop.
aggregate a tibble rowwise, grouped by prefix
library(tidyverse)
# make toy data -----------------------------------------------------------
# column of n randomly sampled responses
likert_col <- function(n = 10) {
sample(7, size = 10, replace = TRUE)
}
# toy data
dat <- tibble(
cat_1 = likert_col(),
cat_2 = likert_col(),
cat_3 = likert_col(),
dog_1 = likert_col(),
dog_2 = likert_col()
)
# an ugly solution --------------------------------------------------------
# sums all columns that begin with prefix,
# returns a tibble with one column
prefix_aggregate <- function(prefix, dat) {
dat %>%
rowwise() %>%
transmute(!!prefix := sum(c_across(starts_with(prefix)))) %>%
ungroup()
}
# find relevant prefixes
prefixes <- names(dat) %>%
str_remove_all("_[0-9]*$") %>%
unique()
# apply the aggregator for each prefix then bind
agg <- prefixes %>%
map(prefix_aggregate, dat = dat) %>%
bind_cols()
# print -------------------------------------------------------------------
print(dat)
print(agg)
@TimTeaFan
Copy link

TimTeaFan commented May 11, 2021

I have a package on Github which handles similar problems. For this specific problem it has no optimal solution:

library(dplyover) # https://github.com/TimTeaFan/dplyover

dat %>% 
  transmute(over(cut_names("_[0-9]*$"),
                 ~ rowSums(select(cur_data(), starts_with(.x)))))

# A tibble: 10 x 2
     cat   dog
   <dbl> <dbl>
 1    11     4
 2    10     9
 3     6     2
 4     4     8
 5    10     9
 6     7     8
 7    12    10
 8    12     8
 9    17     5
10    13     2

I wonder why across is not working correctly:

dat %>% 
  transmute(over(cut_names("_[0-9]*$"),
                 ~ rowSums(across(starts_with(.x)))))

# A tibble: 10 x 2
     cat   dog
   <dbl> <dbl>
 1    11    11
 2    10    10
 3     6     6
 4     4     4
 5    10    10
 6     7     7
 7    12    12
 8    12    12
 9    17    17
10    13    13

I'm thinking about how a function should look like to handles this problem in an optimal way.

@TimTeaFan
Copy link

TimTeaFan commented May 11, 2021

I think this would be a nice syntax to solve similar problems. I don't like the name though. Maybe there is something better than fold.

# `fold` does not exist yet
dat %>% 
  transmute(fold(starts_with("cat"),
                 list(sum = ~ rowSums(.x),
                      mean = ~ rowMeans(.x))))

# A tibble: 10 x 2
   cat_sum cat_mean
     <dbl>    <dbl>
 1      11     3.67
 2      10     3.33
 3       6     2   
 4       4     1.33
 5      10     3.33
 6       7     2.33
 7      12     4   
 8      12     4   
 9      17     5.67
10      13     4.33

# `fold_over` does not exist yet
dat %>% 
  transmute(fold_over(cut_names("_[0-9]*$"),
                      ~ starts_with(.x),
                      ~ rowSums(.x)))

# A tibble: 10 x 2
     cat   dog
   <dbl> <dbl>
 1    11    11
 2    10    10
 3     6     6
 4     4     4
 5    10    10
 6     7     7
 7    12    12
 8    12    12
 9    17    17
10    13    13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment