@brshallo
Created February 28, 2019 19:58
Hack for giving step_other() a consistent minimum sample size (here, 30 rows per level) across groups of different sizes.
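library(recipes)
library(tidyverse)

# One recipe per cut, each built on that cut's own rows.
diamonds_nested <- diamonds %>%
  group_by(cut) %>%
  nest() %>%
  mutate(recipes = map(data, ~ recipe(price ~ clarity + color + carat, data = .x)))

# `threshold` is passed to step_other() as a proportion here, so convert a fixed
# minimum count (30 rows) into a per-group proportion: 30 / rows in that group.
diamonds_nested %>%
  mutate(threshold = 30 / map_dbl(data, nrow),
         recipes = map2(recipes, threshold, ~ step_other(.x, all_nominal(), threshold = .y)))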
#> # A tibble: 5 x 4
#>   cut       data                  recipes      threshold
#>   <ord>     <list>                <list>           <dbl>
#> 1 Ideal     <tibble [21,551 x 9]> <S3: recipe>   0.00139
#> 2 Premium   <tibble [13,791 x 9]> <S3: recipe>   0.00218
#> 3 Good      <tibble [4,906 x 9]>  <S3: recipe>   0.00611
#> 4 Very Good <tibble [12,082 x 9]> <S3: recipe>   0.00248
#> 5 Fair      <tibble [1,610 x 9]>  <S3: recipe>   0.0186

Created on 2019-02-28 by the reprex package (v0.2.1)
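
The conversion above (a fixed count divided by the group size) can be checked without recipes. The following is a minimal base R sketch of the same arithmetic. `pool_rare_levels()` is a hypothetical helper written for illustration, not part of recipes, and the rule it uses (keep levels whose proportion is at least the threshold) is an assumption about how pooling behaves rather than a statement of what step_other() does internally.

```r
# Hypothetical helper, not part of recipes: pool levels that occur in fewer
# than `min_n` rows into a single `other` level.
pool_rare_levels <- function(x, min_n = 30, other = "other") {
  x <- as.character(x)
  # Same conversion as in the gist: a fixed count becomes a proportion
  # that depends on the size of the group.
  threshold <- min_n / length(x)
  props <- table(x) / length(x)
  # Assumption: levels at or above the threshold are kept.
  keep <- names(props)[props >= threshold]
  factor(ifelse(x %in% keep, x, other), levels = c(keep, other))
}

# Two groups with the same level mix but different sizes.
small_group <- rep(c("a", "b", "c"), times = c(60, 30, 10))     # 100 rows
large_group <- rep(c("a", "b", "c"), times = c(600, 300, 100))  # 1000 rows

table(pool_rare_levels(small_group))  # "c" has only 10 rows, so it is pooled
table(pool_rare_levels(large_group))  # every level has at least 30 rows
```

With a single fixed proportion, "c" would be treated the same way in both groups because it makes up 10% of each. With a fixed count, it is pooled only where it has fewer than 30 rows. Recent recipes documentation also describes `threshold` values of 1 or more as counts, so, depending on the installed version, `step_other(all_nominal(), threshold = 30)` may make this hack unnecessary. Check `?step_other` before relying on that.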
