library(tidymodels)
library(tailor)
Consider the following regular grid:
spec <- boost_tree(mode = "regression", trees = tune())
tlr <- tailor() %>% adjust_probability_threshold(threshold = tune())
wflow <- workflow()
wflow <- add_model(wflow, spec)
wflow <- add_formula(wflow, mpg ~ .)
wflow <- add_tailor(wflow, tlr)
grid <- grid_regular(extract_parameter_set_dials(wflow), levels = 3)
grid
# A tibble: 9 × 2
  trees threshold
  <int>     <dbl>
1     1       0
2  1000       0
3  2000       0
4     1       0.5
5  1000       0.5
6  2000       0.5
7     1       1
8  1000       1
9  2000       1
Note that trees can make use of the submodel trick. min_grid() shows us how that grid is ultimately represented in compute_grid_info():
wflow_no_post <- remove_tailor(wflow)
grid_no_post <- grid_regular(extract_parameter_set_dials(wflow_no_post), levels = 3)
grid_no_post
# A tibble: 3 × 1
  trees
  <int>
1     1
2  1000
3  2000
min_grid(spec, grid_no_post)
# A tibble: 1 × 2
  trees .submodels
  <int> <list>
1  2000 <named list [1]>
Only one model fit is needed. Here is just the information most relevant to “how this is nested”:
min_grid(spec, grid_no_post)$.submodels
[[1]]
[[1]]$trees
[1] 1 1000
Since the postprocessor doesn’t affect the model fit, adding postprocessor values in the grid shouldn’t affect whether we can hook into the submodel trick or not.
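As a quick sanity check of that claim, here is a small sketch that rebuilds the 9-row grid by hand and drops the postprocessor column. `grid_copy` and `model_grid` are names made up for this illustration, and the grid is hard-coded rather than derived from the workflow, so nothing here depends on tune internals.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of the 9-row regular grid printed earlier (hypothetical
# stand-in; the real one comes from grid_regular()).
grid_copy <- expand_grid(
  threshold = c(0, 0.5, 1),
  trees = c(1L, 1000L, 2000L)
) %>%
  select(trees, threshold)

# Dropping the postprocessor column and de-duplicating leaves only the
# distinct model configurations, i.e. the same values as grid_no_post.
model_grid <- distinct(grid_copy, trees)

nrow(grid_copy)
nrow(model_grid)
```

The number of distinct model configurations is the same with or without the threshold column, which is why the number of model fits shouldn't change either.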
The min_grid() output that (I think?) would allow us to preserve all of the information we need in the case of grid would be:
tibble(
  trees = 2000,
  threshold = list(c(0, 0.5, 1)),
  .submodels = list(
    list(
      tibble(trees = 1, threshold = list(c(0, .5, 1))),
      tibble(trees = 1000, threshold = list(c(0, .5, 1)))
    )
  )
)
# A tibble: 1 × 3
  trees threshold .submodels
  <dbl> <list>    <list>
1  2000 <dbl [3]> <list [2]>
What we currently get is:
min_grid(spec, grid)
# A tibble: 3 × 3
  trees threshold .submodels
  <int>     <dbl> <list>
1  2000       0   <named list [1]>
2  2000       0.5 <named list [1]>
3  2000       1   <named list [1]>
As of now, though, min_grid() methods dispatch on the class of the model_spec, so they don't know where the postprocessor parameter columns come from and can't differentiate them from recipe parameter grids. Instead, we can just make a wrapper around min_grid() that “pushes” those threshold values through to .submodels to go from this result to the desired one.
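A rough sketch of what that post-hoc step could look like, operating on the min_grid() result rather than wrapping the call itself. Everything here is an assumption for illustration: `push_post_to_submodels()` is a hypothetical helper (not something in tune), `current` is a hand-built stand-in for the 3-row result above, and the sketch assumes a single submodel parameter and a regular grid where every submodel gets the same postprocessor values.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hand-built stand-in for the current `min_grid(spec, grid)` result, so the
# sketch runs without fitting or tuning anything.
current <- tibble(
  trees = rep(2000L, 3),
  threshold = c(0, 0.5, 1),
  .submodels = rep(list(list(trees = c(1L, 1000L))), 3)
)

# Hypothetical helper: collapse rows that share a model fit and copy the
# postprocessor values into every submodel entry.
# Assumes at most one submodel parameter and a regular grid.
push_post_to_submodels <- function(res, post_cols) {
  # Columns that identify a distinct model fit
  fit_cols <- setdiff(names(res), c(post_cols, ".submodels"))

  collapsed <- res %>%
    group_by(across(all_of(fit_cols))) %>%
    summarize(
      # Nest each postprocessor column into a list-column of unique values
      across(all_of(post_cols), ~ list(unique(.x))),
      # Rows in a group share the same submodels, so keep the first copy
      .submodels = list(.submodels[[1]]),
      .groups = "drop"
    )

  # Rebuild each .submodels entry as a list of one-row tibbles, one per
  # submodel value, each carrying the nested postprocessor values
  collapsed$.submodels <- map(seq_len(nrow(collapsed)), function(i) {
    sub <- collapsed$.submodels[[i]]
    if (length(sub) == 0) return(list())
    post_vals <- map(collapsed[post_cols], ~ .x[i])
    sub_param <- names(sub)[1]
    map(sub[[sub_param]], function(val) {
      as_tibble(c(setNames(list(val), sub_param), post_vals))
    })
  })

  collapsed
}

desired <- push_post_to_submodels(current, post_cols = "threshold")
desired
desired$.submodels[[1]]
```

For an irregular grid, where different trees values are paired with different thresholds, collapsing after the fact like this would not be enough, since min_grid() would already have split the fits by threshold. In that case it would probably be cleaner to call min_grid() on the distinct model-only grid and join the postprocessor values back in.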