tailor_split_method.md

Splitting Data With `add_tailor(method)`

This gist whittles down discussion (with myself) in tidymodels/workflows#233.

The add_tailor(method) argument supplies information about how the data supplied to fit.workflow() came to be. Its currently supplied as the class of the rsplit object from which data arose, i.e. "mc_split".

tune does not need to use add_tailor(method), as it has all of the information it needs in the split itself to make an inner split, and interfaces with the .fit_*() helpers directly instead of fit.workflow(). workflows, in fit.workflow(), does not actually have all the information it needs between fit.workflow(data) and add_tailor(method) unless the data arose from a random sample. So, in short: tune does not use method and workflows can't do anything with it, as it is currently implemented.

We could go a few ways from here:

a) Remove add_tailor(method) altogether and just take a random split if the tailor has a calibrator.

Has the potential to silently leak data in the case where fit.workflow(data) arose from a non-random split.
Note that this leakage would not occur inside of tune.

b) Require fit.workflow(data) to be a split if the tailor has a calibrator.

This would make fit.workflow(data) not type stable, requiring folks programming around workflows to work with additional logic; seems like it violates the underlying principle of a common interface.

c) Error on fit.workflow() if the tailor has a calibrator, referring people to use fit_resamples().

Ultimately the most principled, I think: requires the least amount of work from us, allows us to cut out the added complexity of workflows needing to "know about" rsample, forefronts the idea that workflows need to partition data even for a single fit on a workflow with a calibrator, and ultimately only impacts the realistically very small segment of people who want to apply a calibrator and do so without resampling it.

d) Adapt add_tailor(method) to take in a split or some metadata about it that's sufficient to actually be able to make an inner split.

This means that the specification of the workflow would now need to know about the data and resampling scheme. :(

simonpcouch/tailor_split_method.md

Select an option

No results found

Select an option

No results found

Splitting Data With `add_tailor(method)`

simonpcouch commented Sep 26, 2024 •

edited

Loading

Uh oh!

simonpcouch/tailor_split_method.md

Splitting Data With add_tailor(method)

simonpcouch commented Sep 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Splitting Data With `add_tailor(method)`

simonpcouch commented Sep 26, 2024 •

edited

Loading