

@pmineiro
Last active October 19, 2024 14:50
--cb_dro demo for Vowpal Wabbit using the covertype dataset. To see the lift, compare the "since last acc" column in the runs with and without --cb_dro.
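The "since last acc" column reports recent accuracy, not cumulative accuracy, so it shows the lift more clearly as training goes on. The sketch below shows one way such a column can be computed. It is an assumption, not the gist's code: rows are printed after 1, 2, 4, 8, ... examples, and each "since last" value covers only the examples seen since the previous row. The function name progress_rows is hypothetical.

```python
def progress_rows(correct):
    """Return (examples_seen, average_acc, since_last_acc) rows.

    Hypothetical helper. Assumes rows are emitted at doubling example counts
    and that "since last" covers only the examples since the previous row.
    """
    rows = []
    total = 0        # correct predictions so far
    last_total = 0   # correct predictions at the previous row
    last_n = 0       # examples seen at the previous row
    next_row = 1     # example count at which the next row is printed
    for n, ok in enumerate(correct, start=1):
        total += int(ok)
        if n == next_row:
            rows.append((n, total / n, (total - last_total) / (n - last_n)))
            last_total, last_n = total, n
            next_row *= 2
    return rows


print(progress_rows([True, False, True, True]))
```

For the four predictions above, the rows are (1, 1.0, 1.0), (2, 0.5, 0.0) and (4, 0.75, 1.0). The cumulative average lags, while the since-last value reflects only the latest window.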

pmineiro commented Dec 5, 2020

In this gist I:

  • pre-train a logging policy on 10% of covertype, and then hold the logging policy fixed
  • train a second policy off-policy on the data logged by that fixed policy, either with or without the --cb_dro flag
  • --cb_dro improves the trained policy from 71.8% to 73.3% accuracy.
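The sketch below illustrates the setup above: it turns a multiclass dataset such as covertype into logged bandit data and scores a target policy offline. It rests on assumptions the gist does not state. The fixed logging policy is epsilon-greedy around a pre-trained classifier, the reward is 1 when the chosen action equals the true label (so the expected reward equals accuracy), and the estimator is plain inverse propensity scoring (IPS). The names log_bandit_data and ips_accuracy are hypothetical. The sketch does not implement the DRO objective that --cb_dro applies.

```python
import random


def log_bandit_data(examples, logging_policy, num_actions, epsilon, rng):
    """Convert (features, label) pairs into (x, action, prob, reward) tuples.

    Assumption: the fixed logging policy is epsilon-greedy around a
    pre-trained classifier, and reward is 1.0 iff the action equals the label.
    """
    logged = []
    for x, y in examples:
        greedy = logging_policy(x)
        explore = rng.random() < epsilon
        a = rng.randrange(num_actions) if explore else greedy
        # Probability the logging policy assigned to the action it took:
        # uniform exploration mass, plus the exploit mass for the greedy action.
        p = epsilon / num_actions + (1.0 - epsilon) * (a == greedy)
        r = 1.0 if a == y else 0.0
        logged.append((x, a, p, r))
    return logged


def ips_accuracy(logged, target_policy):
    """Inverse-propensity estimate of a deterministic target policy's accuracy."""
    total = 0.0
    for x, a, p, r in logged:
        if target_policy(x) == a:
            total += r / p
    return total / len(logged)


rng = random.Random(0)
data = [(i, i % 3) for i in range(300)]   # toy stand-in for covertype rows
logged = log_bandit_data(data, lambda x: x % 3, num_actions=3, epsilon=0.3, rng=rng)
print(ips_accuracy(logged, lambda x: x % 3))
```

The logged probability p must be positive for every action the target policy might choose. Otherwise, IPS cannot credit that action. The gist compares how the second policy is trained from these kinds of tuples, with and without --cb_dro.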
