Last active
June 9, 2022 16:38
-
-
Save pmineiro/5863eb0ba0b1f6963447f8f500bf0f1c to your computer and use it in GitHub Desktop.
The latest in OPE-CS. This can track the running mean of a predictable policy sequence in a nonstationary environment and does not require an explicit importance weight upper bound. For a fixed policy in a stationary environment the running intersection can be used to shrink the interval monotonically.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Convergence is more rapid when evaluating policies nearer to the logging policy.