Last active
October 11, 2022 21:27
A Policy-Gradient algorithm that solves Contextual Bandit problems.
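For readers unsure what "policy gradient" means in this setting, here is a minimal NumPy sketch. It is not the gist's code. It assumes a tabular softmax policy with one row of logits per context and a REINFORCE-style update that scales the gradient of log pi(a|s) by the reward. The environment (`WIN_PROB`, with a reward of +1 or -1) and the names `softmax` and `train_contextual_bandit` are hypothetical, chosen only for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def train_contextual_bandit(win_prob, episodes=5000, lr=0.1, seed=0):
    """Train a tabular softmax policy with a REINFORCE-style update.

    win_prob[s, a] is the (hypothetical) probability that arm `a`
    pays +1 in context `s`; otherwise the reward is -1.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = win_prob.shape
    theta = np.zeros((n_states, n_actions))  # policy logits per context

    for _ in range(episodes):
        s = rng.integers(n_states)        # observe a random context
        pi = softmax(theta[s])            # action probabilities pi(. | s)
        a = rng.choice(n_actions, p=pi)   # sample an action from the policy
        r = 1.0 if rng.random() < win_prob[s, a] else -1.0

        # Gradient of log pi(a | s) w.r.t. the logits: one_hot(a) - pi.
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0

        # Policy-gradient step: raise the log-probability of the chosen
        # action in proportion to the reward it received.
        theta[s] += lr * r * grad_log_pi

    return theta


# Hypothetical environment: 3 contexts, 4 arms, one clearly best arm each.
WIN_PROB = np.array([
    [0.2, 0.3, 0.9, 0.1],
    [0.1, 0.9, 0.2, 0.3],
    [0.9, 0.2, 0.1, 0.3],
])

theta = train_contextual_bandit(WIN_PROB)
greedy_actions = theta.argmax(axis=1)  # preferred arm in each context
print(greedy_actions)
```

The key line is `theta[s] += lr * r * grad_log_pi`: it performs gradient ascent on `r * log pi(a|s)`, which is the same quantity a network-based version would minimize with the sign flipped, as a loss of the form `-log(pi(a|s)) * r`.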
Thanks for the implementation. I wonder how this implementation is a policy network. I don't see where a policy gradient is used.