Last active October 11, 2022 21:27
awjuliani/b5d83fcf3bf2898656be5730f098e08b
A Policy-Gradient algorithm that solves Contextual Bandit problems.
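To make the title concrete, here is a minimal plain-Python sketch of a contextual bandit learner of this kind. It assumes (this is not confirmed on this page) a loss of the form `-log(sigmoid(w[s][a])) * reward` on the chosen action's weight, plus ε-greedy action selection. The bandit thresholds, learning rate, episode count and helper names (`THRESHOLDS`, `pull`, `train`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical environment (illustrative values, not copied from the gist):
# row = bandit/context, column = arm. Pulling an arm pays +1 when a standard
# normal draw exceeds the arm's threshold, else -1, so lower thresholds are better.
THRESHOLDS = [
    [0.5, 1.0, 0.8, -5.0],
    [1.0, -5.0, 0.5, 0.8],
    [-5.0, 5.0, 5.0, 5.0],
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pull(state, action, rng):
    """Return +1.0 or -1.0 for pulling `action` on bandit `state`."""
    return 1.0 if rng.gauss(0.0, 1.0) > THRESHOLDS[state][action] else -1.0


def train(episodes=10000, lr=0.05, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = len(THRESHOLDS), len(THRESHOLDS[0])
    # One weight per (state, action), all starting at 1.0 (cf. tf.ones_initializer()).
    w = [[1.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)                  # environment draws a context
        if rng.random() < epsilon:                   # explore: random arm (assumed ε-greedy)
            a = rng.randrange(n_actions)
        else:                                        # exploit: highest sigmoid score
            a = max(range(n_actions), key=lambda k: sigmoid(w[s][k]))
        r = pull(s, a, rng)
        # Assumed loss = -log(sigmoid(w[s][a])) * r; its derivative w.r.t. w[s][a]
        # is -(1 - sigmoid(w[s][a])) * r, so one gradient-descent step on that weight is:
        w[s][a] += lr * (1.0 - sigmoid(w[s][a])) * r
    return w


weights = train()
best_arms = [max(range(len(row)), key=row.__getitem__) for row in weights]
```

`best_arms` holds the highest-weighted arm for each context; with these thresholds it should come out as `[3, 1, 0]`, the arm with threshold -5.0 in each row. Only the weight of the arm actually pulled is updated on each step.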
Thanks Arthur! This is a helpful tutorial for beginners like me. Here is a TensorFlow 2 implementation that may be helpful for someone.
Thanks for the implementation. I wonder how this implementation is a policy network? I don't see where a policy gradient is used.
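On the question of where the policy gradient comes in: in the usual REINFORCE form, the loss for one step is `-log π(a|s) * r`, and its gradient with respect to the policy parameters is a policy-gradient estimate. The sketch below swaps in a softmax policy over one state's action logits (the snippet on this page uses a sigmoid output instead) and checks the closed-form gradient `-(1[k == a] - π(k)) * r` against central finite differences. The function names and numbers are illustrative only.

```python
import math


def softmax(logits):
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_prob(logits, action):
    return math.log(softmax(logits)[action])


def reinforce_grad(logits, action, reward):
    """Gradient of the REINFORCE loss -log pi(a) * r w.r.t. the logits.

    For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
    """
    probs = softmax(logits)
    return [-(float(k == action) - p) * reward for k, p in enumerate(probs)]


def finite_diff_grad(logits, action, reward, h=1e-6):
    """Central-difference estimate of the same gradient, as a sanity check."""
    grad = []
    for k in range(len(logits)):
        up = list(logits)
        up[k] += h
        down = list(logits)
        down[k] -= h
        loss_up = -log_prob(up, action) * reward
        loss_down = -log_prob(down, action) * reward
        grad.append((loss_up - loss_down) / (2 * h))
    return grad


logits = [0.5, -1.0, 2.0, 0.0]
analytic = reinforce_grad(logits, action=2, reward=1.0)
numeric = finite_diff_grad(logits, action=2, reward=1.0)
```

With `r > 0`, a gradient-descent step raises the chosen action's logit and lowers the others in proportion to their probabilities; with `r < 0` it does the reverse.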
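Instead of using slim, you can use tf directly (TF 1.x API):
import tensorflow as tf  # TF 1.x; tf.layers was removed in TF 2 (use tf.compat.v1.layers there)
state_in_OH = tf.one_hot(self.state_in, s_size)  # one-hot encode the state index
output = tf.layers.dense(state_in_OH, a_size, tf.nn.sigmoid, use_bias=False,
                         kernel_initializer=tf.ones_initializer())  # one weight per (state, action)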
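Why the one-hot input works: a bias-free dense layer computes `activation(x @ kernel)`, and with a one-hot `x` that product is just row `state` of the `[s_size, a_size]` kernel, i.e. one weight per (state, action) pair. Here is a plain-Python check of that equivalence; `one_hot`, `dense_no_bias` and the non-uniform kernel values are hypothetical stand-ins for the TensorFlow ops and the all-ones initialisation.

```python
import math


def one_hot(index, depth):
    return [1.0 if i == index else 0.0 for i in range(depth)]


def dense_no_bias(x, kernel, activation):
    """y_j = activation(sum_i x_i * kernel[i][j]), like a bias-free dense layer."""
    n_out = len(kernel[0])
    return [activation(sum(x[i] * kernel[i][j] for i in range(len(x))))
            for j in range(n_out)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


s_size, a_size = 3, 4
# Hypothetical kernel with distinct entries so the selected row is visible.
kernel = [[0.1 * (i * a_size + j) for j in range(a_size)] for i in range(s_size)]
state = 2
out = dense_no_bias(one_hot(state, s_size), kernel, sigmoid)
row = [sigmoid(w) for w in kernel[state]]   # expected: sigmoid of row `state`
```

Because of this, `output` holds one sigmoid score per action for the current state, and with `tf.ones_initializer()` every score starts at `sigmoid(1.0)`.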