@awjuliani
Last active October 11, 2022 21:27
A Policy-Gradient algorithm that solves Contextual Bandit problems.
@alphashuro

Also, I didn't see anything on one-hot encoding in the post; is it perhaps covered in one of your other posts?

@easwar1977

This is a great demo. Can you also suggest how to store the model as a .h5 file (like in Keras) and re-use it?
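For what it's worth, the TF1-style graph this tutorial builds is normally persisted as a native checkpoint with tf.train.Saver rather than a Keras .h5 file. A minimal sketch, assuming the agent's graph has already been built (the checkpoint path is illustrative):

    import tensorflow as tf

    # Create the Saver after all variables have been defined.
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ... training loop ...
        saver.save(sess, "./contextual_bandit.ckpt")  # writes checkpoint files

    # Later: rebuild the identical graph, then restore the weights.
    with tf.Session() as sess:
        saver.restore(sess, "./contextual_bandit.ckpt")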

@Riotpiaole

@alphashuro agreed. I'm struggling with the same problem: what does OH mean?

@dargor

dargor commented Oct 18, 2017

Hi, I'm learning RL with your articles, great work 👍

Here is a quick diff to use raw TF (as of 1.3) instead of slim:

-        state_in_OH = slim.one_hot_encoding(self.state_in, s_size)
-        output = slim.fully_connected(state_in_OH,
-                                      a_size,
-                                      biases_initializer=None,
-                                      activation_fn=tf.nn.sigmoid,
-                                      weights_initializer=ones)
+        state_in_OH = tf.one_hot(self.state_in, s_size)
+        output = tf.layers.dense(state_in_OH, a_size, tf.nn.sigmoid,
+                                 use_bias=False, kernel_initializer=ones)

@Riotpiaole OH = one hot [encoding]

@lipixun

lipixun commented Oct 30, 2017

According to my experiments (TensorFlow 1.3), I suggest using AdamOptimizer instead of GradientDescentOptimizer, since GradientDescentOptimizer suffers from training-stability issues.
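The swap is a one-line change in the agent's graph. A sketch, assuming the loss tensor and learning rate lr are named as in the tutorial's agent class:

    # Adam adapts per-parameter step sizes, which tends to be more
    # stable here than plain gradient descent.
    optimizer = tf.train.AdamOptimizer(learning_rate=lr)
    self.update = optimizer.minimize(self.loss)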

@lipixun

lipixun commented Oct 30, 2017

@Riotpiaole I've re-implemented the tutorial code here; you may take a look at it.

@pooriaPoorsarvi

Can anyone explain why we don't use softmax instead of sigmoid? And also, why don't we use a bias? (I tried both and it wouldn't work.)

@pooriaPoorsarvi

@lipixun do you know the answer to my question? It would really help me, thanks.

@JaeDukSeo

@pooriaPoorsarvi As seen above, we already have the responsible_weight variable; we then take its negative log-likelihood, scaled by the reward, as the loss to minimize (TF optimizers can only minimize). Since only the chosen action's weight is involved, there is no need to consider every other class.
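For context, the loss being discussed looks roughly like this in the tutorial's agent (a reconstruction, not a verbatim excerpt; names may differ slightly):

    # Slice out the output weight for the action actually taken.
    self.responsible_weight = tf.slice(self.output, self.action_holder, [1])
    # Negative log-likelihood weighted by the reward: minimizing this pushes
    # up the probability of actions that earned high reward (REINFORCE-style).
    self.loss = -(tf.log(self.responsible_weight) * self.reward_holder)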

@araknadash

Instead of using slim, you can use raw TF as:
state_in_OH = tf.one_hot(self.state_in, s_size)
output = tf.layers.dense(state_in_OH, a_size, tf.nn.sigmoid, use_bias=False, kernel_initializer=tf.ones_initializer())

@xkrishnam

xkrishnam commented Jul 29, 2020

Thanks Arthur! This is a helpful tutorial for beginners like me. Here is a TensorFlow 2 implementation that may be helpful for someone.
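The linked implementation isn't reproduced here, but a rough TF2 equivalent of the tutorial's agent, written with tf.GradientTape, might look like this (all names and sizes are illustrative):

    import tensorflow as tf

    num_states, num_actions = 3, 4
    # One weight per (state, action) pair, initialized to ones as in the tutorial.
    weights = tf.Variable(tf.ones([num_states, num_actions]))
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

    def train_step(state, action, reward):
        with tf.GradientTape() as tape:
            output = tf.nn.sigmoid(weights[state])  # action values for this state
            responsible = output[action]            # weight of the chosen action
            loss = -tf.math.log(responsible) * reward
        grads = tape.gradient(loss, [weights])
        optimizer.apply_gradients(zip(grads, [weights]))

For example, train_step(state=1, action=2, reward=1.0) nudges up the chosen action's weight for that state.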

@daniel-xion

> Thanks Arthur! This is a helpful tutorial for beginners like me. Here is a TensorFlow 2 implementation that may be helpful for someone.

Thanks for the implementation. I wonder, though: how is this implementation a policy network? I don't see where a policy gradient is used.
