These tutorials of yours are quite awesome and I am really loving them. Since this material is designed to be read by beginners, may I suggest using full variable names that describe the data they contain? For example, the first thing I struggled with was figuring out what lr meant in the parameters to the agent class's init, and the conciseness of the shortened name made it even harder to find where it is used in the code. I guessed that s_size and a_size were state size and action size, but I think they were an unnecessary barrier to understanding the actual content, as was state_in_OH (i.e. the OH).
What do you think about this suggestion? I am hoping it can help others learn and understand the content better.
Also, I didn't see anything on one-hot encoding in the post; is it perhaps covered in one of your other posts?
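For anyone else wondering about the encoding itself: one-hot encoding represents an integer index (here, the current state) as a vector of zeros with a single 1, so it can be fed into a dense layer. A minimal sketch in plain Python (the function name is illustrative, not from the tutorial code):

```python
def one_hot(index, size):
    # A vector of zeros with a single 1.0 at the given position;
    # this is the kind of vector state_in_OH holds for a state index.
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

one_hot(2, 4)  # -> [0.0, 0.0, 1.0, 0.0]
```

In the TF code this is exactly what slim.one_hot_encoding (or tf.one_hot) produces, just as a tensor.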
This is a great demo. Can you also suggest how to store the model as an .h5 file (like in Keras) and re-use it?
@alphashuro agreed. I struggled with the same problem: what does OH mean?
Hi, I'm learning RL with your articles, great work 👍
Here is a quick diff to use raw TF (as of 1.3) instead of slim :
- state_in_OH = slim.one_hot_encoding(self.state_in, s_size)
- output = slim.fully_connected(state_in_OH,
- a_size,
- biases_initializer=None,
- activation_fn=tf.nn.sigmoid,
- weights_initializer=ones)
+ state_in_OH = tf.one_hot(self.state_in, s_size)
+ output = tf.layers.dense(state_in_OH, a_size, tf.nn.sigmoid,
+ use_bias=False, kernel_initializer=ones)
@Riotpiaole OH = one hot [encoding]
According to my experiments (TensorFlow 1.3), I suggest using AdamOptimizer instead of GradientDescentOptimizer, since GradientDescentOptimizer suffers from training stability issues.
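For intuition on why this can help: plain gradient descent takes steps proportional to the raw gradient, while Adam rescales each step by running estimates of the gradient's first and second moments, so the effective step size stays near lr even when gradients are large or noisy. A minimal pure-Python sketch of the two update rules for a single scalar parameter (function names are illustrative):

```python
def sgd_step(w, grad, lr=0.001):
    # Gradient descent: step size scales directly with the gradient,
    # so large or noisy gradients can destabilize training.
    return w - lr * grad

def adam_step(w, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: keep running estimates of the gradient mean (m) and
    # uncentered variance (v), correct their bias, and normalize the step.
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, (m, v, t)

# The first Adam step has magnitude ~lr regardless of gradient scale,
# while the SGD step is lr * grad:
w_adam, _ = adam_step(1.0, 100.0, (0.0, 0.0, 0))  # moves by ~0.001
w_sgd = sgd_step(1.0, 100.0)                      # moves by 0.1
```

In the tutorial code this amounts to swapping tf.train.GradientDescentOptimizer(learning_rate=lr) for tf.train.AdamOptimizer(learning_rate=lr).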
@Riotpiaole I've re-implemented the tutorial code here; you may take a look at it.
Can anyone explain to me why we use sigmoid instead of softmax? And also, why don't we use a bias? (I tried both and it wouldn't work.)
@lipixun do you know the answer to my question? It would really help me, thanks.
@pooriaPoorsarvi As seen above, we already have the responsible_weight variable; now we take its negative log likelihood so that minimizing it (TF optimizers can only minimize) maximizes the likelihood of the chosen action. There is no need to consider any of the other classes.
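To make that concrete: as I read the tutorial code, the loss has the shape -log(responsible_weight) * reward, so minimizing it pushes the weight of rewarded actions up and (for negative rewards) down. A minimal sketch in plain Python (the function name is illustrative):

```python
import math

def policy_loss(responsible_weight, reward):
    # Negative log likelihood of the chosen action, scaled by reward.
    # With positive reward, minimizing this loss raises the chosen
    # action's weight; with negative reward, it lowers it.
    return -math.log(responsible_weight) * reward

# A larger weight on a rewarded action already gives a smaller loss:
policy_loss(0.9, 1.0)  # ~0.105
policy_loss(0.5, 1.0)  # ~0.693
```

Only the chosen action's weight appears in the loss, which is why the other actions never need to be considered in the gradient.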
Instead of using slim, you can use raw TF as:
state_in_OH = tf.one_hot(self.state_in, s_size)
output = tf.layers.dense(state_in_OH, a_size, tf.nn.sigmoid, use_bias=False, kernel_initializer=tf.ones_initializer())
Thanks Arthur! This is a helpful tutorial for beginners like me. Here is a TensorFlow 2 implementation that may be helpful for someone.
Thanks for the implementation. I wonder, though: how is this implementation a policy network? I don't see a policy gradient being used.
The line below raises an error (the TensorFlow version on my machine is 0.11), so it should be changed to the code below.