awjuliani/ContextualPolicy.ipynb

Last active October 11, 2022 21:27

A Policy-Gradient algorithm that solves Contextual Bandit problems.

ContextualPolicy.ipynb
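The notebook itself is not reproduced here, so the following is only a minimal sketch of what a policy-gradient agent for a contextual bandit can look like. It is not the notebook's code. It assumes a softmax policy with one logit per (state, arm) pair, a REINFORCE-style update, and a toy deterministic environment in which one arm per state pays +1 and every other arm pays -1. The names `train`, `best_arm`, and `theta` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(best_arm, n_arms, steps=3000, lr=0.1, seed=0):
    """REINFORCE on a toy contextual bandit (assumed environment).

    best_arm[s] is the arm that pays +1 in state s; all others pay -1.
    Returns the learned logits, one row per state.
    """
    rng = random.Random(seed)
    n_states = len(best_arm)
    theta = [[0.0] * n_arms for _ in range(n_states)]
    arms = list(range(n_arms))
    for _ in range(steps):
        s = rng.randrange(n_states)            # sample a context
        probs = softmax(theta[s])              # policy pi(. | s)
        a = rng.choices(arms, weights=probs)[0]  # sample an action
        r = 1.0 if a == best_arm[s] else -1.0  # observe the reward
        # Gradient of log pi(a | s) w.r.t. logit b is 1[b == a] - pi(b | s).
        for b in arms:
            grad_log = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += lr * r * grad_log   # ascend r * grad log pi
    return theta


best_arm = [3, 1, 0]                 # hypothetical 3-state, 4-arm problem
theta = train(best_arm, n_arms=4)
greedy = [max(range(4), key=lambda b: row[b]) for row in theta]
print("greedy arm per state:", greedy)
```

The update line is the policy-gradient step: it scales the gradient of the log-probability of the chosen action by the reward it earned, which is the same quantity a loss of the form `-log(pi(a | s)) * reward` would yield under gradient descent.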

araknadash commented Oct 17, 2018

Instead of using slim, you can use plain tf:
state_in_OH = tf.one_hot(self.state_in, s_size)
output = tf.layers.dense(state_in_OH, a_size, activation=tf.nn.sigmoid, use_bias=False, kernel_initializer=tf.ones_initializer())
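To make the two lines above concrete, here is a small dependency-free sketch of what a one-hot input followed by a bias-free dense layer with a sigmoid activation computes: it selects the row of the weight matrix that belongs to the current state and squashes it. The helper names (`one_hot`, `dense_no_bias`) and the example weights are assumptions for illustration, not part of TensorFlow.

```python
import math


def one_hot(index, depth):
    """Return a list of length `depth` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_no_bias(x, kernel):
    """Compute sigmoid(x @ kernel) for a vector x and a list-of-rows kernel."""
    n_out = len(kernel[0])
    return [
        sigmoid(sum(x[i] * kernel[i][j] for i in range(len(x))))
        for j in range(n_out)
    ]


s_size, a_size = 3, 4
# Start from an all-ones kernel, mirroring a ones initializer ...
kernel = [[1.0] * a_size for _ in range(s_size)]
# ... then pretend training changed the row for state 2 (made-up values).
kernel[2] = [0.5, -1.0, 2.0, 0.0]

state = 2
output = dense_no_bias(one_hot(state, s_size), kernel)
row_lookup = [sigmoid(w) for w in kernel[state]]
print(output)
print(row_lookup)
```

Both printed lists match, which is why this layer behaves like a per-state table of action weights rather than a network that generalizes across states.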

xkrishnam commented Jul 29, 2020 (edited)

Thanks Arthur! This is a helpful tutorial for beginners like me. Here is a TensorFlow 2 implementation that may be helpful for someone.

daniel-xion commented Aug 10, 2022

> Thanks Arthur! This is a helpful tutorial for beginners like me. Here is a TensorFlow 2 implementation that may be helpful for someone.

Thanks for the implementation. How is this a policy network, though? I don't see where a policy gradient is used.
