xkrishnam/contextualPolicy-n-arm-bandit.ipynb

Last active September 29, 2022 06:17

TensorFlow 2 implementation of a policy gradient method for solving n-armed bandit problems.

Author

xkrishnam commented Aug 22, 2022

Context here simply means that the algorithm also considers information about the state of the environment (the context) when generating actions to earn higher rewards, rather than only generating random actions and optimizing the loss.

Second, you can add more layers, but the output of the last layer should be number_of_bandits * number_of_possible_actions. That means you can put new layers before the current first layer (i.e. layer1).

Alternatively, you can make the code more generic so it can be used more effectively; when I wrote it, my only intention was to convert an existing TF1 solution to TF2.
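
To illustrate the point about context, here is a minimal framework-free sketch of a contextual policy gradient (REINFORCE) update. It is not the notebook's TF2 code. The function names, the `arm_means` reward table, the +1/-1 reward rule and the hyperparameters are all assumptions made for illustration. The policy keeps one row of action preferences per context, which plays the role of a single linear layer applied to a one-hot state.

```python
import math
import random


def softmax(prefs):
    """Turn a list of action preferences into probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_contextual_bandit(arm_means, episodes=5000, lr=0.1, seed=0):
    """Learn per-context action preferences with REINFORCE.

    arm_means[s][a] is a hypothetical mean payoff of arm a in context s.
    The reward rule (+1 if a noisy draw is positive, else -1) is an
    assumption for this sketch, not taken from the notebook.
    """
    rng = random.Random(seed)
    num_states = len(arm_means)
    num_actions = len(arm_means[0])
    # One row of preferences per context: this is what makes it contextual.
    prefs = [[0.0] * num_actions for _ in range(num_states)]

    for _ in range(episodes):
        state = rng.randrange(num_states)  # observe the context
        probs = softmax(prefs[state])      # policy conditioned on the context
        action = rng.choices(range(num_actions), weights=probs)[0]
        draw = rng.gauss(arm_means[state][action], 1.0)
        reward = 1.0 if draw > 0 else -1.0
        # Gradient of log pi(action | state) w.r.t. prefs[state][k]
        # is 1[k == action] - probs[k]; scale it by the reward.
        for k in range(num_actions):
            indicator = 1.0 if k == action else 0.0
            prefs[state][k] += lr * reward * (indicator - probs[k])
    return prefs


def greedy_actions(prefs):
    """Return the highest-preference action for each context."""
    return [max(range(len(row)), key=row.__getitem__) for row in prefs]
```

Calling `train_contextual_bandit` with a table in which each context has a different best arm, then passing the result to `greedy_actions`, shows whether the policy has learned a separate choice per context. A context-free policy would have only a single row of preferences and could not do this.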
