Alvaro avalcarce

  • Nokia Bell-Labs
  • Paris
@avalcarce
avalcarce / README.md
Created March 8, 2017 09:27
Solving Acrobot-v1 with Double DQN and Prioritized Experience Replay (with proportional prioritization)

Synopsis

This is a Deep Reinforcement Learning solution to the Acrobot-v1 environment in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. I've run it with Tensorflow 1.0 on Python 3.5 under Windows 7.

The algorithm is a Double Deep Q Network (DQN) with Prioritized Experience Replay (PER), where the proportional prioritization variant has been implemented. All hyperparameters have been chosen by hand based on several experiments. However, the learning rate, the prioritization exponent alpha and the initial importance sampling exponent beta0 have been obtained via Bayesian optimization with Scikit-Optimize.
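
As a rough illustration of the Double DQN update (not the exact code of this gist), the target for each sampled transition selects the greedy action with the online network but evaluates it with the target network. A minimal NumPy sketch, assuming hypothetical `q_online`/`q_target` helpers that map a batch of states to a `(batch_size, n_actions)` array of Q-values:

```python
import numpy as np

def double_dqn_targets(rewards, next_states, dones, gamma, q_online, q_target):
    """Double DQN targets for a batch of transitions (illustrative sketch).

    q_online and q_target are assumed helpers returning a
    (batch_size, n_actions) array of Q-values; they are not part of this gist.
    """
    # Select the greedy next action with the online network...
    best_actions = np.argmax(q_online(next_states), axis=1)
    # ...but evaluate it with the target network, which is what reduces
    # the overestimation bias of vanilla DQN.
    next_q = q_target(next_states)[np.arange(len(rewards)), best_actions]
    # Terminal transitions (dones == 1) do not bootstrap.
    return rewards + gamma * (1.0 - dones) * next_q
```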

The hyperparameters are:

@avalcarce
avalcarce / README.md
Created March 8, 2017 11:53
Solving Acrobot-v1 with DQN and Prioritized Experience Replay (with proportional prioritization)

Synopsis

This is a Deep Reinforcement Learning solution to the Acrobot-v1 environment in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. I've run it with Tensorflow 1.0 on Python 3.5 under Windows 7.

The algorithm is a Deep Q Network (DQN) with Prioritized Experience Replay (PER), where the proportional prioritization variant has been implemented. All hyperparameters have been chosen by hand based on several experiments. However, the learning rate, the prioritization exponent alpha and the initial importance sampling exponent beta0 have been obtained via Bayesian optimization with Scikit-Optimize.
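
For context, proportional prioritization samples transitions with probability proportional to |TD error|^alpha and corrects the resulting bias with importance-sampling weights controlled by beta (annealed from beta0 towards 1 during training). A simplified O(N) NumPy sketch of that sampling step (an efficient implementation would typically use a sum tree; this version is for illustration only and is not the gist's code):

```python
import numpy as np

def sample_proportional(td_errors, batch_size, alpha, beta, eps=1e-6):
    """Proportional prioritized sampling, simplified O(N) sketch.

    td_errors: last-seen TD error of every stored transition.
    Returns sampled indices and normalized importance-sampling weights.
    """
    priorities = (np.abs(td_errors) + eps) ** alpha      # p_i = (|delta_i| + eps)^alpha
    probs = priorities / priorities.sum()                # P(i) = p_i / sum_k p_k
    idx = np.random.choice(len(td_errors), size=batch_size, p=probs)
    weights = (len(td_errors) * probs[idx]) ** (-beta)   # w_i = (N * P(i))^-beta
    return idx, weights / weights.max()                  # normalize by the max weight
```

The returned weights would then scale each sample's TD loss before the gradient step, compensating for the non-uniform sampling.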

The hyperparameters are:
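
Both gists report obtaining the learning rate, alpha and beta0 via Bayesian optimization with Scikit-Optimize. A minimal, hypothetical sketch of such a tuning loop using `gp_minimize` (the objective and the search ranges below are placeholders standing in for a full training run, not the settings actually used in these gists):

```python
from skopt import gp_minimize

def objective(params):
    """Placeholder objective: train the agent with these hyperparameters and
    return a score to minimize, e.g. the negative mean episode reward."""
    learning_rate, alpha, beta0 = params
    # A real objective would build the agent, train it on Acrobot-v1 and
    # evaluate it; this stub only illustrates the interface.
    return 0.0

search_space = [
    (1e-5, 1e-2, 'log-uniform'),  # learning rate (assumed range)
    (0.0, 1.0),                   # prioritization exponent alpha
    (0.0, 1.0),                   # initial importance-sampling exponent beta0
]

result = gp_minimize(objective, search_space, n_calls=30, random_state=0)
print(result.x, result.fun)       # best hyperparameters found and their score
```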