Alvaro avalcarce

  • Nokia Bell-Labs
  • Paris
@avalcarce
avalcarce / README.md
Last active March 27, 2018 14:53
Solving MountainCar-v0

Synopsis

This is a Deep Reinforcement Learning solution to some classic control problems. I've used it to solve the MountainCar-v0, CartPole-v0 and [CartPole-v1](https://gym.openai.com/envs/CartPole-v1) problems in OpenAI's Gym. This code uses TensorFlow to model a value function for a Reinforcement Learning agent. I've run it with TensorFlow 1.0 on Python 3.5 under Windows 7.

Some of the hyperparameters used in the main.py script to solve MountainCar-v0 have been obtained partly through exhaustive search, and partly via Bayesian optimization with Scikit-Optimize. The optimized hyperparameters and their values are as follows (a sketch using them appears after the list):

  • Size of 1st fully connected layer: 198
  • Size of 2nd fully connected layer: 96
  • Learning rate: 2.33E-4
  • Period (in steps) for the update of the target network parameters as per the DQN algorithm: 999
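
As a rough illustration only (not the gist's actual code), the sketch below wires these optimized values into a TensorFlow 1.x DQN value network with a periodically synced target network. The function and variable names here are hypothetical.

```python
import tensorflow as tf

STATE_DIM = 2               # MountainCar-v0 observation: position, velocity
N_ACTIONS = 3               # push left, no push, push right
FC1_SIZE, FC2_SIZE = 198, 96
LEARNING_RATE = 2.33e-4
TARGET_UPDATE_PERIOD = 999  # steps between target-network syncs

def build_q_network(states, scope):
    """Two fully connected ReLU layers followed by a linear Q-value head."""
    with tf.variable_scope(scope):
        h1 = tf.layers.dense(states, FC1_SIZE, activation=tf.nn.relu)
        h2 = tf.layers.dense(h1, FC2_SIZE, activation=tf.nn.relu)
        return tf.layers.dense(h2, N_ACTIONS)  # Q(s, a) for every action

states = tf.placeholder(tf.float32, [None, STATE_DIM])
actions = tf.placeholder(tf.int32, [None])
targets = tf.placeholder(tf.float32, [None])  # r + gamma * max_a' Q_target(s', a')

q_online = build_q_network(states, "online")
q_target = build_q_network(states, "target")

# Run these assign ops every TARGET_UPDATE_PERIOD steps to copy the online
# weights into the target network, as the DQN algorithm prescribes.
online_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="online")
target_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="target")
sync_target = [t.assign(o) for o, t in zip(online_vars, target_vars)]

# Temporal-difference loss on the Q-values of the actions actually taken.
q_taken = tf.reduce_sum(q_online * tf.one_hot(actions, N_ACTIONS), axis=1)
loss = tf.reduce_mean(tf.squared_difference(q_taken, targets))
train_op = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss)
```
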
@avalcarce
avalcarce / README.md
Last active September 24, 2017 17:11
RL DQN solution for MountainCar-v0, CartPole-v0 and CartPole-v1 on OpenAI's Gym

Synopsis

This is a Deep Reinforcement Learning solution to some classic control problems. I've used it to solve the MountainCar-v0, CartPole-v0 and [CartPole-v1](https://gym.openai.com/envs/CartPole-v1) problems in OpenAI's Gym. This code uses TensorFlow to model a value function for a Reinforcement Learning agent. The code is fundamentally a translation of necnec's algorithm from Theano & Lasagne to TensorFlow. I've run it on Python 3.5 under Windows 7.
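
For context, a minimal (assumed) interaction loop with one of these Gym environments looks like the sketch below; the actual agent replaces the random action with an epsilon-greedy choice over the TensorFlow value network's outputs.

```python
import gym

env = gym.make("MountainCar-v0")
state = env.reset()
done = False
while not done:
    # Placeholder policy: the real agent selects the action epsilon-greedily
    # from the learned Q-values instead of sampling uniformly at random.
    action = env.action_space.sample()
    state, reward, done, _ = env.step(action)
env.close()
```
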

References

  1. Deep Reinforcement Learning tutorial, David Silver, Google DeepMind.
  2. necnec's algorithm