Alvaro avalcarce

  • Nokia Bell-Labs
  • Paris
@avalcarce
avalcarce / README.md
Last active March 27, 2018 14:53
Solving MountainCar-v0

Synopsis

This is a Deep Reinforcement Learning solution to some classic control problems. I've used it to solve the MountainCar-v0, CartPole-v0 and [CartPole-v1](https://gym.openai.com/envs/CartPole-v1) problems in OpenAI's Gym. This code uses TensorFlow to model a value function for a Reinforcement Learning agent. I've run it with TensorFlow 1.0 on Python 3.5 under Windows 7.

Some of the hyperparameters used in the main.py script to solve MountainCar-v0 have been obtained partly through exhaustive search, and partly via Bayesian optimization with Scikit-Optimize. The optimized hyperparameters and their values are as follows (a sketch using them appears after the list):

  • Size of 1st fully connected layer: 198
  • Size of 2nd fully connected layer: 96
  • Learning rate: 2.33E-4
  • Period (in steps) for the update of the target network parameters as per the DQN algorithm: 999
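
As a rough illustration only (not the gist's actual code), the sketch below wires these optimized values into a TensorFlow 1.x DQN value network with a periodically synced target network. The function and variable names here are hypothetical.

```python
import tensorflow as tf

STATE_DIM = 2               # MountainCar-v0 observation: position, velocity
N_ACTIONS = 3               # push left, no push, push right
FC1_SIZE, FC2_SIZE = 198, 96
LEARNING_RATE = 2.33e-4
TARGET_UPDATE_PERIOD = 999  # steps between target-network syncs

def build_q_network(states, scope):
    """Two fully connected ReLU layers followed by a linear Q-value head."""
    with tf.variable_scope(scope):
        h1 = tf.layers.dense(states, FC1_SIZE, activation=tf.nn.relu)
        h2 = tf.layers.dense(h1, FC2_SIZE, activation=tf.nn.relu)
        return tf.layers.dense(h2, N_ACTIONS)  # Q(s, a) for every action

states = tf.placeholder(tf.float32, [None, STATE_DIM])
actions = tf.placeholder(tf.int32, [None])
targets = tf.placeholder(tf.float32, [None])  # r + gamma * max_a' Q_target(s', a')

q_online = build_q_network(states, "online")
q_target = build_q_network(states, "target")

# Run these assign ops every TARGET_UPDATE_PERIOD steps to copy the online
# weights into the target network, as the DQN algorithm prescribes.
online_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="online")
target_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="target")
sync_target = [t.assign(o) for o, t in zip(online_vars, target_vars)]

# Temporal-difference loss on the Q-values of the actions actually taken.
q_taken = tf.reduce_sum(q_online * tf.one_hot(actions, N_ACTIONS), axis=1)
loss = tf.reduce_mean(tf.squared_difference(q_taken, targets))
train_op = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss)
```
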
@avalcarce
avalcarce / README.md
Last active September 24, 2017 17:11
RL DQN solution for MountainCar-v0, CartPole-v0 and CartPole-v1 on OpenAI's Gym

Synopsis

This is a Deep Reinforcement Learning solution to some classic control problems. I've used it to solve the MountainCar-v0, CartPole-v0 and [CartPole-v1](https://gym.openai.com/envs/CartPole-v1) problems in OpenAI's Gym. This code uses TensorFlow to model a value function for a Reinforcement Learning agent. The code is fundamentally a translation of necnec's algorithm from Theano & Lasagne to TensorFlow. I've run it on Python 3.5 under Windows 7.
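
For context, a minimal (assumed) interaction loop with one of these Gym environments looks like the sketch below; the actual agent replaces the random action with an epsilon-greedy choice over the TensorFlow value network's outputs.

```python
import gym

env = gym.make("MountainCar-v0")
state = env.reset()
done = False
while not done:
    # Placeholder policy: the real agent selects the action epsilon-greedily
    # from the learned Q-values instead of sampling uniformly at random.
    action = env.action_space.sample()
    state, reward, done, _ = env.step(action)
env.close()
```
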

References

  1. Deep Reinforcement Learning tutorial, David Silver, Google DeepMind.
  2. necnec's algorithm