Hans Bouwmeester HansBouwmeester

  • Los Gatos, CA
@HansBouwmeester
HansBouwmeester / ddpg_gym.py
Created May 22, 2017 22:07 — forked from Anjum48/ddpg_gym.py
Pendulum-v0 submission using DDPG without batch normalisation
"""
Implementation of DDPG - Deep Deterministic Policy Gradient
Algorithm and hyperparameter details can be found here: http://arxiv.org/pdf/1509.02971v2.pdf
Variance scaling paper: https://arxiv.org/pdf/1502.01852v1.pdf
Thanks to GitHub users yanpanlau, pemami4911, songrotek and JunhongXu for their DDPG examples
Batch normalisation on the actor accelerates learning but has poor long-term stability. Applying it to the critic breaks
learning, particularly on the state branch. I am not sure why, but I think this issue is specific to this environment.
"""
import numpy as np
@HansBouwmeester
HansBouwmeester / Q-Table Learning-Clean.ipynb
Created May 1, 2017 00:28 — forked from awjuliani/Q-Table Learning-Clean.ipynb
Q-Table learning in OpenAI grid world.
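The notebook itself could not be displayed here, so the following is only a minimal sketch of the standard tabular Q-learning update, not the notebook's code. The function name `q_update`, the 16-state by 4-action table shape, and the `alpha` and `gamma` values are all assumptions chosen for illustration.

```python
import numpy as np


def q_update(q_table, state, action, reward, next_state, alpha=0.8, gamma=0.95):
    """Apply one tabular Q-learning step in place and return the table.

    alpha (learning rate) and gamma (discount) are illustrative values,
    not taken from the notebook.
    """
    # Bootstrapped target: immediate reward plus discounted best next value.
    td_target = reward + gamma * np.max(q_table[next_state])
    # Move the current estimate a fraction alpha towards the target.
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table


# Hypothetical 4x4 grid world: 16 states, 4 actions, table initialised to zero.
q = np.zeros((16, 4))
q = q_update(q, state=14, action=2, reward=1.0, next_state=15)
```

In a full training loop this update would be called once per environment step, with the action chosen by some exploration rule such as epsilon-greedy.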