Hans Bouwmeester HansBouwmeester

HansBouwmeester / ddpg_gym.py

Created May 22, 2017 22:07 — forked from Anjum48/ddpg_gym.py

Pendulum-v0 submission using DDPG without batch normalisation

	"""
	Implementation of DDPG - Deep Deterministic Policy Gradient
	Algorithm and hyperparameter details can be found here: http://arxiv.org/pdf/1509.02971v2.pdf
	Variance scaling paper: https://arxiv.org/pdf/1502.01852v1.pdf
	Thanks to GitHub users yanpanlau, pemami4911, songrotek and JunhongXu for their DDPG examples

	Batch normalisation on the actor accelerates learning but has poor long-term stability. Applying it to the critic breaks
	training, particularly on the state branch. Not sure why, but I think this issue is specific to this environment.
	"""
	import numpy as np
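
The docstring above cites the variance scaling paper (arXiv 1502.01852), but the preview cuts off before any initialisation code. As a rough sketch of what that paper's scheme looks like for a ReLU layer, the weights can be drawn from a zero-mean normal distribution with standard deviation sqrt(2 / fan_in). The function name `he_normal` and the 400 x 300 layer size are illustrative assumptions, not taken from the gist.

```python
import numpy as np


def he_normal(fan_in, fan_out, rng):
    """Draw a (fan_in, fan_out) weight matrix with std = sqrt(2 / fan_in)."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))


# Illustrative layer size only; the gist's actual layer sizes are not shown here.
rng = np.random.default_rng(0)
weights = he_normal(400, 300, rng)
```

The gist may well rely on a framework-provided initialiser instead; this NumPy version only shows the scaling rule itself.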

HansBouwmeester / Q-Table Learning-Clean.ipynb

Created May 1, 2017 00:28 — forked from awjuliani/Q-Table Learning-Clean.ipynb

Q-Table learning in OpenAI grid world.
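
The notebook's contents are not reproduced here. For orientation, a minimal sketch of a single tabular Q-learning update is given below. The table shape, learning rate, discount factor, and the name `q_update` are all hypothetical and may differ from what the notebook uses.

```python
import numpy as np


def q_update(q_table, state, action, reward, next_state, lr=0.8, gamma=0.95):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Bootstrapped target: immediate reward plus discounted best next-state value.
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += lr * (td_target - q_table[state, action])
    return q_table[state, action]


# Hypothetical toy problem: 4 states, 2 actions, all values start at zero.
q_table = np.zeros((4, 2))
new_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
```

In a full training loop this update would run once per environment step, with actions chosen by some exploration rule such as epsilon-greedy.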
