Xiaohu Zhu tigerneil

Train

python qlearn.py

Test

Generate figures

Prioritized Experience Replay

Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, this approach simply replays transitions at the same frequency that they were originally experienced, regardless of their significance. In this paper we develop a framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently. We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games. DQN with prioritized experience replay achieves a new state-of-the-art, outperforming DQN with uniform replay on 42 out of 57 games.

Authors: Tom Schaul, John Quan, Ioannis Antonoglou, David Silver
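
The idea in the abstract can be illustrated in a few lines. What follows is a minimal sketch of proportional prioritization, assuming each priority is `|TD error| + eps` and each transition is sampled with probability `p_i ** alpha / sum_k(p_k ** alpha)`. The class name `PrioritizedReplay`, its method names, and the default values of `alpha` and `eps` are hypothetical choices for this sketch. It samples in linear time and leaves out the sum-tree and the importance-sampling correction, so it is an illustration rather than the implementation used in this repository.

```python
import random


class PrioritizedReplay:
    """Toy replay memory with proportional prioritization (linear-time sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling, 1 gives fully proportional sampling
        self.eps = eps      # keeps every priority strictly positive
        self.transitions = []
        self.priorities = []

    def add(self, transition):
        # New transitions receive the current maximum priority so they are likely to be replayed.
        priority = max(self.priorities, default=1.0)
        if len(self.transitions) >= self.capacity:
            # Drop the oldest transition once the memory is full.
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append(priority)

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, rng=random):
        # Draw indices with replacement, weighted by the prioritized distribution.
        indices = rng.choices(
            range(len(self.transitions)), weights=self.probabilities(), k=batch_size
        )
        return indices, [self.transitions[i] for i in indices]

    def update(self, indices, td_errors):
        # After a learning step, refresh priorities from the new TD errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

Setting `alpha=0` makes every scaled priority equal to 1, which recovers the uniform replay that the abstract describes as prior work.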

Recurrent Reinforcement Learning: A Hybrid Ap

	"""Information Retrieval metrics

	Useful Resources:
	http://www.cs.utexas.edu/~mooney/ir-course/slides/Evaluation.ppt
	http://www.nii.ac.jp/TechReports/05-014E.pdf
	http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
	http://hal.archives-ouvertes.fr/docs/00/72/67/60/PDF/07-busa-fekete.pdf
	Learning to Rank for Information Retrieval (Tie-Yan Liu)
	"""
	import numpy as np
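
Only the module docstring of this file is shown, so the functions it defines are not known here. As an illustration of the kind of metric the listed resources cover, the sketch below implements precision at k, reciprocal rank, and (normalized) discounted cumulative gain with the common `log2(rank + 1)` discount. All function names and signatures are assumptions made for this example, not the file's API.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (relevance: 0/1 flags in rank order)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in relevance[:k] if r) / k


def reciprocal_rank(relevance):
    """1 / rank of the first relevant result, or 0.0 if nothing is relevant."""
    for rank, r in enumerate(relevance, start=1):
        if r:
            return 1.0 / rank
    return 0.0


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering; 0.0 if all gains are zero."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

Averaging `reciprocal_rank` over a set of queries gives the mean reciprocal rank.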

	"""
	This is a batched LSTM forward and backward pass
	"""
	import numpy as np
	import code

	class LSTM:

	    @staticmethod
	    def init(input_size, hidden_size, fancy_forget_bias_init=3):
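
The listing shows only the signature of `init`. As a rough sketch of what an initializer with these parameters could look like, the example below assumes that all four gates share one weight matrix, that the first row of that matrix holds the biases, and that the columns are ordered input, forget, output, candidate. Those layout choices, the function name `init_lstm_weights`, and the `rng` argument are assumptions made for this illustration. A positive `fancy_forget_bias_init` starts the forget gates close to open, so the cell state is retained early in training.

```python
import numpy as np


def init_lstm_weights(input_size, hidden_size, fancy_forget_bias_init=3, rng=None):
    """Return one (input_size + hidden_size + 1, 4 * hidden_size) matrix of weights and biases."""
    rng = np.random.default_rng() if rng is None else rng
    fan_in = input_size + hidden_size
    # Row 0 holds the biases; the remaining rows multiply the concatenated [x_t, h_{t-1}].
    W = rng.standard_normal((fan_in + 1, 4 * hidden_size)) / np.sqrt(fan_in)
    W[0, :] = 0.0  # start all biases at zero
    if fancy_forget_bias_init != 0:
        # Assumed column layout: [input | forget | output | candidate].
        W[0, hidden_size:2 * hidden_size] = fancy_forget_bias_init
    return W
```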

	"""
	Simple implementation of Identity Recurrent Neural Networks (IRNN)

	Reference
	A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
	http://arxiv.org/abs/1504.00941

	"""

	import numpy as np
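
The initialization named in the reference is simple to state: the recurrent weight matrix starts as the identity, the biases start at zero, and the hidden units are rectified linear. A minimal NumPy sketch of that idea follows. The function names, the parameter dictionary, and the 0.001 standard deviation for the input weights are assumptions made for illustration, not the code from this file.

```python
import numpy as np


def init_irnn(input_size, hidden_size, input_std=0.001, rng=None):
    """Create IRNN parameters with identity recurrent weights and zero biases."""
    rng = np.random.default_rng() if rng is None else rng
    return {
        "W_xh": rng.normal(0.0, input_std, size=(input_size, hidden_size)),
        "W_hh": np.eye(hidden_size),  # identity recurrent weights
        "b_h": np.zeros(hidden_size),
    }


def irnn_forward(params, inputs):
    """Apply h_t = relu(x_t W_xh + h_{t-1} W_hh + b_h) over a (T, input_size) array."""
    h = np.zeros(params["W_hh"].shape[0])
    states = []
    for x_t in inputs:
        h = np.maximum(0.0, x_t @ params["W_xh"] + h @ params["W_hh"] + params["b_h"])
        states.append(h)
    return np.stack(states)
```

With zero input and zero bias, the identity recurrence copies a non-negative hidden state forward unchanged, which is the property this initialization is meant to provide at the start of training.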

	import theano
	from pylearn2.models import mlp
	from pylearn2.training_algorithms import sgd
	from pylearn2.termination_criteria import EpochCounter
	from pylearn2.datasets.dense_design_matrix import DenseDesignMatrix
	import numpy as np
	from random import randint


	class XOR(DenseDesignMatrix):