# Policy Resnet!
This is based on [Andrej Karpathy's RL tutorial](http://karpathy.github.io/2016/05/31/rl/), but uses a residual neural network written in Theano + Lasagne to approximate the policy, and Adam to optimize the parameters.
The architecture is similar to the original ResNet paper (https://arxiv.org/abs/1512.03385), but with no global pooling and a 512-unit fully connected layer after all the residual blocks:
- 4 residual blocks: 16 filters -> 32 filters -> 32 filters -> 64 filters
- all filters are 3x3
- where the number of filters increases, I used a 2x2 stride to halve the height/width
- ReLU nonlinearities
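To illustrate the block structure described above, here is a minimal NumPy sketch of one residual block with a strided projection shortcut. This is not the actual Theano + Lasagne code from this repo; the function and parameter names (`conv2d`, `residual_block`, `w_proj`) are illustrative only.

```python
import numpy as np

def conv2d(x, w, stride=1, pad=1):
    """Naive 2D convolution. x: (C_in, H, W), w: (C_out, C_in, kH, kW)."""
    c_out, c_in, kh, kw = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h_out = (x.shape[1] + 2 * pad - kh) // stride + 1
    w_out = (x.shape[2] + 2 * pad - kw) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Contract (C_in, kH, kW) of each output filter against the patch.
            out[:, i, j] = np.tensordot(w, patch, axes=3)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2, stride=1, w_proj=None):
    """Two 3x3 convs plus a shortcut. When the filter count grows,
    the first conv uses stride 2 and a 1x1 projection matches the
    shortcut's shape, as in the architecture above."""
    shortcut = x if w_proj is None else conv2d(x, w_proj, stride=stride, pad=0)
    h = relu(conv2d(x, w1, stride=stride, pad=1))
    h = conv2d(h, w2, stride=1, pad=1)
    return relu(h + shortcut)

# Example: 16 -> 32 filters with a 2x2 stride halving height/width.
x = np.random.randn(16, 8, 8)
w1 = 0.1 * np.random.randn(32, 16, 3, 3)
w2 = 0.1 * np.random.randn(32, 32, 3, 3)
w_proj = 0.1 * np.random.randn(32, 16, 1, 1)
y = residual_block(x, w1, w2, stride=2, w_proj=w_proj)
print(y.shape)  # (32, 4, 4)
```

The 1x1 projection on the shortcut is only needed at the three transitions where the filter count changes; the other blocks can use an identity shortcut.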