# Policy Resnet!
This is based on [Andrej Karpathy's RL tutorial](http://karpathy.github.io/2016/05/31/rl/), but uses a residual neural network written in Theano + Lasagne to approximate the policy, and Adam to optimize the parameters.
The architecture is similar to the original ResNet paper (https://arxiv.org/abs/1512.03385), but with no global pooling and a 512-unit fully connected layer after all the residual blocks:
- 4 residual blocks: 16 filters -> 32 filters -> 32 filters -> 64 filters
- all filters are 3x3
- where the number of filters increases, I used a 2x2 stride to halve the height/width
- ReLU nonlinearities
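To illustrate the block structure described above, here is a minimal NumPy sketch of one residual block with a strided projection shortcut. This is not the actual Theano + Lasagne code from this repo; the function and parameter names (`conv2d`, `residual_block`, `w_proj`) are illustrative only.

```python
import numpy as np

def conv2d(x, w, stride=1, pad=1):
    """Naive 2D convolution. x: (C_in, H, W), w: (C_out, C_in, kH, kW)."""
    c_out, c_in, kh, kw = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h_out = (x.shape[1] + 2 * pad - kh) // stride + 1
    w_out = (x.shape[2] + 2 * pad - kw) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Contract (C_in, kH, kW) of each output filter against the patch.
            out[:, i, j] = np.tensordot(w, patch, axes=3)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2, stride=1, w_proj=None):
    """Two 3x3 convs plus a shortcut. When the filter count grows,
    the first conv uses stride 2 and a 1x1 projection matches the
    shortcut's shape, as in the architecture above."""
    shortcut = x if w_proj is None else conv2d(x, w_proj, stride=stride, pad=0)
    h = relu(conv2d(x, w1, stride=stride, pad=1))
    h = conv2d(h, w2, stride=1, pad=1)
    return relu(h + shortcut)

# Example: 16 -> 32 filters with a 2x2 stride halving height/width.
x = np.random.randn(16, 8, 8)
w1 = 0.1 * np.random.randn(32, 16, 3, 3)
w2 = 0.1 * np.random.randn(32, 32, 3, 3)
w_proj = 0.1 * np.random.randn(32, 16, 1, 1)
y = residual_block(x, w1, w2, stride=2, w_proj=w_proj)
print(y.shape)  # (32, 4, 4)
```

The 1x1 projection on the shortcut is only needed at the three transitions where the filter count changes; the other blocks can use an identity shortcut.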