# Policy ResNet!
This is based on Andrej Karpathy's RL tutorial (http://karpathy.github.io/2016/05/31/rl/), but uses a residual neural network, written in Theano + Lasagne, to approximate the policy, with Adam to optimize the parameters.
The architecture is similar to the original ResNet paper (https://arxiv.org/abs/1512.03385), but with no global pooling and a 512-unit fully connected layer after the residual blocks:
- 4 residual blocks: 16 filters -> 32 filters -> 32 filters -> 64 filters
- all filters are 3x3
- where the number of filters increases, a 2x2 stride halves the height/width (see the sketch after this list)
- ReLU nonlinearities throughout
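The repo has the exact definition; below is a minimal sketch of how such a block and network could be built in Lasagne. The input shape (80x80 single-channel preprocessed Pong frames, as in Karpathy's post), the 1x1 projection shortcuts, and the single sigmoid output are my assumptions, not necessarily what the repo does.

```python
import lasagne
from lasagne.layers import (InputLayer, Conv2DLayer, DenseLayer,
                            ElemwiseSumLayer, NonlinearityLayer)
from lasagne.nonlinearities import rectify, sigmoid

def residual_block(incoming, num_filters, downsample=False):
    # two 3x3 convs; the first strides 2x2 when downsampling
    stride = (2, 2) if downsample else (1, 1)
    conv1 = Conv2DLayer(incoming, num_filters, (3, 3), stride=stride,
                        pad='same', nonlinearity=rectify)
    conv2 = Conv2DLayer(conv1, num_filters, (3, 3),
                        pad='same', nonlinearity=None)
    if downsample or incoming.output_shape[1] != num_filters:
        # 1x1 projection shortcut when the shape changes (assumed;
        # the paper also allows zero-padded identity shortcuts)
        shortcut = Conv2DLayer(incoming, num_filters, (1, 1), stride=stride,
                               pad='same', nonlinearity=None, b=None)
    else:
        shortcut = incoming
    return NonlinearityLayer(ElemwiseSumLayer([conv2, shortcut]),
                             nonlinearity=rectify)

def build_policy_net(input_var=None):
    # assumed input: 80x80 single-channel preprocessed Pong frames
    net = InputLayer((None, 1, 80, 80), input_var=input_var)
    net = residual_block(net, 16)                   # 16 filters
    net = residual_block(net, 32, downsample=True)  # 32 filters, stride 2x2
    net = residual_block(net, 32)                   # 32 filters
    net = residual_block(net, 64, downsample=True)  # 64 filters, stride 2x2
    net = DenseLayer(net, 512, nonlinearity=rectify)  # 512-unit fc layer
    # single sigmoid output = P(move UP); assumes binary Pong actions
    return DenseLayer(net, 1, nonlinearity=sigmoid)
```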
Will post the learning curves soon. One thing I noticed was that, compared to other convnets, the ResNet started consistently increasing its mean reward much earlier, and it continued to learn faster until convergence at around 8000 episodes.
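For reference, the per-batch update in this setup would look roughly like the REINFORCE loss below, optimized with Adam. This is a sketch of the general technique, not the repo's exact code; the variable names, the learning rate, and the action encoding (1.0 = UP, 0.0 = DOWN) are assumptions.

```python
import theano
import theano.tensor as T
import lasagne

frames = T.tensor4('frames')         # (batch, 1, 80, 80) preprocessed frames
actions = T.vector('actions')        # taken actions: 1.0 = UP, 0.0 = DOWN
advantages = T.vector('advantages')  # discounted, normalized returns

network = build_policy_net(frames)   # builder from the sketch above
p_up = lasagne.layers.get_output(network).flatten()

# REINFORCE: log-likelihood of the actions taken, weighted by advantage
log_lik = actions * T.log(p_up) + (1.0 - actions) * T.log(1.0 - p_up)
loss = -T.mean(log_lik * advantages)

params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.adam(loss, params, learning_rate=1e-4)  # assumed lr
train_fn = theano.function([frames, actions, advantages],
                           loss, updates=updates)
```

Each call to `train_fn` then performs one Adam step on a batch of frames collected from complete episodes.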
Code at https://github.com/sjb373/Policy-Convnets (it's a bit of a mess; I'm working on cleaning it up). Dependencies: theano, lasagne, matplotlib, dill, numpy, gym.