# Policy Resnet!

This is based on Andrej Karpathy's RL tutorial (http://karpathy.github.io/2016/05/31/rl/), but uses a residual neural network written in Theano + Lasagne to approximate the policy, and Adam to optimize the parameters.
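As a rough illustration, here is a minimal Theano + Lasagne sketch of a REINFORCE-style policy-gradient update optimized with Adam. The tiny dense stand-in policy, the variable names, and the learning rate are illustrative assumptions, not the gist's actual code:

```python
import theano
import theano.tensor as T
import lasagne

# Hypothetical stand-in policy: any Lasagne output layer producing P(action=UP).
# In the gist this would be the residual policy net (sketched further below).
l_in = lasagne.layers.InputLayer((None, 80 * 80))
l_out = lasagne.layers.DenseLayer(l_in, 1,
                                  nonlinearity=lasagne.nonlinearities.sigmoid)

states = T.matrix('states')          # batch of preprocessed frames
actions = T.vector('actions')        # 1.0 if UP was taken, else 0.0
advantages = T.vector('advantages')  # discounted, normalised returns

p_up = lasagne.layers.get_output(l_out, states).flatten()
# REINFORCE objective: log-likelihood of the taken actions, weighted by advantage.
log_lik = actions * T.log(p_up) + (1 - actions) * T.log(1 - p_up)
loss = -T.mean(log_lik * advantages)

params = lasagne.layers.get_all_params(l_out, trainable=True)
updates = lasagne.updates.adam(loss, params, learning_rate=1e-3)
train_fn = theano.function([states, actions, advantages], loss,
                           updates=updates)
```

Calling `train_fn` once per batch of episodes with the collected states, actions and advantages performs one Adam step on the policy parameters.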

The architecture is similar to the original ResNet paper (arxiv.org/abs/1512.03385), but with no global pooling and a 512-unit fully connected layer after all the residual blocks (a rough Lasagne sketch follows the list below):

  • 4 residual blocks: 16 filters -> 32 filters -> 32 filters -> 64 filters
  • all filters are 3x3
  • where the number of filters increases, I used stride 2x2 to halve the height/width
  • ReLU nonlinearities
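
A minimal sketch of how the network described above could be built in Lasagne. The input shape (80x80 Pong frames, as in Karpathy's tutorial), the 1x1 projection shortcuts, and the single sigmoid output head are assumptions rather than details taken from the gist:

```python
import lasagne
from lasagne.layers import (InputLayer, Conv2DLayer, DenseLayer,
                            ElemwiseSumLayer, NonlinearityLayer)
from lasagne.nonlinearities import rectify, linear, sigmoid

def residual_block(incoming, num_filters, downsample=False):
    """Two 3x3 convs; stride 2 on the first conv when downsampling,
    with a 1x1 projection on the shortcut when the shape changes."""
    stride = (2, 2) if downsample else (1, 1)
    conv1 = Conv2DLayer(incoming, num_filters, (3, 3), stride=stride,
                        pad='same', nonlinearity=rectify)
    conv2 = Conv2DLayer(conv1, num_filters, (3, 3), pad='same',
                        nonlinearity=linear)
    if downsample or incoming.output_shape[1] != num_filters:
        # Projection shortcut to match spatial size / channel count.
        shortcut = Conv2DLayer(incoming, num_filters, (1, 1), stride=stride,
                               nonlinearity=linear)
    else:
        shortcut = incoming
    return NonlinearityLayer(ElemwiseSumLayer([conv2, shortcut]),
                             nonlinearity=rectify)

def build_policy_net(input_shape=(None, 1, 80, 80)):
    # 4 residual blocks (16 -> 32 -> 32 -> 64 filters), then a 512-unit FC
    # layer and a single sigmoid output (e.g. probability of moving UP).
    net = InputLayer(input_shape)
    net = Conv2DLayer(net, 16, (3, 3), pad='same', nonlinearity=rectify)
    net = residual_block(net, 16)
    net = residual_block(net, 32, downsample=True)
    net = residual_block(net, 32)
    net = residual_block(net, 64, downsample=True)
    net = DenseLayer(net, 512, nonlinearity=rectify)
    return DenseLayer(net, 1, nonlinearity=sigmoid)
```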

I will post the learning curves soon. One thing I noticed was that, compared to other convnets, the resnet started consistently increasing its mean reward MUCH earlier, and continued to learn faster until convergence at around 8000 episodes.

Code at https://github.com/sjb373/Policy-Convnets - it's a bit of a mess; I'm working on cleaning it up. Dependencies: Theano, Lasagne, matplotlib, dill, NumPy, gym.
