ugo-nama-kun/visual_rl.md

Last active October 3, 2021 11:34


Summary of deep reinforcement learning with continuous actions and visual input


Deep RL + Continuous Control with Vision

Author

The ELBO-based formulation is (probably) clean, I think.
decoder p(x|z): 5 transposed convolutional layers (256 4 × 4, 128 3 × 3, 64 3 × 3, 32 3 × 3, and 3 5 × 5 filters, respectively, stride 2 each, except for the first layer)
q(z|x): 5 convolutional layers (32 5 × 5, 64 3 × 3, 128 3 × 3, 256 3 × 3, and 256 4 × 4 filters, respectively, stride 2 each, except for the last layer)
q(z'|x,z,a): 2 fully connected layers (256 units each), and a Gaussian output layer
latent variables: z_1 is 32-dim, z_2 is 256-dim
critic: 256-256-value
actor: 5 convnet layers-256-256-tanh?-action. Here too, the CNN is updated by a different objective, and the actor only uses that CNN.
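
The encoder and decoder layer lists above can be sanity-checked with a small shape trace. The following is only a sketch under assumptions that the notes do not state: a 64 × 64 RGB input, "same" padding on the stride-2 layers, "valid" padding on the stride-1 4 × 4 layer, and a decoder that receives the concatenated latents (32 + 256 channels) as a 1 × 1 map. The helper names (`conv_out`, `deconv_out`, `trace`) are hypothetical.

```python
import math


def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial axis."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid" padding


def deconv_out(size, kernel, stride, padding):
    """Output length of a transposed convolution along one spatial axis."""
    if padding == "same":
        return size * stride
    return (size - 1) * stride + kernel  # "valid" padding


# (filters, kernel, stride, padding); filters, kernels and strides are from the notes,
# the padding modes are an assumption.
ENCODER = [(32, 5, 2, "same"), (64, 3, 2, "same"), (128, 3, 2, "same"),
           (256, 3, 2, "same"), (256, 4, 1, "valid")]
DECODER = [(256, 4, 1, "valid"), (128, 3, 2, "same"), (64, 3, 2, "same"),
           (32, 3, 2, "same"), (3, 5, 2, "same")]


def trace(layers, size, channels, step):
    """Return the (channels, height, width) shape after each layer."""
    shapes = [(channels, size, size)]
    for filters, kernel, stride, padding in layers:
        size = step(size, kernel, stride, padding)
        shapes.append((filters, size, size))
    return shapes


# Assumed 64 x 64 RGB observation.
encoder_shapes = trace(ENCODER, 64, 3, conv_out)
# Assumed decoder input: z_1 (32) and z_2 (256) concatenated as a 1 x 1 map.
decoder_shapes = trace(DECODER, 1, 32 + 256, deconv_out)

for name, shapes in (("encoder", encoder_shapes), ("decoder", decoder_shapes)):
    print(name, " -> ".join(f"{c}x{h}x{w}" for c, h, w in shapes))
```

Under these assumptions the encoder reduces the image to a 256-dim feature vector (256 × 1 × 1) and the decoder mirrors it back to 3 × 64 × 64, which would be consistent with the stride-1 4 × 4 layer sitting at the 1 × 1 end of both networks.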