Used a 2-layer fully connected network with H1=100, H2=60, and ReLU activations.
He initialization of weights (network sketch below).
Adam optimizer. Initial learning rate = 0.001, reduced by a factor (gamma) of 0.50 every 350 episodes (scheduler sketch below).
Discount factor gamma = 0.99.
Eps = 1.00.
Eps decay = 0.98.
Eps decays once per new episode, not at each step (epsilon-greedy sketch below).
Uniform sampling from a replay buffer of 150,000 memories. No learning for the first 1,500 steps, to fill the replay buffer (buffer sketch below).
I tried to implement prioritized experience replay, but couldn't get it to work (yet).
States normalized using the empirical mean and standard deviation of historical observations (normalizer sketch below).
No reward or gradient clipping.
Target network updated every 600 steps.
Smooth L1 loss (Huber loss) rather than MSE (a training-step sketch covering these last two items is below).
Implemented in PyTorch and Python 3.5. Trained on a GTX-1070.
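A minimal sketch of the network described above, in PyTorch. The input and output sizes (8 state dimensions, 4 actions) are assumptions, since the gist does not name the environment, and it uses current PyTorch initializer names rather than the 2017-era ones.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """2-layer fully connected Q-network: H1=100, H2=60, ReLU."""
    def __init__(self, state_dim=8, n_actions=4):  # sizes are assumptions
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 100)
        self.fc2 = nn.Linear(100, 60)
        self.out = nn.Linear(60, n_actions)
        # He (Kaiming) initialization, matched to the ReLU activations
        for layer in (self.fc1, self.fc2, self.out):
            nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
            nn.init.zeros_(layer.bias)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.out(x)  # one Q-value per action
```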
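The optimizer and learning-rate schedule map directly onto `torch.optim`. The only assumption here is that the scheduler is stepped once per episode, so that `step_size=350` counts episodes as the notes describe.

```python
import torch.optim as optim

q_net = QNetwork()
optimizer = optim.Adam(q_net.parameters(), lr=0.001)
# Halves the learning rate every 350 scheduler steps; call
# scheduler.step() once per episode so the interval counts episodes.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=350, gamma=0.50)
```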
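A sketch of epsilon-greedy action selection with the per-episode decay described above. The helper name and the assumption that `state` is a 1-D tensor are illustrative, not from the gist.

```python
import random
import torch

EPS_START = 1.00
EPS_DECAY = 0.98  # applied once per episode, not per step

def select_action(q_net, state, eps, n_actions=4):
    """Random action with probability eps, otherwise the greedy action."""
    if random.random() < eps:
        return random.randrange(n_actions)
    with torch.no_grad():
        return q_net(state.unsqueeze(0)).argmax(dim=1).item()

# At the end of each episode:
#     eps *= EPS_DECAY
```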
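A minimal uniform replay buffer matching the description: 150,000-transition capacity, uniform sampling, and no learning until 1,500 transitions have been collected. The transition tuple layout is an assumption.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=150_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Transition layout is an assumption; the gist doesn't specify it.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling, no priorities
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# No learning until the buffer holds at least 1,500 transitions:
#     if len(buffer) >= 1500: run a training step
```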
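A sketch of the state normalization, assuming the empirical statistics are computed from a batch of previously collected observations; the gist does not say exactly how or when the statistics were gathered.

```python
import numpy as np

class StateNormalizer:
    """Standardize states with empirical statistics of past observations."""
    def __init__(self, observations):
        # observations: array of shape (N, state_dim) collected so far
        self.mean = observations.mean(axis=0)
        self.std = observations.std(axis=0) + 1e-8  # guard against zero std

    def __call__(self, state):
        return (state - self.mean) / self.std
```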
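Finally, a sketch of a single training step tying the last two items together: Smooth L1 (Huber) loss, no reward or gradient clipping, and a hard copy into the target network every 600 steps. Batch handling details are assumptions.

```python
import copy
import torch
import torch.nn.functional as F

GAMMA = 0.99           # discount factor
TARGET_UPDATE = 600    # hard target-network copy interval, in steps

target_net = copy.deepcopy(q_net)

def train_step(states, actions, rewards, next_states, dones, step_count):
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + GAMMA * next_q * (1 - dones)
    loss = F.smooth_l1_loss(q, targets)  # Huber loss rather than MSE
    optimizer.zero_grad()
    loss.backward()                      # no gradient clipping applied
    optimizer.step()
    if step_count % TARGET_UPDATE == 0:
        target_net.load_state_dict(q_net.state_dict())
```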