Uses A3C (asynchronous advantage actor-critic), written in TensorFlow. Training code, model & evaluation code are at this repo.
Gist doesn't have notifications; please use the repo's issues to discuss.
The 700 model is trained with DeepMind settings, not Gym settings. For Gym, my average score is 625.
I don't have a good answer to your questions about the score. One guess is that I actually trained the submission model with 4 GPUs (two for training and two for simulation). In that case: 1. the learning rate is divided by 2 inside AsyncMultiGPUTrainer; and 2. two training threads asynchronously update the parameters, which should improve the model.
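(For context, a minimal sketch of the learning-rate scaling described above; the names and values here are illustrative, not tensorpack's actual AsyncMultiGPUTrainer internals:)

```python
# Illustrative sketch only, not the repo's actual AsyncMultiGPUTrainer code.
# With N training towers applying gradient updates asynchronously, dividing
# the base learning rate by N keeps the total update magnitude per wall-clock
# step roughly comparable to single-GPU training.
num_training_gpus = 2        # two of the four GPUs train; the other two simulate
base_lr = 1e-3               # hypothetical base learning rate
per_tower_lr = base_lr / num_training_gpus  # each tower updates with lr / 2
```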
Yes, I have an a3c-lstm implementation which can reach a similar score on Breakout. But I didn't run a lot of experiments, and I'm not sure whether my implementation is better than a3c-ff (as in the paper), so I didn't release it.
That helps. But what do you mean by DeepMind settings? ALE + a fixed frame skip of 4, instead of Gym with a random skip k = {2, 3, 4}?
Yes. Apart from other minor differences, the random frame skip is probably the most relevant to performance.
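(For reference, a minimal sketch of a wrapper that mimics the DeepMind-style fixed frame skip. It assumes an underlying environment that advances one emulator frame per step, e.g. a NoFrameskip variant, and the classic 4-tuple Gym step API; it is not code from the repo.)

```python
import gym

class FixedFrameSkip(gym.Wrapper):
    """Repeat each action for a fixed number of frames (DeepMind-style),
    instead of Gym's default random skip of k in {2, 3, 4}."""

    def __init__(self, env, skip=4):
        super().__init__(env)
        self._skip = skip

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        obs = None
        for _ in range(self._skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info

# e.g. wrap a single-frame environment so every action lasts 4 frames
env = FixedFrameSkip(gym.make('BreakoutNoFrameskip-v4'), skip=4)
```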
The number of actions appears to be different, too. For Breakout, in the case of ALE it is 3, but in Gym it's 6. Wouldn't this matter? Did you just use ALE with the DeepMind settings, or did you adjust Gym somehow to act like ALE?
Yes, I mentioned these differences. The larger number of actions also makes it harder in Gym.
For the results here I used DeepMind settings, and for Gym submissions I used Gym settings.
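(A quick way to inspect which action set a Gym Atari environment actually exposes, using Gym's standard Atari API:)

```python
import gym

env = gym.make('Breakout-v0')
print(env.action_space.n)                   # how many actions Gym exposes
print(env.unwrapped.get_action_meanings())  # names of those actions, e.g. 'NOOP', 'FIRE', ...
```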
Wonderful. Thank YOU!
Hey, Kangaroo v0 seems to get stuck in the corner, trying to catch things that fall until it gets killed. Is max session time already a training parameter, and if not, do you think adding one could help in this case?
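(In case it isn't, a minimal sketch of capping episode length with a wrapper; it assumes the classic 4-tuple Gym step API and is not part of the repo. Gym also ships gym.wrappers.TimeLimit, which does essentially this.)

```python
import gym

class EpisodeTimeLimit(gym.Wrapper):
    """Force the episode to end after a fixed number of steps, so the agent
    can't stall indefinitely in one spot."""

    def __init__(self, env, max_steps=10000):
        super().__init__(env)
        self._max_steps = max_steps
        self._steps = 0

    def reset(self, **kwargs):
        self._steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._steps += 1
        if self._steps >= self._max_steps:
            done = True  # cut the episode off at the step budget
        return obs, reward, done, info
```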
Hey,
I just want to ask a very dumb question: I have read the A3C paper, in which they kind of boasted about their good performance when running on a 16-core CPU. How come here we are talking about GPUs?
Thank you in advance!
It has better performance on GPU.
[1201 10:47:55 @monitor.py:363] max_score: 863
[1201 10:47:55 @monitor.py:363] mean_score: 590.14
This is my first workout with Gym.
It ran for 2 days and was stable; pretty good with a single 1070 with 8 GB RAM.
It's still running.
When I do
./train-atari.py --task gen_submit --load Breakout-v0.npy --env Breakout-v0 --output output_dir
it says
AssertionError: Breakout-v0.npy
Do I need to wait for the training to finish to get Breakout-v0.npy?
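(A guess: the assertion is likely just an existence check on the --load path, along the lines of the sketch below, so the error means the file isn't there yet. The model_path name is illustrative, not the repo's exact code.)

```python
import os

model_path = 'Breakout-v0.npy'
# A failing existence check like this would print "AssertionError: Breakout-v0.npy".
assert os.path.isfile(model_path), model_path
```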
Hello, I hope I am not bothering you by asking this here. I am kind of new here, and I would like the following:
Additionally, I am preparing a dataset of the players in different shapes, to then paste them onto the field (with the players previously erased) in order to have a labeled dataset. What do you think about this? Thank you and regards.
Over the weekend, I trained your A3C for 390 epochs. Related to that, can I ask you two questions?
First,
the mean score went up to around 500, but it stayed there. That is, it did not go near 700 as in your results. Can you guess why? Was the learning rate not selected optimally? Was the initialization not optimal?
Second,
your A3C looks like A3C-FF. Am I correct? Have you also implemented A3C-LSTM?