Uses A3C (asynchronous advantage actor-critic), written in TensorFlow. Training code, model & evaluation code are at this repo.
Gist doesn't have notifications; please use the repo's issues to discuss.
The 700 model is trained with DeepMind settings, not Gym settings. For Gym, my average score is 625.
I don't have a good answer to your questions about the score. One guess is that I actually trained the submission model with 4 GPUs (two for training and two for simulation). In that case: 1. the learning rate is divided by 2 inside AsyncMultiGPUTrainer; and 2. two training threads asynchronously update the parameters, which should improve the model.
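(For context, a minimal sketch of the learning-rate scaling described above; the names and values here are illustrative, not tensorpack's actual AsyncMultiGPUTrainer internals:)

```python
# Illustrative sketch only, not the repo's actual AsyncMultiGPUTrainer code.
# With N training towers applying gradient updates asynchronously, dividing
# the base learning rate by N keeps the total update magnitude per wall-clock
# step roughly comparable to single-GPU training.
num_training_gpus = 2        # two of the four GPUs train; the other two simulate
base_lr = 1e-3               # hypothetical base learning rate
per_tower_lr = base_lr / num_training_gpus  # each tower updates with lr / 2
```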
Yes, I have an a3c-lstm implementation which can reach a similar score on Breakout. But I didn't run a lot of experiments, and I'm not sure whether my implementation is better than a3c-ff (as in the paper), so I didn't release it.
That helps. But what do you mean by DeepMind settings? ALE + a fixed frame skip of 4, instead of Gym with a random skip k = {2, 3, 4}?
Yes. Apart from other minor differences, the random frame skip is probably the most relevant to performance.
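(For reference, a minimal sketch of a wrapper that mimics the DeepMind-style fixed frame skip. It assumes an underlying environment that advances one emulator frame per step, e.g. a NoFrameskip variant, and the classic 4-tuple Gym step API; it is not code from the repo.)

```python
import gym

class FixedFrameSkip(gym.Wrapper):
    """Repeat each action for a fixed number of frames (DeepMind-style),
    instead of Gym's default random skip of k in {2, 3, 4}."""

    def __init__(self, env, skip=4):
        super().__init__(env)
        self._skip = skip

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        obs = None
        for _ in range(self._skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info

# e.g. wrap a single-frame environment so every action lasts 4 frames
env = FixedFrameSkip(gym.make('BreakoutNoFrameskip-v4'), skip=4)
```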
The number of actions appears to be different, too. For Breakout, in the case of ALE it is 3, but in Gym it's 6. Wouldn't this matter? Did you just use ALE with the DeepMind settings, or did you adjust Gym somehow to act like ALE?
Yes, I mentioned these differences. The larger number of actions also makes it harder in Gym.
For the results here I used DeepMind settings, and for Gym submissions I used Gym settings.
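(A quick way to inspect which action set a Gym Atari environment actually exposes, using Gym's standard Atari API:)

```python
import gym

env = gym.make('Breakout-v0')
print(env.action_space.n)                   # how many actions Gym exposes
print(env.unwrapped.get_action_meanings())  # names of those actions, e.g. 'NOOP', 'FIRE', ...
```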
Wonderful. Thank YOU!
Hey, Kangaroo v0 seems to get stuck in the corner, trying to catch things that fall until it gets killed. Is max session time already a training parameter, and if not, do you think adding one could help in this case?
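(In case it isn't, a minimal sketch of capping episode length with a wrapper; it assumes the classic 4-tuple Gym step API and is not part of the repo. Gym also ships gym.wrappers.TimeLimit, which does essentially this.)

```python
import gym

class EpisodeTimeLimit(gym.Wrapper):
    """Force the episode to end after a fixed number of steps, so the agent
    can't stall indefinitely in one spot."""

    def __init__(self, env, max_steps=10000):
        super().__init__(env)
        self._max_steps = max_steps
        self._steps = 0

    def reset(self, **kwargs):
        self._steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._steps += 1
        if self._steps >= self._max_steps:
            done = True  # cut the episode off at the step budget
        return obs, reward, done, info
```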
Hey,
I just want to ask a very dumb question: I have read the A3C paper, in which they kind of boasted about their good performance when running on a 16-core CPU. How come here we are talking about GPUs?
Thank you in advance!
It has better performance on GPU.
[1201 10:47:55 @monitor.py:363] max_score: 863
[1201 10:47:55 @monitor.py:363] mean_score: 590.14
This is my first workout with Gym.
It ran for 2 days and was stable; pretty good with a single 1070 with 8 GB RAM.
It's still running.
When I do
./train-atari.py --task gen_submit --load Breakout-v0.npy --env Breakout-v0 --output output_dir
it says
AssertionError: Breakout-v0.npy
Do I need to wait for the training to finish to get Breakout-v0.npy?
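(A guess: the assertion is likely just an existence check on the --load path, along the lines of the sketch below, so the error means the file isn't there yet. The model_path name is illustrative, not the repo's exact code.)

```python
import os

model_path = 'Breakout-v0.npy'
# A failing existence check like this would print "AssertionError: Breakout-v0.npy".
assert os.path.isfile(model_path), model_path
```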
Hello, I hope I am not bothering you by asking this here. I am kind of new here, and I would like the following:
Additionally, I am preparing a dataset of the players in different shapes, to then paste them onto the field (with the players previously erased) in order to have a labeled dataset. What do you think about this? Thank you and regards.
Over the weekend, I trained your A3C for 390 epochs. Related to that, can I ask you two questions?
First,
the mean score went up to around 500, but it stayed there. That is, it did not go near 700 as in your results. Can you guess why? Was the learning rate not selected optimally? Was the initialization not optimal?
Second,
your A3C looks like A3C-FF. Am I correct? Have you also implemented A3C-LSTM?