Use A3C (asynchronous advantage actor-critic) written in TensorFlow. Training code, model & evaluation code at this repo
Gist doesn't have notifications, please use repo issues to discuss.
Thank you for your reply. My colleague ran your sample successfully in the same environment. I'll try again later.
Hi @ppwwyyxx,
I tried to run a pretrained Atari model from examples/OpenAIGym and got an error about a "malformed environment ID". The full traceback is copied below. Do you have any suggestions on how to avoid this issue?
I would really appreciate any suggestions.
ENV=Breakout-v0 ./ --load "$ENV".tfmodel --env "$ENV"
[2016-10-11 19:41:53,285] Making new env:
Traceback (most recent call last):
File "./", line 87, in
p = get_player(); del p # set NUM_ACTIONS
File "./", line 28, in get_player
pl = GymEnv(ENV_NAME, dumpdir=dumpdir, auto_restart=False)
File "/home/user/tensorpack/tensorpack/RL/", line 30, in init
self.gymenv = gym.make(name)
File "/home/user/gym/gym/envs/", line 126, in make
return registry.make(id)
File "/home/user/gym/gym/envs/", line 90, in make
spec = self.spec(id)
File "/home/user/gym/gym/envs/", line 99, in spec
raise error.Error('Attempted to look up malformed environment ID: {}. (Currently all IDs must be of the form {}.)'.format(id.encode('utf-8'), env_id_re.pattern))
gym.error.Error: Attempted to look up malformed environment ID: . (Currently all IDs must be of the form ^([\w:-]+)-v(\d+)$.)
For some reason I never got notified about the discussions here.
"$ENV" is expanded by the shell before an inline ENV=Breakout-v0 prefix takes effect, so ENV has to be set as a separate statement first (note the semicolon):
ENV=Breakout-v0; ./ --load "$ENV".tfmodel --env "$ENV"
I'll correct this in the readme.
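To see why the first command fails: with the inline form, --env receives an empty string, and an empty string cannot match the ID pattern quoted in the error message. A minimal check of that pattern in Python, assuming only the regular expression shown in the traceback (the helper name is_valid_env_id is made up for illustration):

```python
import re

# Pattern copied from the gym error message in the traceback above.
ENV_ID_RE = re.compile(r'^([\w:-]+)-v(\d+)$')

def is_valid_env_id(env_id):
    """Return True if env_id has the form <name>-v<version>."""
    return ENV_ID_RE.match(env_id) is not None

print(is_valid_env_id('Breakout-v0'))  # a well-formed ID
print(is_valid_env_id(''))             # what --env received when "$ENV" was empty
```

Printing the value passed to --env before calling gym.make is a quick way to confirm which case you are in.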
// OK, it looks like gist doesn't have notifications at all: issue
// Future visitors, please use the issues in my code repo so I can see you.
Hello! I have been studying Tutankham using your code. Could you tell me how you plotted the "training curve on break out"? I hope to plot a similar figure for Tutankham in order to monitor the training process. Thanks!
After you start training, all the statistics will be in train_log/some_dir/stat.json
You can parse the json and plot it using your tools, or open the directory with tensorboard, or plot it with my plotting tools:
cat train_log/some_directory/stat.json | jq '.[] | .mean_score // empty' | scripts/
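If you prefer to parse the JSON yourself, here is a minimal sketch. It assumes stat.json is a JSON list of objects, some of which carry a mean_score field, which is what the jq filter above implies. The function names and the output file name are made up for illustration, and matplotlib is only one possible plotting tool.

```python
import json

def extract_mean_scores(records):
    """Keep mean_score from every record that has one (like `.mean_score // empty`)."""
    return [r['mean_score'] for r in records if r.get('mean_score') is not None]

def load_mean_scores(path):
    """Read a stat.json file (assumed to be a list of dicts) and return its mean scores."""
    with open(path) as f:
        return extract_mean_scores(json.load(f))

def plot_mean_scores(scores, out_file='mean_score.png'):
    """Save a simple line plot; matplotlib is imported lazily so parsing works without it."""
    import matplotlib
    matplotlib.use('Agg')  # render to a file, no display needed
    import matplotlib.pyplot as plt
    plt.plot(range(1, len(scores) + 1), scores)
    plt.xlabel('logged entry')
    plt.ylabel('mean_score')
    plt.savefig(out_file)
```

Usage would look like plot_mean_scores(load_mean_scores('train_log/some_dir/stat.json')).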
I am trying to train your A3C from scratch, but got the following error. Can you point me in the right direction?
Million thanks in advance,
(py35) ➜ OpenAIGym git:(master) ✗ ./ --env Breakout-v0 --gpu 0
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
[2016-11-25 11:06:13,976] Making new env: Breakout-v0
Traceback (most recent call last):
File "./", line 247, in
train_tower = range(nr_gpu)[:-nr_gpu/2] or [0]
TypeError: slice indices must be integers or None or have an index method
@dylanthomas Sorry, that's a Python 3 compatibility problem. You need to replace nr_gpu/2 by nr_gpu//2. I just fixed it in the project.
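For anyone hitting the same error: in Python 3, / always returns a float, and a float cannot be used as a slice index, while // returns an int. A small sketch that mirrors the line from the traceback (the function name training_towers is made up for illustration and is not part of the project):

```python
def training_towers(nr_gpu):
    """Pick the GPUs used for training, falling back to [0] if the slice is empty."""
    # -nr_gpu // 2 parses as (-nr_gpu) // 2, i.e. floor division of a negative int.
    towers = list(range(nr_gpu))[:-nr_gpu // 2]
    return towers or [0]

for n in (1, 2, 4):
    print(n, training_towers(n))

# True division yields a float, which Python 3 rejects as a slice index.
try:
    list(range(4))[:-4 / 2]
except TypeError as exc:
    print('TypeError:', exc)
```

With 4 GPUs this keeps the first two for training, which matches the "two for training and two for simulation" split mentioned later in the thread.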
Thank you, but now I am getting
Traceback (most recent call last):
File "./", line 255, in
config = get_config()
File "./", line 184, in get_config
procs = [MySimulatorWorker(k, namec2s, names2c) for k in range(SIMULATOR_PROC)]
File "./", line 184, in
procs = [MySimulatorWorker(k, namec2s, names2c) for k in range(SIMULATOR_PROC)]
File "/home/john/dev/tensorpack/tensorpack/RL/", line 70, in init
super(SimulatorProcessStateExchange, self).init(idx)
File "/home/john/dev/tensorpack/tensorpack/RL/", line 52, in init = self.identity = u'simulator-{}'.format(self.idx).encode('utf-8')
File "/home/john/anaconda3/envs/py35/lib/python3.5/multiprocessing/", line 143, in name
assert isinstance(name, str), 'name must be a string'
AssertionError: name must be a string
Another compatibility problem, maybe ?
Yes, it is a unicode/str compatibility issue. I just pushed another fix. I don't have a Python 3 environment for testing now, but hopefully it'll work.
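Background on the assertion: Python 3's multiprocessing requires a process name to be a str, while .encode('utf-8') produces bytes. A minimal sketch of one way to keep both forms side by side. This is not necessarily the fix that was pushed; the helper make_names is hypothetical, and the idea that the bytes form is needed elsewhere (for example as a socket identity) is an assumption.

```python
import multiprocessing

def make_names(idx):
    """Return (str name for multiprocessing, bytes identity for byte-oriented APIs)."""
    name = 'simulator-{}'.format(idx)   # str on Python 3
    identity = name.encode('utf-8')     # bytes, kept separate from the process name
    return name, identity

name, identity = make_names(0)

proc = multiprocessing.Process()  # created only for illustration, never started
proc.name = name                  # the name setter accepts str; bytes would trip the assertion
print(proc.name, identity)
```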
It works !!! Many thanks !!
Over the weekend, I trained your A3C for 390 epochs. Related to that, can I ask you 2 questions?
1. The mean score went up to around 500, but it stayed there. That is, it did not get near 700 as in your results. Can you guess why? Is the learning rate not chosen optimally? Is the initialization not optimal?
2. Your A3C looks like A3C-FF. Am I correct? Have you also implemented A3C-LSTM?
The 700 one is trained with DeepMind settings, not Gym settings. For gym my average score is 625.
I don't have many clues for your question about the score. One guess is that I actually trained the submission model with 4 GPUs (two for training and two for simulation). In that case: 1. the learning rate is divided by 2 inside AsyncMultiGPUTrainer; and 2. two training threads asynchronously update the parameters, which should improve the model.
Yes, I have an A3C-LSTM implementation which can reach a similar score on Breakout. But I didn't run a lot of experiments and am not sure whether my implementation is better than A3C-FF (as in the paper), so I didn't release it.
That helps. But, what do you mean by DeepMind settings ? ALE + 4 frame skips, instead of Gym with k={2, 3, 4}?
Yes, apart from other minor differences, random frame skip might be most relevant to performance.
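To make the frame-skip difference concrete, here is a rough sketch of the two policies discussed above: a fixed skip of 4 versus a skip drawn at random from {2, 3, 4} on every action. It is only an illustration under stated assumptions. step_fn is a hypothetical stand-in for an emulator step that returns (observation, reward, done), and neither function is taken from Gym, ALE, or the code in this repo.

```python
import random

def choose_frame_skip(randomized, rng=random):
    """Fixed skip of 4, or a skip sampled uniformly from {2, 3, 4}."""
    return rng.choice([2, 3, 4]) if randomized else 4

def step_with_skip(step_fn, action, k):
    """Repeat the same action for k frames (k >= 1), summing rewards; stop early when done."""
    total_reward = 0.0
    for _ in range(k):
        obs, reward, done = step_fn(action)  # hypothetical single-frame step
        total_reward += reward
        if done:
            break
    return obs, total_reward, done
```

With a random skip the agent cannot rely on a fixed number of frames passing per action, which is one plausible reason the same game is harder under that setting.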
The number of actions appears to be different... For Breakout, in the case of ALE, it is 3, but in Gym, it's 6. Wouldn't this matter? Did you just use ALE with the DeepMind settings, or did you adjust Gym somehow to act like ALE?
Yes, I mentioned these differences. The number of actions also makes it harder in Gym.
For the results here I used the DeepMind settings, and for the Gym submissions I used Gym.
Wonderful. Thank YOU !
Hey, Kangaroo v.0 seems to get stuck over in the corner trying to catch things that fall until it gets killed. Is max session time already a training parameter, and if not, do you think that could help in this case?
I just want to ask a very dumb question: I have read the A3C paper, in which they kind of boasted about their good performance when running on a 16-core CPU. How come we are talking about GPUs here...
Thank you in advance!
It has better performance on GPU.
[1201 10:47:55] max_score: 863
[1201 10:47:55] mean_score: 590.14
This is my first workout with Gym.
It ran for 2 days and was stable; pretty good with a single 1070 w/ 8G RAM.
It is still running.
when I do
./ --task gen_submit --load Breakout-v0.npy --env Breakout-v0 --output output_dir
It said
AssertionError: Breakout-v0.npy"
Do I need to wait for the training to finish to get Breakout-v0.npy?
Hello, I hope I am not bothering you by asking this here. I am kind of new here and I would like the following:
Additionally, I am preparing a dataset of the players in different shapes, to then paste them onto the field (with the original players erased) and get a classified dataset. What do you think about this? Thank you and regards.
I ran
python2 ./ --rom breakout.bin --gpu 0
today for about 8 hours. At global_step=360000 it had already reached a score of 40. This is roughly what I had before, so it's unlikely to be a bug I introduced recently. Did you modify the code in some way?
Also, someone had an issue with GTX1080 + cuda8.0 before: tensorflow/tensorflow#3068, tensorpack/tensorpack#8. Maybe it's related.