Hi, first, thanks so much for your detailed write-ups and commented implementations. I have been working through them while developing my own RL environment outside of gym.
I have a few questions regarding the Double-DQN implementation here:
- The Double-DQN paper (https://arxiv.org/pdf/1511.06581.pdf) algorithm mentions updating \theta at each step t. It looks like the implementation here updates \theta every update_freq steps and updates \theta- immediately afterwards. Is there something I don't understand? I guess it ends up being a heuristic decision about when to perform these updates; I'm just wondering what your intuition is for the \theta, \theta- update cycle.
- Second is your nice TensorFlow hack to update the targetQ weights. Does it rely on the order of initialization? Might there be a more verbose but explicit way to do it, for example storing the targetQ ops by name in a dictionary? (A sketch of this idea follows the list of questions.)
- Last, is there a reason for not using a nonlinearity/activation in the network?
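
Regarding the second question, here is a minimal sketch of the "explicit" alternative: keying the target-update ops by variable name instead of relying on the order in which the trainable variables were created. It assumes TF 1.x and that the main and target networks are built under variable scopes named "main" and "target" (scope names chosen here for illustration; the gist itself does not use scopes).

```python
import tensorflow as tf

def make_target_update_ops(tau=0.001):
    # Collect variables per scope and key them by their name within the scope.
    main_vars = {v.name.split("/", 1)[1]: v
                 for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="main")}
    target_vars = {v.name.split("/", 1)[1]: v
                   for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="target")}
    update_ops = []
    for name, t_var in target_vars.items():
        m_var = main_vars[name]  # KeyError here means the two graphs do not match
        # Soft update: theta_target <- tau * theta_main + (1 - tau) * theta_target
        update_ops.append(tf.assign(t_var, tau * m_var + (1.0 - tau) * t_var))
    return tf.group(*update_ops)
```

This only makes the pairing explicit; whether it is preferable to slicing an ordered variable list is largely a matter of taste.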
I would like to ask a question: do we have to split the inputs in order to achieve dueling DQN?
Why can't I just feed all of the inputs into both the value layer and the advantage layer?
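
For reference, a minimal sketch of the variant this question describes, where the same flattened features feed both streams rather than being split in half. It assumes TF 1.x; `features`, `h_size`, and `n_actions` are placeholder names for illustration, not identifiers from the gist.

```python
import tensorflow as tf

def dueling_head(features, h_size, n_actions):
    # Value stream: a single scalar V(s) per state
    VW = tf.Variable(tf.random_normal([h_size, 1]))
    value = tf.matmul(features, VW)
    # Advantage stream: one value A(s, a) per action
    AW = tf.Variable(tf.random_normal([h_size, n_actions]))
    advantage = tf.matmul(features, AW)
    # Combine: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
    return value + (advantage - tf.reduce_mean(advantage, axis=1, keepdims=True))
```

Splitting simply dedicates half of the final features to each stream; feeding the full vector into both streams is also a valid dueling architecture, at the cost of somewhat more parameters.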
@mphielipp Replace that line with:
self.AW = tf.Variable(tf.random_normal([h_size // 2, env.actions]))
tf.random_normal expects integer dimensions in its shape argument, not a float.
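
A quick illustration of why the // matters in Python 3 (h_size = 512 here is just an example value, not taken from the gist):

```python
h_size = 512
h_size / 2    # 256.0 -- true division yields a float, which the shape argument rejects
h_size // 2   # 256   -- floor division yields an int, which the shape argument accepts
```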