Implementation of Double Dueling Deep-Q Network
I would like to ask a question: do we have to split the inputs in order to achieve a dueling DQN? Why can't I just feed the full input into both the value layer and the advantage layer?
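A minimal sketch of the variant being asked about, assuming a TF 1.x graph (as in the gist) with a hypothetical 512-unit feature vector and 4 actions, both made up for illustration: the full feature vector feeds both streams, and the streams are recombined with the mean-subtracted advantage as in the dueling architecture.

```python
import tensorflow as tf  # TF 1.x style, matching the gist

# Hypothetical feature tensor coming out of the shared conv/FC trunk.
features = tf.placeholder(tf.float32, [None, 512], name="features")
num_actions = 4  # assumed action count, for illustration only

# Feed the whole feature vector into both streams (no tf.split).
value = tf.layers.dense(features, 1, name="value_stream")              # V(s)
advantage = tf.layers.dense(features, num_actions, name="adv_stream")  # A(s, a)

# Recombine as in the dueling architecture: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
q_values = value + (advantage - tf.reduce_mean(advantage, axis=1, keepdims=True))
```

Either way can work; the dueling idea lies in keeping separate value and advantage streams and recombining them via the mean-subtracted advantage, not in whether those streams share or split the preceding features.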
Hi, first, thanks so much for your detailed write-ups and commented implementations. I have been working through them while developing my own RL environment outside of gym. I have a few questions regarding the Double-DQN implementation here:
1. The Double-DQN paper (https://arxiv.org/pdf/1511.06581.pdf) describes updating \theta at every step t. It looks like the implementation here updates \theta every update_freq steps and updates \theta^- immediately afterwards. Is there something I'm not understanding? I guess when to perform these updates ends up being a heuristic decision; I'm just wondering what your intuition is for the \theta / \theta^- update cycle.
2. Your nice TensorFlow hack to update the targetQ weights: does it rely on the order of initialization? Might there be a more verbose but explicit way to do it, maybe storing the targetQ ops by name in a dictionary (a sketch of that idea follows this comment)?
3. Last, is there a reason for not using a nonlinearity/activation in the network?
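A minimal sketch of that more explicit approach, assuming the main and target networks are each built inside their own tf.variable_scope (the scope names "main"/"target" and the tau value are assumptions, not taken from the gist): variables are paired by name instead of by creation order.

```python
import tensorflow as tf  # TF 1.x, matching the gist's style

def make_target_update_ops(main_scope="main", target_scope="target", tau=0.001):
    """Build ops that softly copy main-network weights into the target network.

    Variables are matched by name (with the scope prefix stripped) rather than
    by the order in which they were created, so the pairing is explicit.
    """
    main_vars = {v.name.split("/", 1)[1]: v
                 for v in tf.trainable_variables(scope=main_scope)}
    target_vars = {v.name.split("/", 1)[1]: v
                   for v in tf.trainable_variables(scope=target_scope)}
    ops = []
    for key, t_var in target_vars.items():
        m_var = main_vars[key]  # same layer/weight name in the main network
        ops.append(t_var.assign(tau * m_var + (1.0 - tau) * t_var))
    return tf.group(*ops)
```

With something like this, the \theta / \theta^- cycle from question 1 also becomes explicit: every update_freq steps you run one gradient update on the main network and then `sess.run` the returned op to move the target network toward it.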