Implementation of Double Dueling Deep-Q Network
I would like to ask a question: do we have to split the inputs in order to achieve a dueling DQN? Why can't I just feed the full input into both the value layer and the advantage layer?
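A minimal sketch of the variant being asked about, assuming a TF 1.x graph (as in the gist) with a hypothetical 512-unit feature vector and 4 actions, both made up for illustration: the full feature vector feeds both streams, and the streams are recombined with the mean-subtracted advantage as in the dueling architecture.

```python
import tensorflow as tf  # TF 1.x style, matching the gist

# Hypothetical feature tensor coming out of the shared conv/FC trunk.
features = tf.placeholder(tf.float32, [None, 512], name="features")
num_actions = 4  # assumed action count, for illustration only

# Feed the whole feature vector into both streams (no tf.split).
value = tf.layers.dense(features, 1, name="value_stream")              # V(s)
advantage = tf.layers.dense(features, num_actions, name="adv_stream")  # A(s, a)

# Recombine as in the dueling architecture: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
q_values = value + (advantage - tf.reduce_mean(advantage, axis=1, keepdims=True))
```

Either way can work; the dueling idea lies in keeping separate value and advantage streams and recombining them via the mean-subtracted advantage, not in whether those streams share or split the preceding features.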
Hi, first, thanks so much for your detailed write-ups and commented implementations. I have been working through them while developing my own RL environment outside of gym. I have a few questions regarding the Double-DQN implementation here:
1. The Double-DQN paper (https://arxiv.org/pdf/1511.06581.pdf) describes updating \theta at every step t. It looks like the implementation here updates \theta every update_freq steps and updates \theta^- immediately afterwards. Is there something I'm not understanding? I guess when to perform these updates ends up being a heuristic decision; I'm just wondering what your intuition is for the \theta / \theta^- update cycle.
2. Your nice TensorFlow hack to update the targetQ weights: does it rely on the order of initialization? Might there be a more verbose but explicit way to do it, maybe storing the targetQ ops by name in a dictionary (a sketch of that idea follows this comment)?
3. Last, is there a reason for not using a nonlinearity/activation in the network?
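A minimal sketch of that more explicit approach, assuming the main and target networks are each built inside their own tf.variable_scope (the scope names "main"/"target" and the tau value are assumptions, not taken from the gist): variables are paired by name instead of by creation order.

```python
import tensorflow as tf  # TF 1.x, matching the gist's style

def make_target_update_ops(main_scope="main", target_scope="target", tau=0.001):
    """Build ops that softly copy main-network weights into the target network.

    Variables are matched by name (with the scope prefix stripped) rather than
    by the order in which they were created, so the pairing is explicit.
    """
    main_vars = {v.name.split("/", 1)[1]: v
                 for v in tf.trainable_variables(scope=main_scope)}
    target_vars = {v.name.split("/", 1)[1]: v
                   for v in tf.trainable_variables(scope=target_scope)}
    ops = []
    for key, t_var in target_vars.items():
        m_var = main_vars[key]  # same layer/weight name in the main network
        ops.append(t_var.assign(tau * m_var + (1.0 - tau) * t_var))
    return tf.group(*ops)
```

With something like this, the \theta / \theta^- cycle from question 1 also becomes explicit: every update_freq steps you run one gradient update on the main network and then `sess.run` the returned op to move the target network toward it.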