- Deep Q-Network (DQN) and its variants (Double DQN, Dueling DQN) were implemented previously.
- However, DQN-family methods are value-based RL methods, so they never learn the policy directly.
- Worse, even a small change in the value estimates can flip the greedy policy immediately.
- As a result, the learning process, and convergence itself, becomes unstable (the value estimates carry high bias).
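To see why the greedy policy is so brittle, consider a toy example (the numbers here are made up for illustration, not taken from any of the implementations above): a 0.02 shift in a single Q-value is enough to change the greedy action.

```python
import numpy as np

# hypothetical Q-values for one state with three actions
q_before = np.array([1.00, 1.01, 0.50])
q_after = q_before + np.array([0.02, 0.00, 0.00])  # one tiny value update

# the greedy policy is argmax over Q, so a 0.02 change flips the action
print(np.argmax(q_before))  # -> 1
print(np.argmax(q_after))   # -> 0
```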
# from : https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/graph_editor/examples/edit_graph_example.py
import numpy as np
import tensorflow as tf
from tensorflow.contrib import graph_editor as ge
# create a graph: two constants, plus an add op fed by two placeholders
g = tf.Graph()
with g.as_default():
    a = tf.constant(1.0, shape=[2, 3], name="a")
    b = tf.constant(2.0, shape=[2, 3], name="b")
    c = tf.add(tf.placeholder(dtype=np.float32),
               tf.placeholder(dtype=np.float32), name="c")
# rewire the graph in place so that c takes a and b as its inputs
ge.swap_inputs(c.op, [a, b])
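The upstream example file finishes by executing the rewired graph to confirm the edit took effect; a minimal check in the same spirit:

```python
# run the edited graph: c now computes a + b, so no feeds are required
with tf.Session(graph=g) as sess:
    print(sess.run(c))  # 2x3 matrix filled with 3.0
```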
## Convex Optimization in R using CVXR
## from : https://rviews.rstudio.com/2017/11/27/introduction-to-cvxr/
## load CVXR, installing it first when it is missing
## (the original else-branch was a no-op: require() in the condition already loads the package)
if (!require(CVXR)) {
  install.packages("CVXR")
  library(CVXR)
}
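The linked rviews post introduces CVXR through an ordinary least-squares fit. To keep all new examples in one language, here is a minimal sketch of the same kind of problem using cvxpy, CVXR's Python counterpart; the data and variable names are invented for illustration.

```python
import cvxpy as cp
import numpy as np

# synthetic regression data (invented for this sketch)
np.random.seed(0)
X = np.column_stack([np.ones(100), np.random.randn(100)])
y = X @ np.array([1.0, 2.0]) + 0.1 * np.random.randn(100)

# least squares written as an explicit convex program
beta = cp.Variable(2)
problem = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ beta)))
problem.solve()
print(beta.value)  # close to [1.0, 2.0]
```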
# 2) Build the RNN model ----------------
## Refer : https://github.com/hunkim/DeepLearningZeroToAll/blob/master/lab-12-5-rnn_stock_prediction.py
library(tensorflow)
library(reticulate)
contrib <- tf$contrib        # alias for tf.contrib (RNN cells live there in TF 1.x)
tf$reset_default_graph()     # start from a clean default graph
# train parameters (values follow the referenced lab)
seq_length    <- 7L     # window of past days fed to the RNN
data_dim      <- 5L     # open, high, low, volume, close
hidden_dim    <- 10L    # LSTM hidden-state size
output_dim    <- 1L     # next-day close price
learning_rate <- 0.01
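For reference, this is roughly the graph the cited Python lab builds with the TF 1.x API; the R code above reproduces the same calls through reticulate. A condensed sketch, not a line-for-line copy:

```python
import tensorflow as tf

seq_length, data_dim, hidden_dim, output_dim, learning_rate = 7, 5, 10, 1, 0.01

# placeholders: a window of seq_length days, each with data_dim features
X = tf.placeholder(tf.float32, [None, seq_length, data_dim])
Y = tf.placeholder(tf.float32, [None, 1])

# a single LSTM cell unrolled over the window
cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_dim, state_is_tuple=True)
outputs, _states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

# regress the last hidden state onto the next-day close price
Y_pred = tf.contrib.layers.fully_connected(outputs[:, -1], output_dim, activation_fn=None)
loss = tf.reduce_sum(tf.square(Y_pred - Y))
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
```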
# Thanks to 주찬웅
# References:
## 1) https://github.com/jcwleo/Reinforcement_Learning/blob/master/Breakout/
## 2) http://pytorch.org/tutorials/
## 3) https://github.com/transedward/pytorch-dqn
## This code is still quite messy...
## Ideas and advice on how to improve it are very welcome.
import argparse
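The script reads its hyperparameters from the command line; the original flags are not shown here, so the block below is only a plausible sketch of such an argparse setup (flag names and defaults are invented for illustration).

```python
# hypothetical CLI flags for a Breakout DQN run; names and defaults are illustrative only
parser = argparse.ArgumentParser(description="DQN on Atari Breakout (PyTorch)")
parser.add_argument("--env", default="BreakoutDeterministic-v4", help="gym environment id")
parser.add_argument("--lr", type=float, default=1e-4, help="optimizer learning rate")
parser.add_argument("--gamma", type=float, default=0.99, help="discount factor")
parser.add_argument("--batch-size", type=int, default=32, help="replay minibatch size")
parser.add_argument("--replay-size", type=int, default=100000, help="replay buffer capacity")
args = parser.parse_args()
```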