Dueling Q Network training
# env, brain_name, state_size, action_size, num_iterations, max_timesteps,
# EPS, EPS_DECAY, and EPS_LIMIT are assumed to be defined earlier in the notebook.
dqn_agent = Agent(state_size, action_size, 1024)  # third argument is the RNG seed
scores, eps = [], EPS  # eps controls epsilon-greedy exploration (was misnamed "discount")

for ite in range(1, num_iterations + 1):
    # Reset the environment and read the first observation of the episode
    env_info = env.reset(train_mode=True)[brain_name]
    state, score = env_info.vector_observations[0], 0

    for t_step in range(max_timesteps):
        # Select an epsilon-greedy action and advance the environment one step
        action = dqn_agent.act(state, eps)
        env_info = env.step(action)[brain_name]
        next_state = env_info.vector_observations[0]
        reward, done = env_info.rewards[0], env_info.local_done[0]

        # Store the experience and let the agent learn from replay
        dqn_agent.step(state, action, reward, next_state, done)
        score, state = score + reward, next_state
        if done:
            break

    scores.append(score)
    eps = max(EPS_LIMIT, EPS_DECAY * eps)  # decay exploration, floored at EPS_LIMIT
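The gist shows only the training loop; the dueling architecture itself lives inside Agent. For reference, here is a minimal sketch of a dueling Q-network head (Wang et al., 2016) in PyTorch. The class name, layer sizes, and single hidden feature layer are illustrative assumptions, not taken from this gist; only the value/advantage decomposition is standard.

# Illustrative sketch of the dueling head assumed inside Agent; not from this gist.
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, state_size, action_size, hidden=64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_size, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                # state-value stream V(s)
        self.advantage = nn.Linear(hidden, action_size)  # advantage stream A(s, a)

    def forward(self, state):
        x = self.feature(state)
        v, a = self.value(x), self.advantage(x)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); subtracting the mean
        # advantage keeps the V/A split identifiable
        return v + a - a.mean(dim=1, keepdim=True)

Without the mean subtraction, any constant could shift between the value and advantage streams without changing Q, so the two streams would not be separately identifiable.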