@joyhuang9473
Last active June 18, 2018 07:20
dqn-training-0615214336
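config = {
    'network': [
        ('input', {}),
        ('conv1', {'W_size': 8, 'stride': 4, 'in': 4, 'out': 16}),   # 8x8 filters, stride 4, 4 -> 16 channels
        ('conv2', {'W_size': 4, 'stride': 2, 'in': 16, 'out': 32}),  # 4x4 filters, stride 2, 16 -> 32 channels
        ('fc1', {'num_relus': 256}),                                 # fully connected layer of 256 ReLUs
        ('output', {}),                                              # presumably one Q-value per action
    ],
    'input_size': [84, 84],  # height, width
    'num_actions': 4,
    'var_init_mean': 0.0,
    'var_init_stddev': 0.01,
    'minibatch_size': 32,
    'replay_memory_size': 10 ** 6,          # 1,000,000 stored transitions
    'agent_history_length': 4,              # frames stacked per input; matches conv1 'in': 4
    'discount_factor': 0.95,                # gamma
    'learning_rate': 0.00025,
    'rms_prop_decay': 0.95,
    'gradient_momentum': 0.0,
    'min_squared_gradient': 0.01,
    'final_exploration': 0.1,               # epsilon after annealing
    'final_exploration_frame': 10 ** 6,     # frame at which epsilon reaches final_exploration
    'replay_start_size': 5 * (10 ** 4),     # 50,000 replay entries before learning starts
    'validation_size': 500,
    'evaluation_exploration': 0.05,         # epsilon used during evaluation episodes
}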
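The config implies concrete layer shapes and an exploration schedule. The sketch below uses hypothetical helpers (`conv_out`, `network_shapes`, `epsilon`) that are not part of the original code, and it assumes two things the config does not state: unpadded ("VALID") convolutions, and a linear anneal of epsilon from an assumed starting value of 1.0 down to `final_exploration` over `final_exploration_frame` frames. Under those assumptions an 84x84x4 input shrinks to 20x20x16 after conv1 and 9x9x32 after conv2, so fc1 sees 2592 inputs and the network holds 677,172 weights and biases.

```python
# Hypothetical helpers, not from the original gist. Assumptions: unpadded ("VALID")
# convolutions and an initial epsilon of 1.0; the config states neither.

# Subset of the config above, values copied verbatim; reuse the full dict if defined.
config = {
    'network': [
        ('input', {}),
        ('conv1', {'W_size': 8, 'stride': 4, 'in': 4, 'out': 16}),
        ('conv2', {'W_size': 4, 'stride': 2, 'in': 16, 'out': 32}),
        ('fc1', {'num_relus': 256}),
        ('output', {}),
    ],
    'input_size': [84, 84],
    'num_actions': 4,
    'agent_history_length': 4,
    'final_exploration': 0.1,
    'final_exploration_frame': 10 ** 6,
}


def conv_out(size, kernel, stride):
    """Spatial output size of an unpadded convolution."""
    return (size - kernel) // stride + 1


def network_shapes(cfg):
    """Return ({layer name: output shape}, total weight + bias count)."""
    h, w = cfg['input_size']
    c = cfg['agent_history_length']  # stacked frames act as input channels
    shapes, params, units = {}, 0, None
    for name, spec in cfg['network']:
        if name.startswith('conv'):
            k, s = spec['W_size'], spec['stride']
            assert spec['in'] == c, f"{name}: expected {spec['in']} channels, got {c}"
            h, w, c = conv_out(h, k, s), conv_out(w, k, s), spec['out']
            params += k * k * spec['in'] * spec['out'] + spec['out']  # kernels + biases
            shapes[name] = (h, w, c)
        elif name.startswith('fc') or name == 'output':
            fan_in = units if units is not None else h * w * c  # flatten conv maps once
            units = cfg['num_actions'] if name == 'output' else spec['num_relus']
            params += fan_in * units + units  # dense weights + biases
            shapes[name] = (units,)
    return shapes, params


def epsilon(frame, cfg, initial=1.0):
    """Linearly anneal epsilon from `initial` to final_exploration, then hold it."""
    final, span = cfg['final_exploration'], cfg['final_exploration_frame']
    if frame >= span:
        return final
    return initial + (final - initial) * frame / span


shapes, params = network_shapes(config)
print(shapes)  # {'conv1': (20, 20, 16), 'conv2': (9, 9, 32), 'fc1': (256,), 'output': (4,)}
print(params)  # 677172
print(epsilon(0, config), epsilon(500_000, config), epsilon(2 * 10 ** 6, config))
```

Almost all parameters sit in fc1 (2592 x 256 + 256 = 663,808), which is typical of this two-conv-layer layout.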
2018-06-18 12:12:18,471 - __main__ - INFO - Load model: checkpoints/0615214336/breakout-v4-7500000
2018-06-18 12:12:42,881 - __main__ - INFO - episode: 1, reward: 11, ave. reward: 11,
2018-06-18 12:13:39,545 - __main__ - INFO - episode: 2, reward: 11, ave. reward: 11,
2018-06-18 12:14:16,055 - __main__ - INFO - episode: 3, reward: 11, ave. reward: 11,
2018-06-18 12:15:12,720 - __main__ - INFO - episode: 4, reward: 11, ave. reward: 11,
2018-06-18 12:16:03,299 - __main__ - INFO - episode: 5, reward: 11, ave. reward: 11,
2018-06-18 12:17:04,777 - __main__ - INFO - episode: 6, reward: 11, ave. reward: 11,
2018-06-18 12:17:55,533 - __main__ - INFO - episode: 7, reward: 9, ave. reward: 10.7143,
2018-06-18 12:18:24,553 - __main__ - INFO - episode: 8, reward: 11, ave. reward: 10.75,
2018-06-18 12:19:00,727 - __main__ - INFO - episode: 9, reward: 11, ave. reward: 10.7778,
2018-06-18 12:19:32,899 - __main__ - INFO - episode: 10, reward: 11, ave. reward: 10.8,
2018-06-18 12:19:59,240 - __main__ - INFO - Finished: the best reward 11, the ave. reward 10.8.
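The `ave. reward` column is consistent with a running mean over the episodes played so far; for example, after episode 7 it is (6 x 11 + 9) / 7 = 75 / 7, logged as 10.7143. The sketch below recomputes it from the logged rewards. The `%g`-style formatting (`:.6g`) is an assumption that happens to reproduce the log's number format; the original logging code is not shown.

```python
# Sketch: recompute the "ave. reward" column as a running mean of episode rewards.
rewards = [11, 11, 11, 11, 11, 11, 9, 11, 11, 11]  # copied from the log above


def running_averages(values):
    """Mean of values[:i] for i = 1..len(values)."""
    total, out = 0, []
    for i, v in enumerate(values, start=1):
        total += v
        out.append(total / i)
    return out


averages = running_averages(rewards)
for episode, (r, avg) in enumerate(zip(rewards, averages), start=1):
    print(f"episode: {episode}, reward: {r}, ave. reward: {avg:.6g},")
print(f"Finished: the best reward {max(rewards)}, the ave. reward {averages[-1]:.6g}.")
```

Running it prints the same ten `episode:` lines and the `Finished:` summary as the log, without the timestamps.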
checkpoint     best reward  ave. reward
iter_1500000   2            0.2
iter_2000000   1            0.1
iter_4500000   11           10.7
iter_5500000   11           10.2
iter_7000000   11           10
iter_7500000   11           10.8
iter_10500000  11           10.7
iter_12500000  3            2.1
iter_13000000  14           8.2
iter_13500000  5            4.2
iter_14000000  3            3
iter_14500000  7            3.4
iter_15000000  9            7.1
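Ranking the checkpoints by average reward picks iter_7500000, the checkpoint loaded in the log above. Ranking by best single-episode reward would instead pick iter_13000000 (best reward 14), whose average of 8.2 is clearly lower. The gist does not say how the checkpoint was chosen; this is only a sketch of the comparison, with the values copied from the results above.

```python
# Evaluation summaries copied from the results above: iteration -> (best, average).
results = {
    1_500_000: (2, 0.2),
    2_000_000: (1, 0.1),
    4_500_000: (11, 10.7),
    5_500_000: (11, 10.2),
    7_000_000: (11, 10),
    7_500_000: (11, 10.8),
    10_500_000: (11, 10.7),
    12_500_000: (3, 2.1),
    13_000_000: (14, 8.2),
    13_500_000: (5, 4.2),
    14_000_000: (3, 3),
    14_500_000: (7, 3.4),
    15_000_000: (9, 7.1),
}

best_by_average = max(results, key=lambda it: results[it][1])
best_by_peak = max(results, key=lambda it: results[it][0])

print(f"highest ave. reward: iter_{best_by_average}")  # iter_7500000
print(f"highest best reward: iter_{best_by_peak}")     # iter_13000000

# Full ranking, highest average first.
for it, (best, avg) in sorted(results.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"iter_{it:<10} best {best:>3}  ave. {avg:>5}")
```

The average over several episodes is the steadier criterion here: later checkpoints reach higher single-episode scores but score much lower on average.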
Figures attached in comments by @joyhuang9473:
[figure: train_op_loss]
[figure: validation_op_average_Q]
[figure: validation_op_reward_per_episode]
[animation: iter_7500000_play_frame_output.gif]
