Created November 20, 2018 15:06
# At the beginning of training, we mostly take random actions.
# As training progresses, we choose actions based on our neural network model.
if random.random() < epsilon:
    action = env.action_space.sample()
else:
    action = model.predict(tf.constant(np.expand_dims(state, axis=0), dtype=tf.float32)).numpy()[0]
# Step the environment to create the next training sample
next_state, reward, done, info = env.step(action)
# When the episode ends (done is True), replace the reward with -10
reward = -10. if done else reward
# replay_buffer contains all training data
replay_buffer.append((state, action, reward, next_state, 1 if done else 0))
state = next_state
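
The snippet above relies on an `epsilon` value and a `replay_buffer` that are defined elsewhere. As a rough illustration of how those pieces might fit together, here is a minimal, framework-free sketch. The helper names (`epsilon_greedy`, `decay_epsilon`, `sample_batch`), the decay rate, the minimum epsilon, and the buffer size are all assumptions made for this example and do not come from the original gist.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one.

    q_values is a plain list of estimated action values (hypothetical stand-in
    for whatever the model outputs).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated value
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, rate=0.995, minimum=0.01):
    """Shrink epsilon multiplicatively, never going below `minimum` (assumed schedule)."""
    return max(minimum, epsilon * rate)


def sample_batch(buffer, batch_size, rng=random):
    """Draw up to batch_size transitions uniformly at random, without replacement."""
    return rng.sample(list(buffer), min(batch_size, len(buffer)))


# A bounded buffer drops the oldest transitions once it is full (size is an assumption)
replay_buffer = deque(maxlen=10_000)
```

Calling `decay_epsilon` once per episode would make exploration less frequent over time, which matches the behaviour described in the comments at the top of the snippet; the schedule actually used by the author is not shown in the source.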