@hadifar
Created November 20, 2018 15:06
# At the beginning of training we mostly take random actions (exploration);
# as training progresses, we increasingly act on our neural-network model (exploitation)
if random.random() < epsilon:
    action = env.action_space.sample()
else:
    q_values = model.predict(tf.constant(np.expand_dims(state, axis=0), dtype=tf.float32)).numpy()[0]
    action = int(np.argmax(q_values))  # greedy action: highest predicted Q-value
# Take the chosen action to generate the next transition
next_state, reward, done, info = env.step(action)
# Override the environment reward with a penalty when the episode ends
reward = -10. if done else reward
# replay_buffer stores all training transitions
replay_buffer.append((state, action, reward, next_state, 1 if done else 0))
state = next_state
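The snippet above only collects transitions; a typical companion step (not shown in the gist) decays epsilon over time and samples a random minibatch from the replay buffer so the network is trained on decorrelated experience. A minimal sketch, assuming a `deque`-backed `replay_buffer` like the one appended to above; the capacity, decay rate, and `sample_minibatch` helper are illustrative choices, not part of the original code:

```python
import random
from collections import deque

# Hypothetical capacity; the gist does not show how replay_buffer is created.
replay_buffer = deque(maxlen=10000)

# Fill the buffer with dummy (state, action, reward, next_state, done) tuples
# standing in for the transitions collected by the loop above.
for t in range(100):
    replay_buffer.append((t, t % 3, -1.0, t + 1, 0))

def sample_minibatch(buffer, batch_size):
    """Uniformly sample transitions and unzip them into per-field tuples."""
    batch = random.sample(buffer, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    return states, actions, rewards, next_states, dones

states, actions, rewards, next_states, dones = sample_minibatch(replay_buffer, 32)

# Hypothetical epsilon-decay schedule: shrink toward a floor after each step.
epsilon = 1.0
epsilon = max(0.05, epsilon * 0.995)
```

Uniform sampling is the plain DQN choice; schemes such as prioritized replay reweight this step but keep the same buffer interface.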