@gandroz
Created March 14, 2021 02:24
Playing with the learnt Q-table
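import time

import numpy as np
from IPython.display import clear_output

# Assumes `env` (a Gym environment using the classic API, where reset() returns
# the state and step() returns 4 values) and `q_table` (the learnt Q-table)
# were already defined during training.

# Reset environment
state = env.reset()
# Render it
env.render()
time.sleep(0.5)
done = False
while not done:
    # Choose the action with the max expected return, i.e. the max Q-value
    action = np.argmax(q_table[state])
    # Try it!
    state, reward, done, info = env.step(action)
    # See the result
    clear_output(wait=True)
    env.render()
    print(reward)
    time.sleep(0.5)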
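
The loop above only stops when the environment reports `done`, so a Q-table that has not converged can make the greedy policy cycle between states indefinitely. A minimal sketch of the same greedy playback with a step cap, and without rendering, could look like the following. `ChainEnv`, `play_greedy`, and `max_steps` are hypothetical names introduced for illustration; the toy environment only mimics the classic `reset()` / 4-value `step()` interface used above so that the example runs without Gym.

```python
import numpy as np


class ChainEnv:
    """Hypothetical 5-state corridor, used only to make this sketch runnable.

    Mimics the classic Gym-style interface used above:
    reset() -> state, step(action) -> (state, reward, done, info).
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left (clamped at 0).
        move = 1 if action == 1 else -1
        self.state = max(0, self.state + move)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else -0.1
        return self.state, reward, done, {}


def play_greedy(env, q_table, max_steps=100):
    """Follow the greedy policy until the episode ends or max_steps is hit."""
    state = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        # Greedy choice: the action with the highest Q-value in this state.
        action = int(np.argmax(q_table[state]))
        state, reward, done, _ = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps, done


# Hand-written Q-table (not a learnt one) where "right" always scores higher.
q_table = np.tile([0.0, 1.0], (5, 1))
total_reward, steps, done = play_greedy(ChainEnv(), q_table)
```

With a real environment, `play_greedy(env, q_table)` can be called in the same way, provided the environment follows the same interface. If the returned `done` is `False`, the cap was reached before the episode finished, which usually suggests the table needs more training.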