@manuel-delverme
Last active June 11, 2018 14:31
zero code
import collections
import copy

# Project-local modules assumed by the gist: environments, mcts, networks,
# arena, play_against_yourself, and the `opt` hyper-parameter namespace.

environment = environments.GoEnvironment(board_size=19)
player_neural_network = networks.NeuralNetwork(
    board_size=environment.getStateSize(), action_size=environment.getActionSize())
player_mcts = mcts.MCTS(environment, player_neural_network)

training_samples = collections.deque(maxlen=opt.training_samples_buffer_size)
for iteration_number in range(opt.num_iters):
    # Self-play: generate fresh experience with the current network.
    for eps in range(opt.number_episodes):
        player_mcts.reset()
        new_experiences = play_against_yourself(environment, player_mcts, nr_games=1)
        training_samples.extend(new_experiences)

    # Snapshot the current network as the opponent, then train the player on the buffer.
    opponent_network = copy.deepcopy(player_neural_network)
    opponent_mcts = mcts.MCTS(environment, opponent_network)
    player_neural_network.train(training_samples)

    # Pit the freshly trained player against the snapshot; keep the new network
    # only if it wins often enough, otherwise revert to the snapshot.
    player_wins, player_losses, draws = arena.fight(environment, player_mcts, opponent_mcts)
    win_ratio = player_wins / max(player_wins + player_losses, 1)  # avoid /0 when all games are draws
    if win_ratio < opt.update_threshold:
        player_neural_network = opponent_network
        player_mcts = mcts.MCTS(environment, player_neural_network)
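
# --- Hedged sketch, not part of the original gist ---------------------------
# The loop above calls `play_against_yourself`, which the gist does not define.
# Below is a minimal sketch of what such a self-play helper could look like,
# assuming the environment exposes getInitialState / getCanonicalForm /
# getNextState / getGameEnded and the MCTS exposes getActionProb; these method
# names are illustrative assumptions, not the gist's actual API.
import numpy

def play_against_yourself(environment, player_mcts, nr_games=1):
    experiences = []
    for _ in range(nr_games):
        state = environment.getInitialState()
        current_player, episode = 1, []
        while True:
            canonical_state = environment.getCanonicalForm(state, current_player)
            # MCTS visit counts define the policy target for this position.
            pi = player_mcts.getActionProb(canonical_state)
            episode.append((canonical_state, current_player, pi))
            action = numpy.random.choice(len(pi), p=pi)
            state, current_player = environment.getNextState(state, current_player, action)
            outcome = environment.getGameEnded(state, current_player)
            if outcome != 0:
                # Value target: final result from the perspective of the player
                # to move at each stored position.
                experiences.extend(
                    (s, pi_, outcome if p == current_player else -outcome)
                    for s, p, pi_ in episode)
                break
    return experiences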