@davidADSP
Created December 1, 2019 17:36
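import math
from typing import List

# Node, Player, Action and NetworkOutput are defined elsewhere (not shown here).


# We expand a node using the hidden state, reward and policy prediction obtained
# from the neural network. The value prediction is not used in this step.
def expand_node(node: Node, to_play: Player, actions: List[Action],
                network_output: NetworkOutput):
    # Record whose turn it is, the latent state and the predicted reward.
    node.to_play = to_play
    node.hidden_state = network_output.hidden_state
    node.reward = network_output.reward
    # Softmax over the legal actions only: exponentiate each action's logit...
    policy = {a: math.exp(network_output.policy_logits[a]) for a in actions}
    policy_sum = sum(policy.values())
    # ...then normalise, creating one child per action with that prior.
    for action, p in policy.items():
        node.children[action] = Node(p / policy_sum)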
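The following is a minimal, self-contained sketch for trying the expansion step in isolation. It rests on a few assumptions. The `Node` and `NetworkOutput` dataclasses are hypothetical stand-ins that carry only the fields this function touches, plus `value`. Actions and players are plain integers. The sketch also swaps in a numerically stable softmax: it subtracts the largest legal logit before exponentiating. This yields the same priors as the snippet above but keeps `math.exp` from raising `OverflowError` on large logits.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical, minimal stand-ins for the Node and NetworkOutput types used
# above; the real definitions live elsewhere and may carry more fields.
@dataclass
class Node:
    prior: float
    to_play: int = -1
    hidden_state: Optional[List[float]] = None
    reward: float = 0.0
    children: Dict[int, "Node"] = field(default_factory=dict)


@dataclass
class NetworkOutput:
    value: float
    reward: float
    policy_logits: Dict[int, float]
    hidden_state: List[float]


def expand_node(node: Node, to_play: int, actions: List[int],
                network_output: NetworkOutput) -> None:
    node.to_play = to_play
    node.hidden_state = network_output.hidden_state
    node.reward = network_output.reward
    # Softmax restricted to the legal actions. Subtracting the largest logit
    # leaves the result unchanged but stops math.exp from overflowing.
    logits = {a: network_output.policy_logits[a] for a in actions}
    max_logit = max(logits.values())
    policy = {a: math.exp(l - max_logit) for a, l in logits.items()}
    policy_sum = sum(policy.values())
    for action, p in policy.items():
        node.children[action] = Node(p / policy_sum)


# Usage: action 2 is illegal here, so its logit is ignored and it gets no child.
root = Node(prior=1.0)
output = NetworkOutput(value=0.5, reward=0.0,
                       policy_logits={0: 1.0, 1: 1.0, 2: 5.0},
                       hidden_state=[0.1, 0.2])
expand_node(root, to_play=0, actions=[0, 1], network_output=output)
print({a: child.prior for a, child in root.children.items()})  # {0: 0.5, 1: 0.5}
```

Only the legal actions receive children. Action 2 has a large logit, but it contributes nothing to the normalisation, so the two legal actions split the probability mass evenly.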