@elumixor
Last active May 25, 2020 12:29
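import torch


def estimate_advantages(states, last_state, rewards):
    # `critic` is expected to be defined elsewhere; it maps states to value estimates.
    values = critic(states)
    # Bootstrap from the critic's estimate for the state after the final step.
    last_value = critic(last_state.unsqueeze(0))
    next_values = torch.zeros_like(rewards)
    # Walk backwards, accumulating discounted returns with discount factor 0.99.
    for i in reversed(range(rewards.shape[0])):
        last_value = next_values[i] = rewards[i] + 0.99 * last_value
    # Advantage = bootstrapped discounted return minus the critic's baseline.
    advantages = next_values - values
    return advantages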
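
The backward loop in `estimate_advantages` can be easier to follow without tensors. The following is a minimal pure-Python sketch of the same recursion, not the gist author's code: the names `discounted_returns` and `advantages_from` are hypothetical, and plain lists of floats stand in for the critic's outputs.

```python
def discounted_returns(rewards, last_value, gamma=0.99):
    """Backward recursion R_t = r_t + gamma * R_{t+1}, seeded with a bootstrap value."""
    returns = [0.0] * len(rewards)
    running = last_value  # stands in for critic(last_state) in the gist
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


def advantages_from(rewards, values, last_value, gamma=0.99):
    """Subtract per-step value estimates (the baseline) from the discounted returns."""
    returns = discounted_returns(rewards, last_value, gamma)
    return [ret - val for ret, val in zip(returns, values)]


if __name__ == "__main__":
    # Two steps with reward 1.0 each, no bootstrap value, gamma = 0.5.
    print(discounted_returns([1.0, 1.0], 0.0, gamma=0.5))
    print(advantages_from([1.0, 1.0], [0.5, 0.5], 0.0, gamma=0.5))
```

Seeding the loop with a non-zero `last_value` corresponds to a rollout that was cut off before the episode ended; passing `0.0` corresponds to a terminal final state.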