@elumixor
Last active May 24, 2020 23:02
TRPO update 2
# We will calculate the gradient wrt the new probabilities (surrogate function),
# so the second (old) probabilities are detached and treated as a constant
L = surrogate_loss(probabilities, probabilities.detach(), advantages)
# KL of the current policy with itself: the value (and gradient) is zero at the
# current parameters, but differentiating its graph again yields the Fisher matrix.
# kl_div must detach its first argument, otherwise the expression is identically zero.
KL = kl_div(distribution, distribution)
parameters = list(actor.parameters())
# Retain the graph, because we will use it several times
g = flat_grad(L, parameters, retain_graph=True)
# Create the graph, because we will differentiate through the gradient itself (for the Hessian-vector product)
d_kl = flat_grad(KL, parameters, create_graph=True)
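The snippet assumes helpers `surrogate_loss`, `kl_div`, and `flat_grad` defined elsewhere in the gist. A minimal sketch of what they might look like — the exact definitions here are an assumption, not the gist's verbatim code:

```python
import torch


def flat_grad(y, x, retain_graph=False, create_graph=False):
    # Gradient of scalar y wrt the parameter list x, flattened into one vector
    if create_graph:
        retain_graph = True  # create_graph implies the graph is kept
    grads = torch.autograd.grad(y, x, retain_graph=retain_graph, create_graph=create_graph)
    return torch.cat([g.reshape(-1) for g in grads])


def surrogate_loss(new_probabilities, old_probabilities, advantages):
    # Importance-sampled policy objective: E[(pi_new / pi_old) * A]
    return (new_probabilities / old_probabilities * advantages).mean()


def kl_div(p, q):
    # KL(p || q), with p detached so gradients flow only through q
    p = p.detach()
    return (p * (p.log() - q.log())).sum(-1).mean()
```

With this `kl_div`, `KL` and `d_kl` are both zero at the current parameters, but differentiating `d_kl @ v` a second time produces the Fisher-vector product `H @ v` — the quantity conjugate gradient needs in the next TRPO step.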