@elumixor
Created May 24, 2020 23:15
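def criterion(step):
    # `actor`, `states`, `actions`, `probabilities`, `distribution`, `advantages`,
    # `L`, and `delta` are not defined here; they come from the enclosing scope.

    # Apply parameters' update
    apply_update(step)

    with torch.no_grad():
        distribution_new = actor(states)
        # Clamp probabilities away from exactly 0 and 1 for numerical stability
        distribution_new = torch.distributions.utils.clamp_probs(distribution_new)
        # Probability the updated policy assigns to each action that was taken
        probabilities_new = distribution_new[range(distribution_new.shape[0]), actions]

        L_new = surrogate_loss(probabilities_new, probabilities, advantages)
        KL_new = kl_div(distribution, distribution_new)

    # Accept the step only if the surrogate improved and the KL constraint holds
    L_improvement = L_new - L
    if L_improvement > 0 and KL_new <= delta:
        return True

    # Step size too big, reverse
    apply_update(-step)
    return False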
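
A criterion like the one above is typically handed to a backtracking line search, which shrinks the step until the criterion accepts it. The sketch below is a minimal, torch-free illustration of that pattern and is not taken from the gist. `backtracking_line_search`, `toy_criterion`, `objective`, the `shrink` factor, and the scalar parameter are all hypothetical stand-ins, and the absolute step size plays the role of the KL constraint.

```python
def backtracking_line_search(criterion, max_step, shrink=0.5, max_iter=10):
    """Return the first step accepted by `criterion`, or 0.0 if none is."""
    step = max_step
    for _ in range(max_iter):
        # `criterion` applies the step and leaves it applied only on success
        if criterion(step):
            return step
        step *= shrink
    return 0.0


# Toy setup (assumption): one scalar parameter instead of network weights.
state = {"theta": 0.0}
delta = 0.3  # stand-in for the KL bound


def objective(theta):
    # Concave objective with its maximum at theta = 1
    return -(theta - 1.0) ** 2


def apply_update(step):
    state["theta"] += step


def toy_criterion(step):
    before = objective(state["theta"])
    apply_update(step)
    improvement = objective(state["theta"]) - before
    # Accept only if the objective improved and the step is small enough
    if improvement > 0 and abs(step) <= delta:
        return True
    # Step size too big, reverse
    apply_update(-step)
    return False


accepted = backtracking_line_search(toy_criterion, max_step=1.0)
```

In this toy run the steps 1.0 and 0.5 improve the objective but violate the size bound, so they are reverted, and 0.25 is the first step accepted. Returning 0.0 when every candidate fails leaves the parameters unchanged, which mirrors the revert in the criterion itself.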