John Schulman joschu

The code used to obtain these results is at https://github.com/joschu/modular_rl, commit 50cdfdf375e69d86e3db6eb2ad0218ea6aebf371. The command line used for each environment is in the text file below. The exact same parameters and policies were used for all tasks, except for timesteps_per_batch, which was varied based on the difficulty of the task. The important parameters are:

gamma=0.995: discount factor
lam=0.97: the lambda parameter of generalized advantage estimation; see the GAE paper for an explanation
agent=TrpoAgent: name of the class, which specifies the policy and value function architecture. In this case, we used two hidden layers of size 64 with tanh activations
cg_damping: multiple of the identity added to the matrix in the conjugate gradient solve
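
To make the roles of gamma and lam concrete, here is a minimal sketch of the generalized advantage estimator in plain Python. It is not taken from modular_rl: the function name `gae_advantages`, the episode layout, and the toy rewards and value predictions are all assumptions made for illustration.

```python
def gae_advantages(rewards, values, gamma=0.995, lam=0.97):
    """Generalized advantage estimates for one finished episode.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T); the last entry is 0 for a terminal state
    """
    T = len(rewards)
    advantages = [0.0] * T
    running = 0.0
    # Work backwards so each step reuses the discounted sum from the next one.
    for t in reversed(range(T)):
        # One-step TD residual
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of residuals with weight gamma * lam
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Hypothetical toy episode: three steps, constant value predictions.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]
adv = gae_advantages(rewards, values)
```

With lam=0 the estimate reduces to the one-step TD residual, and with lam=1 it becomes the discounted return minus the value baseline; values in between trade bias against variance.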

I used the cross-entropy method (an evolutionary algorithm / derivative-free optimization method) to optimize small two-layer neural networks.

The code used to obtain these results is at https://github.com/joschu/modular_rl, commit 3324639f82a81288e9d21ddcb6c2a37957cdd361. The command line used for each environment is in the text file below. The exact same parameters were used for all tasks. The important parameters are:

hid_sizes=10,5: hidden layer sizes of MLP
extra_std=0.01: noise added to variance, see [1]
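
For readers unfamiliar with the method, the following is a rough sketch of a cross-entropy method loop in plain Python, applied to a toy quadratic objective rather than a neural network policy. It is not the modular_rl implementation: the function name `cem`, its arguments other than `extra_std`, and the way `extra_std` is folded into the sampling variance are assumptions for illustration.

```python
import random


def cem(f, mean, std, n_iters=50, batch_size=50, elite_frac=0.2,
        extra_std=0.01, seed=0):
    """Maximize f over real vectors using a diagonal Gaussian search distribution."""
    rng = random.Random(seed)
    dim = len(mean)
    n_elite = max(1, int(batch_size * elite_frac))
    for _ in range(n_iters):
        # Sample candidates; extra_std**2 is added to the variance
        # (an assumed form of the "added noise") to slow premature collapse.
        samples = [
            [rng.gauss(mean[i], (std[i] ** 2 + extra_std ** 2) ** 0.5)
             for i in range(dim)]
            for _ in range(batch_size)
        ]
        # Keep the best-scoring fraction of the batch.
        samples.sort(key=f, reverse=True)
        elite = samples[:n_elite]
        # Refit the mean and standard deviation to the elite set.
        mean = [sum(s[i] for s in elite) / n_elite for i in range(dim)]
        std = [
            (sum((s[i] - mean[i]) ** 2 for s in elite) / n_elite) ** 0.5
            for i in range(dim)
        ]
    return mean


# Hypothetical toy objective with its maximum at `target`.
target = [0.5, -0.5]


def objective(theta):
    return -sum((a - b) ** 2 for a, b in zip(theta, target))


best = cem(objective, mean=[0.0, 0.0], std=[1.0, 1.0])
```

In the policy search setting, `theta` would be the flattened network weights and `objective` the episode return, which is why no gradients are needed.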

This is a tiny update to https://gist.github.com/joschu/a21ed1259d3f8c7bdff178fb47bc6fc1#file-1-cem-v0-writeup-md

I ran the experiments on the v1 MuJoCo environments.
I reduced the added-noise parameter extra_std from 0.01 to 0.001.

I used the cross-entropy method (an evolutionary algorithm / derivative-free optimization method) to optimize small two-layer neural networks.

The code used to obtain these results is at https://github.com/joschu/modular_rl, commit ba42955b41d7f419470a95d875af1ab7e7ee66fc. The command line used for each environment is in the text file below.

Exact same code and parameters as https://gist.github.com/joschu/e42a050b1eb5cfbb1fdc667c3450467a, but run on the updated (v1) MuJoCo environments. The new scripts are provided below. I ran on commit 987cb5d229027045fd0390533832e173237f81b6, but there shouldn't be any functional differences from the previous writeup.

Also, I (inadvertently) ran everything for 500 iterations instead of 250.

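	import numpy as np
	import theano
	import theano.tensor as TT

	# Symbolic scalar inputs
	x = TT.scalar('x')
	y = TT.scalar('y')

	# Expression under test: (x**2) mod y
	z = TT.mod(x**2, y)

	# z = x**2 + y**2
	# Compile the expression and its derivative with respect to x
	f = theano.function([x, y], z, allow_input_downcast=True)
	dfdx = theano.function([x, y], TT.grad(z, x), allow_input_downcast=True)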
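
One way to sanity-check a derivative like `dfdx` without Theano is a finite-difference comparison. Away from points where x**2 crosses a multiple of y, mod(x**2, y) equals x**2 - y*floor(x**2/y) with the floor term locally constant, so the derivative with respect to x is 2x. The sketch below checks this in plain Python for positive y; the helper names `f_ref` and `central_diff` and the test point are made up for illustration, and it makes no claim about what Theano itself returns.

```python
def f_ref(x, y):
    # Reference value of (x**2) mod y using Python's float modulo (y > 0).
    return (x ** 2) % y


def central_diff(func, x, y, eps=1e-6):
    # Central finite-difference estimate of d func / dx at (x, y).
    return (func(x + eps, y) - func(x - eps, y)) / (2 * eps)


# Hypothetical test point: x0**2 = 2.89 is not near a multiple of y0.
x0, y0 = 1.7, 2.0
numeric = central_diff(f_ref, x0, y0)
analytic = 2 * x0
```

The two values should agree closely at such a point. Near a discontinuity of the modulo, the finite-difference estimate blows up and the comparison is not meaningful.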