The code used to obtain these results can be found at
https://github.com/joschu/modular_rl, commit 50cdfdf375e69d86e3db6eb2ad0218ea6aebf371.
The command line expression used for all the environments can be found in the text file below.
Note that exactly the same parameters and policies were used for all tasks, except for timesteps_per_batch, which was varied based on the difficulty of the task.
The important parameters are:
gamma=0.995: discount
lam=0.97: see GAE paper for explanation
agent=TrpoAgent: name of the class, which specifies policy and value function architecture. In this case, we used two hidden layers of size 64, with tanh activations
cg_damping: multiple of the identity added for conjugate gradient
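
To make the roles of gamma, lam, and cg_damping concrete, here is a minimal pure-Python sketch. It is not taken from the modular_rl repository: the function names (gae_advantages, damped, conjugate_gradient) are hypothetical, the GAE recursion follows the standard formulation with a bootstrap value appended to the value list, and the damping value 0.1 in the usage note is only an illustration, since the value actually used is not stated here.

```python
def gae_advantages(rewards, values, gamma=0.995, lam=0.97):
    """Generalized advantage estimates for one trajectory.

    Assumes `values` has len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state after the final reward (0.0 if terminal).
    """
    steps = len(rewards)
    advantages = [0.0] * steps
    running = 0.0
    for t in reversed(range(steps)):
        # One-step TD residual, discounted by gamma.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of residuals with weight gamma * lam.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def damped(matvec, cg_damping):
    """Wrap v -> A v so that it computes (A + cg_damping * I) v instead."""
    return lambda v: [av + cg_damping * vi for av, vi in zip(matvec(v), v)]


def conjugate_gradient(matvec, b, iters=10, tol=1e-10):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x = 0 initially
    p = list(b)  # search direction
    rdotr = dot(r, r)
    for _ in range(iters):
        if rdotr < tol:
            break
        ap = matvec(p)
        alpha = rdotr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        new_rdotr = dot(r, r)
        p = [ri + (new_rdotr / rdotr) * pi for ri, pi in zip(r, p)]
        rdotr = new_rdotr
    return x
```

As a usage example, gae_advantages([1.0, 1.0], [0.0, 0.0, 0.0]) weights the second residual by gamma * lam when forming the first advantage. For the damping, a singular matrix such as diag(2, 0) cannot be inverted directly, but conjugate_gradient(damped(matvec, 0.1), b) solves the well-conditioned system with diag(2.1, 0.1) instead, which is the effect of adding a multiple of the identity.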