wojzaremba · April 25, 2016 22:58 · joschu · Apr 27, 2016 · rbrigden · Sep 5, 2017
diff --git a/gistfile1.txt b/gistfile1.txt
 It's TRPO with a neural network as the value function.

 It takes the current observation, the previous observation, and the previous action as input.
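
 As a rough illustration of that input, the sketch below concatenates one-hot encodings of the three pieces into a single vector. The one-hot encoding, the helper names (one_hot, build_input), and the sizes used in the example are assumptions made for illustration; they are not taken from the linked repository.

```python
def one_hot(index, size):
    """Return a list of `size` floats with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_input(obs, prev_obs, prev_action, n_obs, action_sizes):
    """Concatenate current observation, previous observation, previous action.

    Assumption: observations are integers in [0, n_obs) and the action is a
    tuple of integers, one per entry of `action_sizes`.
    """
    features = one_hot(obs, n_obs) + one_hot(prev_obs, n_obs)
    for component, size in zip(prev_action, action_sizes):
        features += one_hot(component, size)
    return features


# Hypothetical sizes: 6 observation symbols, action sub-spaces of 2, 2 and 5.
x = build_input(obs=2, prev_obs=0, prev_action=(1, 0, 3),
                n_obs=6, action_sizes=(2, 2, 5))
```

 At the first step of an episode there is no previous observation or action, so some placeholder (for example all zeros) would be needed; how the original code handles that case is not stated here.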

 https://github.com/wojzaremba/trpo , commit_id a95620a26b45a930c0015f29cf4f53b9762f34b7

 Execute run.py to start 4 screen sessions that reproduce the results on "Copy-v0", "DuplicatedInput-v0", "Reverse-v0", and "RepeatCopy-v0".
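
 For readers unfamiliar with this pattern, here is a minimal sketch of how one detached screen session per environment could be started. It is not the contents of run.py: the training script name (train.py), its --env flag, and the "trpo-" session prefix are hypothetical placeholders. Only the screen options -d -m (start detached) and -S (session name) are standard.

```python
import subprocess

ENVS = ["Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"]


def build_screen_commands(envs=ENVS, script="train.py"):
    """Build one `screen` command per environment.

    `script` and the `--env` flag are hypothetical; substitute whatever the
    repository's training entry point actually expects.
    """
    return [
        ["screen", "-dmS", "trpo-" + env, "python", script, "--env", env]
        for env in envs
    ]


def launch(commands):
    """Start each detached session; raises if `screen` exits with an error."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = build_screen_commands()
```

 Calling launch(commands) would start the sessions; screen -ls lists them and screen -r followed by a session name attaches to one.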