Wojciech Zaremba wojzaremba

wojzaremba / vanilla_trpo

Last active April 23, 2016 22:54

	The code is at https://github.com/wojzaremba/trpo and this solution is based on commit 5d86623abeb5759de155f495789bbb4afd74aae5

	It takes under 1 minute on a CPU to get these results.

	Just run:
	python main.py

wojzaremba / gist:637876a7c3e59e2cd22aebe35e17a835

Created April 24, 2016 01:03

	Run https://github.com/wojzaremba/trpo/blob/master/main_duplicated.py from commit 1501754fc6e18615487fae87d2b6d58d47ca4c95

wojzaremba / gist:0edf6c8a9c292b32ba196c90f91126d7

Created April 24, 2016 01:08

	Run https://github.com/wojzaremba/trpo/blob/master/main_copy.py from commit 1501754fc6e18615487fae87d2b6d58d47ca4c95

wojzaremba / gist:b474cfa2fbc93ddf3f8d0ea2ff76f362

Created April 24, 2016 17:48

	An implementation of TRPO with a poor man's memory: https://github.com/wojzaremba/trpo , commit 6bb9fe32d5bb3413cd76e60518d49c58b2716ad1
	It concatenates the last observation and state with the current observation.

	This makes it possible to solve tasks that require only very short memory (e.g. reverse). Run it on 3 tasks (Copy-v0,
	DuplicatedInput-v0 and Reverse-v0) by calling:

	python run.py

	It starts 3 screen instances. Training takes ~1 min.
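	A launcher that starts one detached screen session per task could look roughly like the sketch below. It is not the repository's run.py: the session names, the script name main.py, and passing the task name as a positional argument are all assumptions made for illustration. The sketch only builds and prints the commands, so it is safe to run without screen installed.

```python
import shlex

# Task names taken from the description above.
TASKS = ["Copy-v0", "DuplicatedInput-v0", "Reverse-v0"]


def screen_command(task, script="main.py"):
    """Build the argv for one detached screen session that trains on `task`.

    Hypothetical: the real training script may take the task in another way.
    """
    session = "trpo_" + task  # session name is an arbitrary choice
    # `screen -dmS NAME CMD...` starts CMD in a detached session called NAME.
    return ["screen", "-dmS", session, "python", script, task]


if __name__ == "__main__":
    for task in TASKS:
        # Print instead of executing; pass the list to subprocess.Popen to launch.
        print(" ".join(shlex.quote(arg) for arg in screen_command(task)))
```

	Running each task in its own session lets the three trainings proceed in parallel and stay alive after the terminal closes.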

wojzaremba / TRPO with prev

Created April 24, 2016 23:12

Init

wojzaremba / gist:0cac4286be1b8101cc75a3edd25a7d1c

Created April 25, 2016 22:58

	TRPO with a neural network as the value function.

	It takes the current observation, the previous observation, and the previous action as input.
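	One way to assemble such an input is to one-hot encode each of the three pieces and concatenate them, as in the minimal sketch below. It assumes discrete observations and a single discrete action index, which may not match the repository (the algorithmic tasks use composite actions). The names one_hot and build_input are hypothetical.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size`; all zeros when index is None."""
    vec = [0.0] * size
    if index is not None:
        vec[index] = 1.0
    return vec


def build_input(obs, prev_obs, prev_action, n_obs, n_actions):
    """Concatenate current observation, previous observation, previous action.

    On the first step of an episode there is no history, so prev_obs and
    prev_action are None and their slots stay zero (an assumed convention).
    """
    return (
        one_hot(obs, n_obs)
        + one_hot(prev_obs, n_obs)
        + one_hot(prev_action, n_actions)
    )


# First step of an episode: only the current observation is set.
first = build_input(obs=2, prev_obs=None, prev_action=None, n_obs=3, n_actions=2)
# Later step: the history slots carry the previous observation and action.
later = build_input(obs=0, prev_obs=2, prev_action=1, n_obs=3, n_actions=2)
```

	The input size is fixed at 2 * n_obs + n_actions, so a plain feed-forward network can consume it while still seeing one step of history.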

	https://github.com/wojzaremba/trpo , commit_id a95620a26b45a930c0015f29cf4f53b9762f34b7

	Execute run.py to start 4 sessions of screen that reproduce results on: "Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"

wojzaremba / gist:3f901fcd50d7aa7a38b81aa16cbd899c

Created April 27, 2016 02:34

	This repo implements a recurrent neural network that optimizes the TRPO loss function. Moreover, we use
	a neural network as the value function.
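	For reference, the standard TRPO objective is the mean of the probability ratio times the advantage, subject to a bound on the mean KL divergence between the old and new policies. The sketch below computes both quantities for categorical policies in plain Python. It illustrates the textbook formulation only and is not taken from the repository; the function names are hypothetical.

```python
import math


def surrogate_loss(new_probs, old_probs, advantages):
    """Negative mean of (pi_new(a|s) / pi_old(a|s)) * advantage.

    Each list holds one entry per sampled (state, action) pair; the
    probabilities are those of the action that was actually taken.
    """
    ratios = [p_new / p_old for p_new, p_old in zip(new_probs, old_probs)]
    return -sum(r * a for r, a in zip(ratios, advantages)) / len(advantages)


def mean_kl(old_dists, new_dists):
    """Mean KL(old || new) over states, for full categorical distributions."""
    total = 0.0
    for p, q in zip(old_dists, new_dists):
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(old_dists)


# Doubling the probability of an action with advantage 2.0 gives loss -4.0.
loss = surrogate_loss(new_probs=[0.5], old_probs=[0.25], advantages=[2.0])
# Identical policies have zero KL divergence.
kl = mean_kl([[0.5, 0.5]], [[0.5, 0.5]])
```

	A TRPO step minimizes the surrogate loss while keeping the mean KL below a chosen threshold. With a recurrent policy the same quantities are computed from the probabilities the network emits at each time step.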

	https://github.com/wojzaremba/trpo_rnn , commit_id da6fb44bd2980cd26dd057aff01f55a533a742fa

	Execute run.py to start 4 sessions of screen that reproduce results on: "Copy-v0", "DuplicatedInput-v0",
	"ReversedAddition-v0", "ReversedAddition3-v0"