Skip to content

Instantly share code, notes, and snippets.

View wojzaremba's full-sized avatar

Wojciech Zaremba wojzaremba

View GitHub Profile
Code is located at : https://github.com/wojzaremba/trpo . This solution is based on commit : 5d86623abeb5759de155f495789bbb4afd74aae5
It takes < 1 min. on CPU to get this results.
Just run:
python main.py
Run https://github.com/wojzaremba/trpo/blob/master/main_duplicated.py
from commit : 1501754fc6e18615487fae87d2b6d58d47ca4c95
Run https://github.com/wojzaremba/trpo/blob/master/main_copy.py
from commit : 1501754fc6e18615487fae87d2b6d58d47ca4c95
git : https://github.com/wojzaremba/trpo commit : 6bb9fe32d5bb3413cd76e60518d49c58b2716ad1
is an implementation of TRPO with poor man memory. It concatenates last observation and state to the current observation.
It allows to solve tasks that require very short memory (e.g. reverse). Execute this script on 3 tasks Copy-v0,
DuplicatedInput-v0 and Reverse-v0) by calling:
python run.py
It starts 3 screen instances. Training takes ~1 min.
It's TRPO with neural network as value function.
It takes a current observation, previous observation, and previous action as the input.
https://github.com/wojzaremba/trpo , commit_id a95620a26b45a930c0015f29cf4f53b9762f34b7
Execute run.py to start 4 sessions of screen that reproduce results on: "Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"
This repo implements recurrent neural network that optimizes TRPO loss function. Moreover, we use
a neural network as value function.
https://github.com/wojzaremba/trpo_rnn , commit_id da6fb44bd2980cd26dd057aff01f55a533a742fa
Execute run.py to start 4 sessions of screen that reproduce results on: "Copy-v0", "DuplicatedInput-v0",
"ReversedAddition-v0", "ReversedAddition3-v0"