Code is located at https://github.com/wojzaremba/trpo . This solution is based on commit 5d86623abeb5759de155f495789bbb4afd74aae5.
It takes less than 1 minute on a CPU to get these results.
Just run:
python main.py
Run https://github.com/wojzaremba/trpo/blob/master/main_duplicated.py
from commit 1501754fc6e18615487fae87d2b6d58d47ca4c95
Run https://github.com/wojzaremba/trpo/blob/master/main_copy.py
from commit 1501754fc6e18615487fae87d2b6d58d47ca4c95
git: https://github.com/wojzaremba/trpo , commit 6bb9fe32d5bb3413cd76e60518d49c58b2716ad1
This is an implementation of TRPO with a poor man's memory: it concatenates the previous observation and state to the current observation.
This makes it possible to solve tasks that require only very short memory (e.g. Reverse-v0). Run this script on 3 tasks (Copy-v0,
DuplicatedInput-v0 and Reverse-v0) by calling:
python run.py
It starts 3 screen instances. Training takes ~1 min.
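The concatenation trick described above can be sketched roughly as follows. This is only an illustration, not the repository's code: the class name `PreviousObservationMemory` and the helper `one_hot` are hypothetical, observations are assumed to be integer symbol indices, and the extra "state" component is left out because the description does not say what it contains.

```python
from typing import List, Optional


def one_hot(index: int, size: int) -> List[float]:
    """Encode an integer symbol as a one-hot vector of the given size."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


class PreviousObservationMemory:
    """Hypothetical helper: appends the previous observation to the current one."""

    def __init__(self, num_symbols: int) -> None:
        self.num_symbols = num_symbols
        self.previous: Optional[List[float]] = None

    def reset(self) -> None:
        """Forget the stored observation at the start of a new episode."""
        self.previous = None

    def augment(self, observation: int) -> List[float]:
        """Return [current one-hot, previous one-hot]; zeros stand in on the first step."""
        current = one_hot(observation, self.num_symbols)
        if self.previous is None:
            previous = [0.0] * self.num_symbols
        else:
            previous = self.previous
        self.previous = current
        return current + previous
```

A policy fed with `augment(observation)` sees one step of history, which is the kind of very short memory the description refers to.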
Init
It's TRPO with a neural network as the value function.
It takes the current observation, the previous observation, and the previous action as input.
https://github.com/wojzaremba/trpo , commit_id a95620a26b45a930c0015f29cf4f53b9762f34b7
Execute run.py to start 4 screen sessions that reproduce the results on: "Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"
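One possible way to assemble the input described above is sketched here. It is an assumption-laden illustration rather than the repository's code: `build_input` and `one_hot` are hypothetical names, and both observations and actions are assumed to be flat integer indices (the real action space may be structured differently).

```python
from typing import List, Optional


def one_hot(index: Optional[int], size: int) -> List[float]:
    """One-hot vector; all zeros when index is None (e.g. at the first step)."""
    vec = [0.0] * size
    if index is not None:
        vec[index] = 1.0
    return vec


def build_input(
    obs: int,
    prev_obs: Optional[int],
    prev_action: Optional[int],
    num_obs: int,
    num_actions: int,
) -> List[float]:
    """Concatenate current observation, previous observation, and previous action."""
    return (
        one_hot(obs, num_obs)
        + one_hot(prev_obs, num_obs)
        + one_hot(prev_action, num_actions)
    )
```

At the first step of an episode there is no previous observation or action, so `None` is passed for both and the corresponding slots stay zero.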
This repo implements a recurrent neural network that optimizes the TRPO loss function. Moreover, we use
a neural network as the value function.
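For reference, the TRPO objective as it is usually stated is sketched below; the repository's exact loss may differ. Here \pi_\theta is the policy, \delta is the KL limit, R_t is the empirical return and V_\phi is the value network. The advantage form in the last line is an assumption about how the value function is used, and for a recurrent policy s_t should be read as the history summarized by the hidden state.

```latex
\begin{aligned}
\max_{\theta}\quad & \mathbb{E}_t\!\left[ \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}\, \hat{A}_t \right] \\
\text{subject to}\quad & \mathbb{E}_t\!\left[ D_{\mathrm{KL}}\!\big( \pi_{\theta_{\text{old}}}(\cdot \mid s_t) \,\|\, \pi_\theta(\cdot \mid s_t) \big) \right] \le \delta, \\
\text{with}\quad & \hat{A}_t = R_t - V_\phi(s_t).
\end{aligned}
```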
https://github.com/wojzaremba/trpo_rnn , commit_id da6fb44bd2980cd26dd057aff01f55a533a742fa
Execute run.py to start 4 screen sessions that reproduce the results on: "Copy-v0", "DuplicatedInput-v0",
"ReversedAddition-v0", "ReversedAddition3-v0"