Created April 25, 2016 22:58
It's TRPO with a neural network as the value function.
It takes the current observation, the previous observation, and the previous action as input.
Code: https://github.com/wojzaremba/trpo, commit_id a95620a26b45a930c0015f29cf4f53b9762f34b7
Execute run.py to start 4 screen sessions that reproduce the results on: "Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"
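The writeup does not say how the three inputs are encoded. As a rough sketch only, one plausible option is to one-hot encode each discrete value and concatenate the results into a single vector. The helper names (`one_hot`, `build_input`) and the sizes used below are hypothetical and are not taken from the linked repository.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index %d out of range for size %d" % (index, size))
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_input(obs, prev_obs, prev_action, n_obs, action_sizes):
    """Concatenate one-hot encodings of the current observation, the
    previous observation, and each component of the previous action.

    Assumption: observations are single discrete symbols and the action
    is a tuple of discrete choices. The actual encoding used in the
    repository may differ (for example, an embedding lookup).
    """
    features = one_hot(obs, n_obs) + one_hot(prev_obs, n_obs)
    for component, size in zip(prev_action, action_sizes):
        features += one_hot(component, size)
    return features


# Hypothetical sizes: 6 observation symbols, action tuple with 2, 2, 5 choices.
x = build_input(obs=3, prev_obs=1, prev_action=(1, 0, 4),
                n_obs=6, action_sizes=(2, 2, 5))
print(len(x))  # 21 = 6 + 6 + 2 + 2 + 5
```

A vector like `x` could then be fed to both the policy and the value network. Swapping the one-hot step for a learned embedding would change only `build_input`.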
Can you describe how you preprocessed the input? Is the state/action index fed into an embedding?