Created April 25, 2016 22:58
It's TRPO with a neural network as the value function.
It takes the current observation, the previous observation, and the previous action as input.
https://github.com/wojzaremba/trpo, commit_id a95620a26b45a930c0015f29cf4f53b9762f34b7
Execute run.py to start 4 screen sessions that reproduce the results on "Copy-v0", "DuplicatedInput-v0", "Reverse-v0", and "RepeatCopy-v0".
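
As a rough illustration of the input described above, here is a minimal sketch of how the three pieces could be combined into one feature vector. It is not taken from the repository: the one-hot encoding, the function names `one_hot` and `build_input`, and the parameters `n_obs` and `action_sizes` are all assumptions made for the example, and the actual preprocessing may differ (for instance, it might use an embedding instead).

```python
def one_hot(index, size):
    """Return a list of length `size` with 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index %d out of range for size %d" % (index, size))
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_input(obs, prev_obs, prev_action, n_obs, action_sizes):
    """Concatenate one-hot codes of the current observation, the previous
    observation, and each component of the previous action.

    Assumption: observations are integer indices in [0, n_obs) and the
    action is a tuple of integer indices, one per entry of `action_sizes`.
    """
    features = one_hot(obs, n_obs) + one_hot(prev_obs, n_obs)
    for component, size in zip(prev_action, action_sizes):
        features += one_hot(component, size)
    return features


# Hypothetical sizes, chosen only for the example.
x = build_input(obs=2, prev_obs=0, prev_action=(1, 0, 3),
                n_obs=4, action_sizes=(2, 2, 5))
```

With these hypothetical sizes the vector has 4 + 4 + 2 + 2 + 5 entries, one active entry per encoded field.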
Can you describe how you preprocessed the input? Is the state/action index fed into an embedding?
Copy: reproduced
DuplicatedInput: reproduced
Reverse: stuck at return of 1.3
RepeatCopy: plateaus at return of 13
So I'll mark the first 2 as verified. Maybe there's some randomness affecting the last two?
Also, your file run.py did not work for me, and it looks like a dangerous script to run because it mucks around with screen, whereas I'm working in screen on my machine. Can you provide a cleaner command for running all of your scripts?
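
To make the request concrete, something along these lines would be enough: a small launcher that starts one ordinary child process per environment and never touches screen. This is only a sketch under assumptions. The helper names `build_commands` and `run_all` are made up here, and the command template is a placeholder, since I don't know how the training script in the repository is actually invoked.

```python
import subprocess
import sys

# Environment names taken from the write-up above.
ENVS = ["Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"]


def build_commands(template, envs=ENVS):
    """Fill each environment name into the `{env}` slot of a command template.

    `template` is a list of argument strings; literal braces other than
    `{env}` are not supported by this simple sketch.
    """
    return [[part.format(env=env) for part in template] for env in envs]


def run_all(template, envs=ENVS):
    """Start one plain child process per environment, then wait for all.

    Returns the list of exit codes, in the same order as `envs`.
    """
    procs = [subprocess.Popen(cmd) for cmd in build_commands(template, envs)]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Placeholder command: "train.py" and "--env" are hypothetical and
    # should be replaced with the repository's real entry point and flags.
    exit_codes = run_all([sys.executable, "train.py", "--env", "{env}"])
    print(dict(zip(ENVS, exit_codes)))
```

The processes run in the foreground of whatever shell launches the script, so it can safely be started from inside an existing screen session.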