Created May 1, 2016 04:18 (gist JKCooper2/525ae31b5a97803b62e9a8b68edf74be)
Solved version of the problem, written to determine the minimum knowledge an agent requires to complete the task successfully.
Observation values may be off, as I threw this together quickly as a proof of concept.
import logging

import gym

from Solved import SolvedAgent


def main():
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)

    env = gym.make('CartPole-v0')
    agent = SolvedAgent()

    # env.monitor is the monitor API of the gym version this was written against.
    outdir = '/tmp/' + agent.name + '-results'
    env.monitor.start(outdir, force=True)

    episode_count = 200
    max_steps = 200
    reward = 0
    done = False

    # range works on both Python 2 and Python 3 (the original used xrange).
    for i in range(episode_count):
        ob = env.reset()

        for j in range(max_steps):
            action = agent.act(ob, reward, done)
            ob, reward, done, _ = env.step(action)
            if done:
                break

    env.monitor.close()


if __name__ == '__main__':
    main()
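class SolvedAgent(object):
    def __init__(self):
        self.name = 'solved'

    def act(self, observation, reward, done):
        # Guesses at observations are:
        # observation[0] = pole speed
        # observation[1] = pole top pos
        # observation[2] = pole angle
        # observation[3] = block speed
        # (Gym's CartPole documentation lists them as cart position, cart
        # velocity, pole angle, and pole velocity at tip, so only the guess
        # for observation[2] matches.)
        # reward and done are unused; only observation[2] and observation[3]
        # drive the decision.
        if (observation[2] > 0 and observation[3] > -1) or observation[3] > 1:
            return 1
        return 0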
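To make the rule in `act` concrete, here is a minimal sketch that evaluates the same condition on a few hand-picked observations. The standalone `act` function and the sample values are assumptions made for illustration; the values were not taken from actual CartPole runs.

```python
def act(observation):
    # Same condition as SolvedAgent.act; reward and done are unused there.
    if (observation[2] > 0 and observation[3] > -1) or observation[3] > 1:
        return 1
    return 0


# Hypothetical observations, chosen only to hit each branch of the rule.
samples = [
    [0.0, 0.0, 0.05, 0.0],    # observation[2] > 0 and observation[3] > -1
    [0.0, 0.0, -0.05, 0.0],   # observation[2] <= 0 and observation[3] <= 1
    [0.0, 0.0, 0.05, -1.5],   # observation[2] > 0 but observation[3] <= -1
    [0.0, 0.0, -0.05, 1.5],   # observation[3] > 1 overrides observation[2]
]

actions = [act(s) for s in samples]
print(actions)  # [1, 0, 0, 1]
```

The last two samples show that observation[3] acts as an override in both directions once it leaves the range (-1, 1].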
Thank you for the feedback. I wasn't intending for this to be put on the reviewed list, as it's not doing any actual learning, but it could be useful when thinking about how to determine what information is necessary to solve a problem versus what information is available.
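One rough way to probe the "necessary versus available" question for a fixed policy is to check which observation indices can ever change its action. The sketch below does this by brute force over a small grid. The helper `influential_indices` and the probe values are hypothetical, and the check only shows which inputs this particular rule reads, not the minimum any agent would need.

```python
from itertools import product


def act(observation):
    # Same condition as SolvedAgent.act, copied so the sketch is self-contained.
    if (observation[2] > 0 and observation[3] > -1) or observation[3] > 1:
        return 1
    return 0


def influential_indices(policy, values, size=4):
    """Return the observation indices whose value can change the policy's action."""
    influential = set()
    for base in product(values, repeat=size):
        base_action = policy(list(base))
        for index in range(size):
            for alternative in values:
                changed = list(base)
                changed[index] = alternative
                if policy(changed) != base_action:
                    influential.add(index)
    return sorted(influential)


# Hypothetical probe values; they only need to straddle the thresholds -1, 0 and 1.
probe_values = [-2.0, -0.5, 0.5, 2.0]
used = influential_indices(act, probe_values)
print(used)  # [2, 3]
```

For this rule, two of the four available observation values never affect the action, which matches the condition in `act` reading only observation[2] and observation[3].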