@HenryJia
Last active July 15, 2022 17:34
Solving OpenAI's Cartpole with a very simple PID controller in 35 lines
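import numpy as np
import gym


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


env = gym.make('CartPole-v1')

# State layout: [cart position, cart velocity, pole angle, pole angular velocity]
desired_state = np.array([0, 0, 0, 0])
# Mask selecting which state components feed the controller (pole angle only)
desired_mask = np.array([0, 0, 1, 0])

P, I, D = 0.1, 0.01, 0.5

for i_episode in range(20):
    state = env.reset()
    integral = 0
    derivative = 0
    prev_error = 0
    for t in range(500):
        env.render()
        error = state - desired_state

        # Discrete PID terms: running sum and one-step difference of the error
        integral += error
        derivative = error - prev_error
        prev_error = error

        # Masked sum of the per-component PID outputs gives one scalar signal
        pid = np.dot(P * error + I * integral + D * derivative, desired_mask)
        # Squash to (0, 1), then round to a discrete action: 0 = push left, 1 = push right
        action = sigmoid(pid)
        action = np.round(action).astype(np.int32)

        state, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()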
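
To inspect the control law without a Gym install, here is a minimal sketch of the same update for the pole angle alone. `AnglePID` is a hypothetical helper, not part of the gist; it assumes the mask `[0, 0, 1, 0]` and a target angle of 0, as in the script above. It also illustrates that rounding the sigmoid amounts to a sign test on the PID output.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class AnglePID:
    """Scalar PID on the pole angle only (hypothetical helper, not from the gist)."""

    def __init__(self, kp=0.1, ki=0.01, kd=0.5):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def act(self, angle):
        error = angle  # assumed target angle is 0
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        pid = self.kp * error + self.ki * self.integral + self.kd * derivative
        # round(sigmoid(pid)) is 1 exactly when pid > 0, so test the sign directly
        return 1 if pid > 0 else 0


# The sign test matches the gist's round(sigmoid(...)) for these sample values
for pid in (-2.0, -0.1, 0.1, 2.0):
    assert round(sigmoid(pid)) == (1 if pid > 0 else 0)

controller = AnglePID()
print(controller.act(0.05))   # -> 1 (push right)
print(controller.act(-0.05))  # -> 0 (push left)
```

Since only the sign of the PID output matters here, the gains set the relative weight of the three terms, not the strength of the push.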
@pkrobinette

Hi! I am trying to figure out why the pole angle error is the only one you care about (desired_mask = np.array([0,0,1,0])). Was this trial and error, or is there a specific reason why that error value is more important? Thanks!

@pawelzwronek

desired_mask = np.array([1, 1, 1, 1])

P, I, D = [1/150, 1/950, 0.1, 0.01], [0.0005, 0.001, 0.01, 0.0001], [0.2, 0.0001, 0.5, 0.005]

With this change, every component of the state is taken into account, each with its own P, I and D gains.
Also try adding this right after state = env.reset(), which moves the cart's starting position further from the center:

env.state[0] *= 30

This snippet disables the FPS limit when rendering:

import pyglet
pyglet.options['vsync'] = False
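
The per-component gains above can be tried in isolation. The following is a rough sketch, assuming the gist's formula is kept unchanged; `pid_signal` is a hypothetical helper name and the sample state is made up for illustration. In the original script the plain Python lists also work, because NumPy multiplies them elementwise with the error array.

```python
import numpy as np

# Gains from the comment above, one entry per state component:
# [cart position, cart velocity, pole angle, pole angular velocity]
P = np.array([1/150, 1/950, 0.1, 0.01])
I = np.array([0.0005, 0.001, 0.01, 0.0001])
D = np.array([0.2, 0.0001, 0.5, 0.005])


def pid_signal(error, integral, derivative, mask=np.array([1, 1, 1, 1])):
    """Combine per-component PID terms into one scalar (hypothetical helper)."""
    per_component = P * error + I * integral + D * derivative
    return float(np.dot(per_component, mask))


# Illustrative first-step values, not taken from a real episode:
# on the first step integral == derivative == error, as in the gist's loop.
error = np.array([0.5, 0.0, 0.02, 0.0])
signal = pid_signal(error, integral=error, derivative=error)
action = int(signal > 0)  # same decision as rounding the sigmoid

print(signal, action)
```

With the all-ones mask the cart position term dominates this sample, whereas the original mask `[0, 0, 1, 0]` would keep only the pole angle term.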
