Denis Yarats denisyarats

#!/usr/local/bin/python
"""
Q-learning: off-policy TD(0) learning.
Q(S, A) <- Q(S, A) + alpha * (R + gamma * max_a Q(S', a) - Q(S, A))
A ~ pi(A|S), where pi is epsilon-greedy with respect to Q
"""
import argparse
import numpy as np
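
The update rule in the docstring above can be sketched for a tabular Q stored as a NumPy array. This is a minimal illustration, not the gist's actual code: the names `epsilon_greedy` and `q_learning_update`, the `(n_states, n_actions)` array layout, and the choice to bootstrap from zero on terminal transitions are all assumptions.

```python
import numpy as np


def epsilon_greedy(Q, state, eps, rng):
    """Hypothetical helper: random action with probability eps, else greedy."""
    n_actions = Q.shape[1]
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))


def q_learning_update(Q, s, a, r, s_next, done, alpha, gamma):
    """Apply one off-policy TD(0) update in place and return the TD error."""
    # Assumption: a terminal transition has no future value to bootstrap from.
    target = r if done else r + gamma * np.max(Q[s_next])
    td_error = target - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error


# Tiny example: 3 states, 2 actions, one transition 0 -> 1 with reward 1.
Q = np.zeros((3, 2))
Q[1] = [1.0, 2.0]
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, done=False, alpha=0.3, gamma=0.9)
```

The target uses the maximum over next actions regardless of which action the behaviour policy takes next, which is what makes the update off-policy.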
denisyarats / gist:4981579b42c8a08e49206347f7d28c6c
Created January 17, 2017 07:16
./sarsa.py --max_episodes 10000 --alpha 0.3 --gamma 0.9 --eps 0.2 --eps_schedule 200 --goal 25 --env copy --upload
#!/usr/local/bin/python
"""
SARSA: on-policy TD(0) learning.
Q(S, A) <- Q(S, A) + alpha * (R + gamma * Q(S', A') - Q(S, A))
A ~ pi(A|S) and A' ~ pi(A'|S'), where pi is epsilon-greedy with respect to Q
"""
import argparse
import numpy as np
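
The SARSA rule in the docstring above differs from Q-learning only in its target: it bootstraps from the action actually selected in the next state. A minimal tabular sketch follows. The function name `sarsa_update`, the `(n_states, n_actions)` array layout, and the zero bootstrap on terminal transitions are assumptions, not the gist's code.

```python
import numpy as np


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma):
    """Apply one on-policy TD(0) update in place and return the TD error."""
    # Assumption: a terminal transition has no future value to bootstrap from.
    # a_next must be the action the behaviour policy will really take in s_next.
    target = r if done else r + gamma * Q[s_next, a_next]
    td_error = target - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error


# Tiny example: 3 states, 2 actions; the policy picked action 0 in state 1.
Q = np.zeros((3, 2))
Q[1] = [1.0, 2.0]
sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0, done=False, alpha=0.3, gamma=0.9)
```

Because `a_next` comes from the same epsilon-greedy policy that generates the data, exploratory actions influence the learned values, which is the on-policy property named in the docstring.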
denisyarats / on_policy_mc.py
Created January 15, 2017 05:32
On-policy Monte Carlo (MC)
#!/usr/local/bin/python
import argparse
import numpy as np
from collections import defaultdict
import gym
from gym import wrappers
import pdb
EXP_NAME_PREFIX = 'exp/on_policy_mc'