basic_rl.py provides a simple implementation of the SARSA and Q-learning algorithms (selected with the -a flag) with epsilon-greedy or softmax policies (selected with the -p flag). You can also choose an environment other than the default Roulette-v0 with the -e flag. The script also generates a graphical summary of your simulation.
Type the following commands in your console to run the simulation with the default settings.
chmod +x basic_rl.py
./basic_rl.py
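As a rough sketch of how the three flags described above might be parsed, the following uses argparse; the long option names, accepted values, and defaults are assumptions based on the description, not taken from basic_rl.py itself.

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the flags described above:
    # -a (algorithm), -p (policy), -e (environment). Choices and
    # defaults are illustrative assumptions.
    parser = argparse.ArgumentParser(description="Tabular RL demo")
    parser.add_argument("-a", "--algorithm",
                        choices=["sarsa", "q_learning"],
                        default="q_learning",
                        help="learning algorithm")
    parser.add_argument("-p", "--policy",
                        choices=["epsilon_greedy", "softmax"],
                        default="epsilon_greedy",
                        help="action-selection policy")
    parser.add_argument("-e", "--environment",
                        default="Roulette-v0",
                        help="Gym environment id")
    return parser

if __name__ == "__main__":
    # Parsing an empty argument list yields the default setting.
    args = build_parser().parse_args([])
    print(args.algorithm, args.policy, args.environment)
```

Under this sketch, a non-default run would look like `./basic_rl.py -a sarsa -p softmax -e FrozenLake-v0` (flag values here are illustrative).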