matthew mcdonald mafm

Cross Entropy Method

How do we solve the policy optimization problem, that is, maximize the total reward obtained by a parametrized policy?

Discounted future reward

To begin with, the total reward of an episode is the sum of all its rewards, $R = r_1 + r_2 + \dots + r_n$. If our environment is stochastic, we can never be sure that performing the same actions again will yield the same rewards, and the further we look into the future, the more the total future reward may diverge. For that reason it is common to use the discounted future reward from time step $t$,

$$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{n-t} r_n = \sum_{k=0}^{n-t} \gamma^k r_{t+k},$$

where the parameter $\gamma$, called the discount factor, lies between 0 and 1.
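As a minimal sketch, the discounted future reward $R_t = \sum_k \gamma^k r_{t+k}$ for a recorded list of rewards can be computed in one backward pass. This uses the identity $R_t = r_t + \gamma R_{t+1}$, with the return after the last step taken to be zero. The function names are illustrative and do not come from the original post.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], gamma: float) -> List[float]:
    """Discounted future reward R_t for every step t of one episode."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("the discount factor gamma must lie in [0, 1]")
    out = [0.0] * len(rewards)
    running = 0.0  # R_{t+1}; zero after the final step
    # Walk backwards so each R_t reuses the already computed R_{t+1}.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted reward of the whole episode, i.e. R_t at the first step."""
    return returns_to_go(rewards, gamma)[0] if rewards else 0.0
```

For example, three rewards of 1 with γ = 0.9 give 1 + 0.9 + 0.81 = 2.71, up to floating-point rounding. With γ = 1 the result is just the undiscounted total.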

A good strategy for an agent would be to always choose the action that maximizes the (discounted) future reward. In other words, we want to maximize the expected (discounted) reward per episode.
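The excerpt stops before describing the cross-entropy method itself, so what follows is a rough sketch under stated assumptions. The policy is assumed to be described by a real-valued parameter vector θ. A hypothetical black-box `evaluate(theta)` is assumed to return the (discounted) episode reward of the policy with those parameters. Each iteration does three things:

- samples candidate parameter vectors from a diagonal Gaussian,
- keeps the best-scoring "elite" fraction,
- refits the Gaussian's mean and standard deviation to those elites.

The hyperparameter names and defaults are illustrative choices, not values from the original post.

```python
import random
import statistics
from typing import Callable, List, Sequence


def cross_entropy_method(
    evaluate: Callable[[Sequence[float]], float],
    dim: int,
    n_iters: int = 50,
    pop_size: int = 50,
    elite_frac: float = 0.2,
    init_std: float = 1.0,
    min_std: float = 1e-3,
    seed: int = 0,
) -> List[float]:
    """Approximately maximize evaluate(theta) over theta in R^dim.

    `evaluate` is a hypothetical stand-in for "run the policy with
    parameters theta and return its (discounted) episode reward".
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(pop_size * elite_frac))
    for _ in range(n_iters):
        # 1. Sample candidate parameter vectors from a diagonal Gaussian.
        population = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop_size)
        ]
        # 2. Rank candidates by score and keep the best (the "elites").
        population.sort(key=evaluate, reverse=True)
        elites = population[:n_elite]
        # 3. Refit the Gaussian to the elites, one coordinate at a time.
        for i in range(dim):
            column = [theta[i] for theta in elites]
            mean[i] = statistics.mean(column)
            std[i] = max(statistics.pstdev(column), min_std)
    return mean
```

If `init_std` is small relative to the distance to a good solution, the standard deviation can shrink before the mean gets there. `min_std` is a floor that keeps sampling from collapsing entirely, and some variants add extra noise to the standard deviation for the same reason. For example, `cross_entropy_method(lambda th: -((th[0] - 3) ** 2 + (th[1] + 1) ** 2), dim=2, init_std=5.0)` should return parameters close to (3, -1).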

Simpler, Easier!

Lennart Augustsson, Oct 25, 2007

In a recent paper, Simply Easy! (An Implementation of a Dependently Typed Lambda Calculus), the authors argue that type checking a dependently typed language is easy. I agree wholeheartedly; it doesn't have to be difficult at all. But I don't think the paper presents the easiest way to do it, so here is my take on how to write a simple dependent type checker. (There's nothing new here, and the authors of the paper are undoubtedly familiar with all of it.)

First, the untyped lambda calculus.

I'll start by implementing the untyped lambda calculus. It's a very simple language with just three constructs: variables, applications, and lambda expressions.
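Augustsson's post develops this in Haskell. As a rough Python sketch of the same three-construct language, here are capture-avoiding substitution and normal-order reduction to normal form. The class and function names (`Var`, `App`, `Lam`, `whnf`, `nf`) and the prime-appending renaming scheme are choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fun: "Expr"
    arg: "Expr"


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Expr"


Expr = Union[Var, App, Lam]


def free_vars(e: Expr) -> Set[str]:
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, App):
        return free_vars(e.fun) | free_vars(e.arg)
    return free_vars(e.body) - {e.param}


def fresh(name: str, avoid: Set[str]) -> str:
    # Append primes until the name clashes with nothing in `avoid`.
    while name in avoid:
        name += "'"
    return name


def subst(x: str, s: Expr, e: Expr) -> Expr:
    """Capture-avoiding substitution e[x := s]."""
    if isinstance(e, Var):
        return s if e.name == x else e
    if isinstance(e, App):
        return App(subst(x, s, e.fun), subst(x, s, e.arg))
    if e.param == x:  # x is rebound here, so the body has no free x
        return e
    if e.param in free_vars(s):  # rename the binder so s's free var is not captured
        new = fresh(e.param, free_vars(s) | free_vars(e.body) | {x})
        return Lam(new, subst(x, s, subst(e.param, Var(new), e.body)))
    return Lam(e.param, subst(x, s, e.body))


def whnf(e: Expr) -> Expr:
    """Weak head normal form: repeatedly contract the head redex."""
    if isinstance(e, App):
        f = whnf(e.fun)
        if isinstance(f, Lam):
            return whnf(subst(f.param, e.arg, f.body))
        return App(f, e.arg)
    return e


def nf(e: Expr) -> Expr:
    """Full normal form by normal-order reduction (diverges if none exists)."""
    e = whnf(e)
    if isinstance(e, Lam):
        return Lam(e.param, nf(e.body))
    if isinstance(e, App):
        return App(nf(e.fun), nf(e.arg))
    return e
```

Some behaviour worth checking:

- With the usual S and K combinators, `nf(App(App(S, K), K))` reduces to the identity `Lam("z", Var("z"))`.
- Reducing `(λx. λy. x) y` renames the inner binder to `y'` instead of capturing the free `y`.
- On a term with no normal form, such as `(λx. x x)(λx. x x)`, `nf` does not terminate; in Python it eventually raises `RecursionError`.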

	// This is a quick demonstration of "function inheritance" as described in
	// this paper from Daniel Brown and William Cook.
	// http://www.cs.utexas.edu/users/wcook/Drafts/2009/sblp09-memo-mixins.pdf
	// Expressed in TypeScript (and without the monads).

	// Syntax note: When you write function types in TypeScript, you need to name
	// each parameter. But the names don't actually matter, so I just use _. You
	// can read `(_:A) => B` as `a -> b` in ML or Haskell syntax.

	// In Brown and Cook's Haskell, `type Gen a = a -> a` is a "generator." The
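The comments describe Brown and Cook's idea of a "generator": a function of type `a -> a` that receives the finished function (`self`), and whose fixed point is the recursive function itself. Here is a minimal Python sketch of that open-recursion style. It assumes a mixin shape that takes both `self` and the parent (`super`) function. `fix`, `inherit`, `fib_gen`, and `memo_mixin` are hypothetical names. Like the TypeScript gist, it uses a plain mutable cache instead of the paper's monads.

```python
from typing import Callable, Dict

Fn = Callable[[int], int]
Gen = Callable[[Fn], Fn]        # generator: given the final function (self), build it
Mixin = Callable[[Fn, Fn], Fn]  # given self and the parent's function (super)


def fix(gen: Gen) -> Fn:
    """Tie the knot: return f such that f behaves like gen(f)."""
    def self_fn(n: int) -> int:
        return knot(n)  # `knot` is bound below, before any call happens
    knot = gen(self_fn)
    return self_fn


def inherit(mixin: Mixin, gen: Gen) -> Gen:
    """Derive a generator whose behaviour wraps the parent generator's."""
    return lambda self: mixin(self, gen(self))


def fib_gen(self: Fn) -> Fn:
    # Recursive calls go through `self`, so a mixin can intercept them.
    return lambda n: n if n < 2 else self(n - 1) + self(n - 2)


def memo_mixin(cache: Dict[int, int]) -> Mixin:
    """Hypothetical memoization mixin using a mutable dict instead of a monad."""
    def mixin(self: Fn, super_: Fn) -> Fn:
        def memoized(n: int) -> int:
            if n not in cache:
                cache[n] = super_(n)
            return cache[n]
        return memoized
    return mixin
```

`fib_gen` makes its recursive calls through `self`, so the memo mixin intercepts every one of them. As a result, `fix(inherit(memo_mixin({}), fib_gen))(30)` evaluates each argument only once, while `fix(fib_gen)` alone is the plain exponential-time version.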

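	(* Higher-order abstract syntax: binders are represented by OCaml functions. *)
	type term =
	| Lam of (term -> term)
	| Pi of term * (term -> term)
	| Appl of term * term
	| Ann of term * term
	| FreeVar of int
	| Star
	| Box

	(* Open a binder by applying its body to a fresh free variable at level lvl. *)
	let unfurl lvl f = f (FreeVar lvl)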

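	-- Roughly based on https://github.com/Gabriel439/Haskell-Morte-Library/blob/master/src/Morte/Core.hs by Gabriel Gonzalez et al.

	data Expr = Star | Box | Var Int | Lam Int Expr Expr | Pi Int Expr Expr | App Expr Expr deriving (Show, Eq)

	-- subst v e t replaces free occurrences of variable v in t with e.
	-- A binder for v shadows it, so only the binder's type annotation is rewritten.
	-- Not capture-avoiding: assumes e's free variables are not rebound inside t.
	subst :: Int -> Expr -> Expr -> Expr
	subst v e (Var v') | v == v' = e
	subst v e (Lam v' ta b ) | v == v' = Lam v' (subst v e ta) b
	subst v e (Lam v' ta b ) = Lam v' (subst v e ta) (subst v e b )
	subst v e (Pi v' ta tb) | v == v' = Pi v' (subst v e ta) tb
	subst v e (Pi v' ta tb) = Pi v' (subst v e ta) (subst v e tb)
	subst v e (App f a ) = App (subst v e f ) (subst v e a )
	-- Star, Box, and any other variable are left unchanged.
	subst _ _ t = t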

	import static java.lang.System.*;

	import java.util.function.BiFunction;
	import java.util.function.Function;

	// Implementation of a pseudo-GADT in Java, translating the examples from
	// http://www.cs.ox.ac.uk/ralf.hinze/publications/With.pdf
	// The technique presented below is, in fact, just an encoding of a normal Algebraic Data Type
	// using a variation of the visitor pattern + the application of the Yoneda lemma to make it
	// isomorphic to the targeted 'GADT'.

	Require Import List.
	Require Import FunctionalExtensionality.
	Require Import ZArith.
	Require Import Zcompare.

	(* cf. https://twitter.com/Hillelogram/status/987432184217731073 *)

	Set Implicit Arguments.

	Lemma map_combine : forall A B C (f : A -> B) (g : A -> C) xs,

	#!/usr/local/Gambit-C/bin/gsi

	; Copyright (C) 2004 by Marc Feeley, All Rights Reserved.

	; This is the "90 minute Scheme to C compiler" presented at the
	; Montreal Scheme/Lisp User Group on October 20, 2004.

	; Usage with Gambit-C 4.0:
	;
	; % ./90-min-scc.scm test.scm

	(*
	* An OCaml implementation of final tagless, inspired by this article by Oleksandr Manzyuk:
	* https://oleksandrmanzyuk.wordpress.com/2014/06/18/from-object-algebras-to-finally-tagless-interpreters-2/
	*)

	module FinalTagless = struct
	type eval = { eval : int }
	type view = { view : string }

	module type ExpT = sig
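The OCaml module is cut off at `module type ExpT = sig`, but the `eval` and `view` record types suggest two interpretations of one expression language. Below is a sketch of the final-tagless idea in Python. It assumes a tiny language with integer literals, negation, and addition; the `lit`/`neg`/`add` operations and the `ExpSym` name are assumptions, not recovered from the gist. A term is written once against an abstract interpreter, and each interpreter class gives it a different meaning. Python has no module signatures, so interpreters are passed as ordinary objects, which is closer to the object-algebra encoding the linked article starts from.

```python
from typing import Protocol, TypeVar

T = TypeVar("T")


class ExpSym(Protocol[T]):
    """The abstract 'signature': one method per language construct."""

    def lit(self, n: int) -> T: ...

    def neg(self, e: T) -> T: ...

    def add(self, a: T, b: T) -> T: ...


class Eval:
    """Interpreter 1: compute the integer value."""

    def lit(self, n: int) -> int:
        return n

    def neg(self, e: int) -> int:
        return -e

    def add(self, a: int, b: int) -> int:
        return a + b


class View:
    """Interpreter 2: pretty-print with full parenthesization."""

    def lit(self, n: int) -> str:
        return str(n)

    def neg(self, e: str) -> str:
        return f"(-{e})"

    def add(self, a: str, b: str) -> str:
        return f"({a} + {b})"


def example(sym: ExpSym[T]) -> T:
    # 8 + (-(1 + 2)), written once against the abstract interpreter.
    return sym.add(sym.lit(8), sym.neg(sym.add(sym.lit(1), sym.lit(2))))
```

Here `example(Eval())` evaluates to `5` and `example(View())` renders `"(8 + (-(1 + 2)))"`. Adding a third interpreter requires a new class but no change to `example`.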

	#!/usr/bin/env python2
	# -*- coding: utf-8 -*-

	# NOTE FOR WINDOWS USERS:
	# You can download an "exefied" version of this game at:
	# http://hi-im.laria.me/progs/tetris_py_exefied.zip
	# If a DLL is missing or something like that, write an e-mail
	# or leave a comment on this gist.

	# Very simple tetris implementation