Cross Entropy Method

How do we solve the policy optimization problem, that is, how do we maximize the total reward obtained by some parametrized policy?

Discounted future reward

To begin with, the total reward for an episode is the sum of all the rewards collected in it. If our environment is stochastic, we can never be sure that the same actions will yield the same rewards next time, and the further we look into the future, the more the rewards we actually receive may diverge from what we expect. For that reason it is common to use the discounted future reward instead, in which a reward received k steps in the future is weighted by discount^k. The parameter discount is called the discount factor and lies between 0 and 1.
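As a minimal sketch of the discounted future reward described above, the following computes it for a list of per-step rewards. The function names `discounted_return` and `discounted_returns` and the default `discount=0.99` are assumptions made for illustration, not part of the original text.

```python
def discounted_return(rewards, discount=0.99):
    """Sum of discount**k * rewards[k] over one episode (hypothetical helper)."""
    total = 0.0
    # Walk backwards so each step is just: total = r + discount * total.
    for r in reversed(rewards):
        total = r + discount * total
    return total


def discounted_returns(rewards, discount=0.99):
    """Discounted future reward as seen from every time step of the episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + discount * running
        out[t] = running
    return out
```

For example, `discounted_return([1, 1, 1], discount=0.5)` evaluates to 1 + 0.5 + 0.25 = 1.75. A discount close to 1 makes the agent weigh distant rewards almost as much as immediate ones, while a discount close to 0 makes it short-sighted.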

A good strategy for an agent would be to always choose the action that maximizes the (discounted) future reward. In other words, we want to maximize the expected reward per episode.
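One way to search for policy parameters that maximize such a score is the cross-entropy method named in the heading. The sketch below is an illustration under stated assumptions, not the author's implementation: it samples parameter vectors from a diagonal Gaussian, keeps the best-scoring fraction, and refits the Gaussian to them. The names `cross_entropy_method` and `toy_score`, all default values, and the toy objective (a stand-in for the expected reward per episode) are hypothetical.

```python
import random
import statistics


def cross_entropy_method(score, dim, iterations=30, population=100,
                         elite_frac=0.2, init_std=2.0, seed=0):
    """Maximize score(theta) over vectors theta of length dim (sketch)."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate parameter vectors from the current Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Keep the highest-scoring candidates (the "elite" set).
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]
        # Refit mean and standard deviation to the elite, per dimension.
        for i in range(dim):
            column = [theta[i] for theta in elite]
            mean[i] = statistics.mean(column)
            # Small constant keeps the search from collapsing to a point.
            std[i] = statistics.pstdev(column) + 1e-3
    return mean


def toy_score(theta):
    """Assumed stand-in for episode reward: highest at theta = [1.0, -2.0]."""
    target = [1.0, -2.0]
    return -sum((t - g) ** 2 for t, g in zip(theta, target))
```

Calling `cross_entropy_method(toy_score, dim=2)` should return a vector close to the toy target. In a reinforcement learning setting, `score` would instead run one or more episodes with the policy parametrized by `theta` and return the (discounted) total reward.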

Motivation

tl;dr I want to use Rust to program robots. Help me find the best core libraries to build on.

Robotic systems require high performance and reliability, but are also enormously complex in terms of the algorithms employed, the number of subsystems, embedded hardware control, and other metrics. Development is mostly split between C++ for performance- and safety-critical components, and MATLAB or Python for quick research or task iteration.

These links no longer work. Springer has pulled the plug on free access.

Graduate texts in mathematics

duplicates = multiple editions

A Classical Introduction to Modern Number Theory, Kenneth Ireland and Michael Rosen

## VGG16 model for Keras

This is the Keras model of the 16-layer network used by the VGG team in the ILSVRC-2014 competition.

It has been obtained by directly converting the Caffe model provided by the authors.

Details about the network architecture can be found in the following arXiv paper:

Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan, A. Zisserman

Git Advanced Resources

Interactive Rebase

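Start an interactive rebase with git rebase -i <base-commit>.
Some of the commands available in the rebase todo list:
- squash: combine the commit with the previous one
- edit: stop at the commit so it can be amended; also used to split a commit (using git reset HEAD^)
- reword: keep the commit but change its message
- pick: use the commit as is; commits are applied in the order listed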

	""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
	import numpy as np
	try:
		import cPickle as pickle  # Python 2
	except ImportError:
		import pickle  # Python 3
	import gym

	# hyperparameters
	H = 200 # number of hidden layer neurons
	batch_size = 10 # every how many episodes to do a param update?
	learning_rate = 1e-4
	gamma = 0.99 # discount factor for reward

	data {
	int N;
	int M;
	real<lower=0> Y[N];
	}

	parameters {
	real<lower=0> mu;
	real<lower=0> phi;
	real<lower=1, upper=2> theta;

	#!/usr/bin/env python
	# -*- coding: utf-8 -*-

	# Python 3 and compatibility with Python 2
	from __future__ import unicode_literals, print_function

	import os
	import sys
	import re
	import logging