Skip to content

Instantly share code, notes, and snippets.

@jccc
Created October 8, 2010 02:04
Show Gist options
  • Select an option

  • Save jccc/616249 to your computer and use it in GitHub Desktop.

Select an option

Save jccc/616249 to your computer and use it in GitHub Desktop.
"""
blog - randomcomputation.blogspot.com
(c) - jccc
"""
"""
We provide a tutorial on register machine induction and investigat
e the nature of various heuristic assumptions on learning performa
nce.
The machine learning problem we will deal with is simple mutiplica
tion: inducing programs capable of multiplying their integer input
s. Thus our training dataset, consisting of integer pairs and thei
r products, will be generated in the following way:
"""
import random as r
dataset = [(x, y, x*y) for x,y in \
[(r.randint(-10,10), r.randint(-10,10)) \
for i in range(80)]]
"""
Stochastic learning broadly consists of generating potential solut
ion programs (hypotheses) and then checking the performance of the
hypotheses over the dataset. Hypothesis programs can be based on m
any different models of computation. Here we will use one of the s
implest models: the register machine. The register machine takes a
sequence of instructions and a finite set of registers that the in
structions operate upon. Input is normally accomplished by initial
izing a subset of the registers to the input values, and here outp
ut is arbitrarily taken as the value of the 0th index register aft
er the end of program execution. Generation of the instruction seq
uences for our register machines will be accomplished by the follo
wing function:
"""
ops = ('add', 'cpy', ' < ', 'inc')
args = (0, 1, 2, 3, 4, 5)
def generate():
machine = [(r.choice(ops), r.choice(args),\
r.choice(args),\
r.choice(args))\
for i in range(r.randint(5,15))]
return machine
"""
Here we encode the instruction sequences (programs) as lists of tu
ples, where each tuple is of the form (op, arg1, arg2, arg3). The
effect of each op is given by the execute function below, and the
arguments refer to the register indices. Here is the execute funct
ion, a simple if-else switch that iterates over the program instru
ctions:
"""
def execute(program, r):
steps = 0
index = 0
while index < len(program) and steps < 100:
op = program[index][0]
args = (program[index][1:])
if op == 'add':
r[args[0]] = r[args[1]] + r[args[2]]
elif op == 'cpy':
r[args[0]] = r[args[1]]
elif op == ' < ':
if r[args[0]] < r[args[1]] - 1:
index -= 3
if index < 0: index = 0
elif op == 'inc':
r[args[0]] -= 1
else:
print "ERROR IN EXECUTE", op
steps += 1
index += 1
"""
The execute function takes a sequence of instructions and an array
of registers r. It then walks down the instruction sequence carryi
ng out the encoded operations on the register array.
In this writing we are not concerned with purely random register m
achines, but rather would like to use them to solve a problem - in
teger multiplication. Thus we must determine "how good" any given
program is at multiplcation. We do this by using the multiplicatio
n dataset created above: for each item in the dataset we take as i
nput the two numbers to be multiplied and then execute the program
. We then compare the output from the program with the correct ans
wer. The average error over the entire dataset is the "fitness" of
each program:
"""
from math import fabs
def fitness(hypothesis):
error = 0
for item in dataset:
registers = [item[0], item[1], 0, 0, 0, 0]
execute(hypothesis, registers)
answer = registers[0]
error += fabs(answer - item[2])
return error/len(dataset)
"""
We now have all the components required to implement the most basi
c of stochastic learning algorithms (and, in a sense, most "powerf
ul"): the random search. That is, using our random generation meth
od above we generate hundreds of thousands of programs and check t
o see if they implement integer multiplication (or at least, can p
roduce output with 0 error over our dataset). We will experiment t
o determine how many programs on average are required to find a pr
ogram that sucessfully multiplies its two inputs (using our instru
ction set).
"""
def random_search():
hist = {}
for i in range(100000):
f = fitness(generate())
hist.setdefault(f,0)
hist[f] += 1
s = hist.keys()
s.sort()
for key in s:
print key, '*'*(hist[key])
#random_search()
"""
The error distribution of the top 40 or so programs out of a rando
m search of 100,000 looks like the following:
17.4333333333 *
19.4666666667 *
20.0000000000 *
20.5000000000 *
20.9000000000 ***
20.9666666667 *
21.0666666667 *
21.2666666667 **********
21.3000000000 ****
21.3333333333 **************************
21.3666666667 *
21.4000000000 *
21.4333333333 *
21.5333333333 **
21.6000000000 *
21.6333333333 ***
21.8000000000 ***
21.8333333333 *
21.8666666667 ****
21.9000000000 *
21.9333333333 ******
21.9666666667 **
22.0000000000 *
22.0333333333 ***
22.0666666667 *
22.1000000000 *
22.2333333333 *
22.2666666667 *
22.3000000000 *************
22.3333333333 *
22.4000000000 *********
22.4666666667 *
22.5000000000 **********************************
22.5333333333 ****
22.5666666667 **
22.6000000000 *********************************************
22.6333333333 ************
22.6666666667 ***
22.7000000000 *****
22.7333333333 ********************************
22.7666666667 **********************************
22.8000000000 *******
22.8333333333 ************
This is the bottom of the distribution (notice the exponent):
4.11782264190e+13 *
1.53872987268e+15 *
1.53872987268e+15 *
1.95155983853e+15 *
2.36438980437e+15 *
2.36438980437e+15 *
2.36438980437e+15 *
2.40892624551e+15 *
2.62709978263e+15 *
2.66462977953e+15 **
2.92733975779e+15 *
4.81785249101e+15 *
7.59738277429e+15 *
1.50033518466e+20 *
The most popular average error appears to be 25.6666666667.
The distribution as a whole is quite spiky and not particularly no
rmal (Gaussian) looking, with the errors ranging over 20 orders of
magnitude (although I do not foreclose this may be due to some num
erical weirdness in cpython).
Let's look at a typical best-fitness individual from a random sear
ch of 100,000 programs:
"""
def look_at_best():
curr_best = generate()
curr_fitn = fitness(curr_best)
for i in range(100000):
if i%1000==0:
print i, curr_fitn
new = generate()
fit = fitness(new)
if fit < curr_fitn:
curr_best = new
curr_fitn = fit
print curr_fitn
print curr_best
for item in dataset:
registers = [item[0], item[1], 0, 0, 0, 0]
execute(curr_best, registers)
answer = registers[0]
print item, answer
#look_at_best()
"""
Output (formatted) from above function:
Lowest error: 18.3
Program with lowest error:
[(' < ', 3, 1, 1),
(' < ', 0, 5, 4),
('add', 3, 1, 1),
('add', 1, 0, 2),
('add', 0, 3, 3),
('inc', 2, 1, 3),
('add', 2, 0, 3),
(' < ', 4, 1, 3),
(' < ', 1, 1, 2)]
Output of this program over dataset (i.e. (x, y, x*y) output):
(3, 10, 30) 40
(-2, 0, 0) -2
(9, 0, 0) 0
(-9, -1, 9) -9
(8, 8, 64) 32
(-5, -2, 10) -5
(7, 10, 70) 40
(-4, 8, -32) -4
(7, 4, 28) 16
(1, -1, -1) -4
(-3, 1, -3) -3
(-1, -9, 9) -36
(-1, -4, 4) -16
(-10, 8, -80) -10
(-6, -2, 12) -6
(9, -8, -72) -32
(9, -8, -72) -32
(3, -5, -15) -20
(-2, 3, -6) -2
(5, 10, 50) 40
(-10, -5, 50) -10
(-6, -2, 12) -6
(6, 1, 6) 4
(-9, -1, 9) -9
(2, 5, 10) 20
(5, -9, -45) -36
(8, -2, -16) -8
(-2, -1, 2) -2
(-2, -8, 16) -2
(4, 6, 24) 24
This program appears to output 4*y when x is positive, and simply
output x when it is negative. To be honest, not very impressive.
We have seen the efficiency of our random search algorithm on this
problem: even after approximatley 100,000 fitness evaluations a mu
ltiplication algorithm is not found (to be sure, one is possible g
iven the instruction set: a valid program would repeatedly add one
of the inputs to the output register whiledecrementing the other i
nput until it equals 0). Is there a way to make this algorithm mor
e efficient/powerful?
To make random search more efficient we must endow it with assumpt
ions. Assumptions are the essence of all learning algorithms, inde
ed we might understand all stochastic learning algorithms as the c
omposition (random search + constraining assumptions). The power o
f a set of assumptions depends on the nature of the problem, but i
t turns out that many problems that humans are interested in are a
menable to a similar set of assumptions that are broadly used in g
enetic algorithms / genetic programming. These assumptions can be
summarized as stating that high fit individuals for a given proble
m tend to be "close" to one another in the search space.
The operation of this assumption is illustrated in the most simple
of non-random stochastic learning algorithms: the greedy search. I
n the greedy search we essentially bet that we are more likely to
find fit individuals by varying current high-fitness individuals t
han randomly grabbing individuals out of the cloud of all possible
programs.
If we are /very/ lucky the optimal programs act as attractors, and
for the majority of initial randomly chosen programs there exists
a chain of increasingly-fit programs that lead to the optimal solu
tion. Obviously most problem-program clouds do not possess this co
nvenient structure, or AI would have been here a long time ago.
The following two functions implement a version of greedy search o
n our multiplication problem. The first is the vary operator, by w
hich we derive new programs from current programs (rather than com
pletely random whole cloth as in random search). The vary operator
is actually a random switch between three different methods of var
ying an extant program: adding a random instruction, deleting a ra
ndom instruction, or manipulating an existing instruction.
The second function is the actual greedy search, which keeps track
of the fittest program thus far generated and applies the variatio
n operator to it until it finds a fitter program, then iterates.
"""
def vary(program):
new = program[:]
if r.random() < 0.60:
if r.random() <= 0.50:
new.insert(r.randint(0,len(new)),
(r.choice(ops),
r.choice(args),
r.choice(args),
r.choice(args)))
else:
index = r.randint(0,len(new))
new = new[:index] + new[index+1:]
else:
iindex = r.randint(0,len(new)-1)
copy = list(new[iindex])
index = r.randint(0,3)
if index == 0:
copy[index] = r.choice(ops)
else:
copy[index] = r.choice(args)
new[iindex] = tuple(copy)
return new
def greedy_search():
curr_prog = generate()
curr_fitn = fitness(curr_prog)
for i in range(50000):
test_prog = vary(curr_prog)
test_fitn = fitness(test_prog)
if test_fitn < curr_fitn:
print i, test_fitn, len(test_prog)
if test_fitn <= curr_fitn:
curr_prog = test_prog
curr_fitn = test_fitn
print curr_fitn
print curr_prog
for item in dataset:
registers = [item[0], item[1], 0, 0, 0, 0]
execute(curr_prog, registers)
answer = registers[0]
print item, answer
greedy_search()
"""
Here is the summary output of a typical greedy search run. The fir
st number is the amount of programs searched, the second the curre
nt highest fitness (lowest error), and the third the size of the f
ittest program:
...
6142 14.1625 82
6221 14.1125 81
6230 14.1 79
7015 14.075 70
7276 13.925 73
7405 13.9 83
7711 13.875 80
7772 13.85 75
7774 13.825 73
7831 13.775 71
7906 13.75 68
8118 13.7 70
8137 13.675 70
8254 13.6625 67
8301 13.65 72
8342 13.6125 75
8362 13.5875 76
8430 13.575 79
8520 13.5375 74
9711 13.375 79
10636 13.35 82
10787 13.3 83
...
We see that after 1/10 the amount of programs searched, we are alr
eady at 13.3/18.3 = 72% of the error of our fittest random search
program. Thus, it appears that for this particular problem, our as
sumption that "fitter programs are close to other fit programs" is
valid, and using the assumption to constrain random search results
in efficiency increases.
"""
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment