Created
October 8, 2010 02:04
Gist jccc/616249
"""
blog - randomcomputation.blogspot.com
(c) - jccc
"""
"""
We provide a tutorial on register machine induction and investigate the
nature of various heuristic assumptions on learning performance.

The machine learning problem we will deal with is simple multiplication:
inducing programs capable of multiplying their integer inputs. Thus our
training dataset, consisting of integer pairs and their products, will be
generated in the following way:
"""
import random as r

dataset = [(x, y, x*y) for x, y in
           [(r.randint(-10, 10), r.randint(-10, 10))
            for i in range(80)]]
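"""
A quick sanity check (my addition, not part of the original post)
confirming the shape of the training data:
"""

```python
import random as r

# regenerate the dataset exactly as above: 80 (x, y, x*y) triples
dataset = [(x, y, x*y) for x, y in
           [(r.randint(-10, 10), r.randint(-10, 10))
            for i in range(80)]]
# every triple satisfies the product invariant, inputs lie in [-10, 10]
```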
"""
Stochastic learning broadly consists of generating potential solution
programs (hypotheses) and then checking the performance of the
hypotheses over the dataset. Hypothesis programs can be based on many
different models of computation. Here we will use one of the simplest
models: the register machine. The register machine takes a sequence of
instructions and a finite set of registers that the instructions operate
upon. Input is normally accomplished by initializing a subset of the
registers to the input values, and here output is arbitrarily taken as
the value of the 0th index register after the end of program execution.
Generation of the instruction sequences for our register machines will
be accomplished by the following function:
"""
ops = ('add', 'cpy', ' < ', 'inc')
args = (0, 1, 2, 3, 4, 5)

def generate():
    machine = [(r.choice(ops), r.choice(args),
                r.choice(args), r.choice(args))
               for i in range(r.randint(5, 15))]
    return machine
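"""
Each generated machine is thus a list of 5 to 15 tuples, every tuple a
4-field instruction. A self-contained restatement, for illustration:
"""

```python
import random as r

ops = ('add', 'cpy', ' < ', 'inc')
args = (0, 1, 2, 3, 4, 5)

def generate():
    # a program is 5 to 15 random 4-tuples of the form (op, arg1, arg2, arg3)
    return [(r.choice(ops), r.choice(args),
             r.choice(args), r.choice(args))
            for i in range(r.randint(5, 15))]

machine = generate()
```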
"""
Here we encode the instruction sequences (programs) as lists of tuples,
where each tuple is of the form (op, arg1, arg2, arg3). The effect of
each op is given by the execute function below, and the arguments refer
to the register indices. Here is the execute function, a simple if-else
switch that iterates over the program instructions:
"""
def execute(program, regs):
    steps = 0
    index = 0
    while index < len(program) and steps < 100:
        op = program[index][0]
        args = program[index][1:]
        if op == 'add':
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == 'cpy':
            regs[args[0]] = regs[args[1]]
        elif op == ' < ':
            # conditional backward jump: together with the index += 1
            # below, the net effect is to land two instructions back
            if regs[args[0]] < regs[args[1]] - 1:
                index -= 3
                if index < 0:
                    index = 0
        elif op == 'inc':
            regs[args[0]] -= 1  # note: 'inc' actually decrements
        else:
            print("ERROR IN EXECUTE", op)
        steps += 1
        index += 1
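"""
To make the control flow concrete, here is a short hand-written program
(my own, not from the post) traced through execute: it sums r1 down
toward 1 into r0 using the backward-jump op. Note that the jump from
instruction 3 lands at instruction 1, two slots back.
"""

```python
# Self-contained demo of the interpreter semantics described above:
# 'add' a b c -> r[a] = r[b] + r[c];  'cpy' a b -> r[a] = r[b];
# ' < ' a b   -> jump two instructions back while r[a] < r[b] - 1;
# 'inc' a     -> r[a] -= 1 (a decrement, despite the name).
def execute(program, regs):
    steps = 0
    index = 0
    while index < len(program) and steps < 100:
        op = program[index][0]
        args = program[index][1:]
        if op == 'add':
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == 'cpy':
            regs[args[0]] = regs[args[1]]
        elif op == ' < ':
            if regs[args[0]] < regs[args[1]] - 1:
                index -= 3
                if index < 0:
                    index = 0
        elif op == 'inc':
            regs[args[0]] -= 1
        steps += 1
        index += 1

program = [('cpy', 2, 1, 0),   # r2 = r1 (save the input)
           ('add', 0, 0, 1),   # r0 += r1
           ('inc', 1, 0, 0),   # r1 -= 1
           (' < ', 5, 1, 0)]   # while 0 < r1 - 1: jump back to index 1
regs = [0, 4, 0, 0, 0, 0]
execute(program, regs)
# r0 accumulates 4 + 3 + 2 = 9; r1 stops at 1; r2 holds the saved 4
```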
"""
The execute function takes a sequence of instructions and an array of
registers. It then walks down the instruction sequence carrying out the
encoded operations on the register array.

In this writing we are not concerned with purely random register
machines, but rather would like to use them to solve a problem: integer
multiplication. Thus we must determine "how good" any given program is
at multiplication. We do this by using the multiplication dataset
created above: for each item in the dataset we take as input the two
numbers to be multiplied and then execute the program. We then compare
the output from the program with the correct answer. The average error
over the entire dataset is the "fitness" of each program:
"""
from math import fabs

def fitness(hypothesis):
    error = 0
    for item in dataset:
        registers = [item[0], item[1], 0, 0, 0, 0]
        execute(hypothesis, registers)
        answer = registers[0]
        error += fabs(answer - item[2])
    return error/len(dataset)
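"""
A worked example of the scoring rule (my own, with the dataset made an
explicit parameter so the numbers are deterministic): the one-instruction
hypothesis "r0 = r1 + r1" outputs 2*y, which is exact on (2, 3, 6) and
off by 10 on (4, 5, 20), for a mean error of 5.0.
"""

```python
from math import fabs

def execute(program, regs):
    # minimal restatement of the interpreter above
    steps = 0
    index = 0
    while index < len(program) and steps < 100:
        op = program[index][0]
        args = program[index][1:]
        if op == 'add':
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == 'cpy':
            regs[args[0]] = regs[args[1]]
        elif op == ' < ':
            if regs[args[0]] < regs[args[1]] - 1:
                index -= 3
                if index < 0:
                    index = 0
        elif op == 'inc':
            regs[args[0]] -= 1
        steps += 1
        index += 1

def fitness_on(hypothesis, data):
    # same scoring rule as fitness above, dataset passed in explicitly
    error = 0
    for item in data:
        registers = [item[0], item[1], 0, 0, 0, 0]
        execute(hypothesis, registers)
        error += fabs(registers[0] - item[2])
    return error / len(data)

mini = [(2, 3, 6), (4, 5, 20)]
hypo = [('add', 0, 1, 1)]      # r0 = r1 + r1, i.e. output 2*y
# fitness_on(hypo, mini) -> (0 + 10) / 2 = 5.0
```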
"""
We now have all the components required to implement the most basic of
stochastic learning algorithms (and, in a sense, the most "powerful"):
random search. That is, using our random generation method above we
generate hundreds of thousands of programs and check to see if they
implement integer multiplication (or at least, can produce output with 0
error over our dataset). We will experiment to determine how many
programs on average are required to find a program that successfully
multiplies its two inputs (using our instruction set).
"""
def random_search():
    hist = {}
    for i in range(100000):
        f = fitness(generate())
        hist.setdefault(f, 0)
        hist[f] += 1
    for key in sorted(hist):
        print(key, '*'*hist[key])

#random_search()
"""
The error distribution of the top 40 or so programs out of a random
search of 100,000 looks like the following:

17.4333333333 *
19.4666666667 *
20.0000000000 *
20.5000000000 *
20.9000000000 ***
20.9666666667 *
21.0666666667 *
21.2666666667 **********
21.3000000000 ****
21.3333333333 **************************
21.3666666667 *
21.4000000000 *
21.4333333333 *
21.5333333333 **
21.6000000000 *
21.6333333333 ***
21.8000000000 ***
21.8333333333 *
21.8666666667 ****
21.9000000000 *
21.9333333333 ******
21.9666666667 **
22.0000000000 *
22.0333333333 ***
22.0666666667 *
22.1000000000 *
22.2333333333 *
22.2666666667 *
22.3000000000 *************
22.3333333333 *
22.4000000000 *********
22.4666666667 *
22.5000000000 **********************************
22.5333333333 ****
22.5666666667 **
22.6000000000 *********************************************
22.6333333333 ************
22.6666666667 ***
22.7000000000 *****
22.7333333333 ********************************
22.7666666667 **********************************
22.8000000000 *******
22.8333333333 ************

This is the bottom of the distribution (notice the exponent):

4.11782264190e+13 *
1.53872987268e+15 *
1.53872987268e+15 *
1.95155983853e+15 *
2.36438980437e+15 *
2.36438980437e+15 *
2.36438980437e+15 *
2.40892624551e+15 *
2.62709978263e+15 *
2.66462977953e+15 **
2.92733975779e+15 *
4.81785249101e+15 *
7.59738277429e+15 *
1.50033518466e+20 *

The most popular average error appears to be 25.6666666667. The
distribution as a whole is quite spiky and not particularly normal
(Gaussian) looking, with the errors ranging over 20 orders of magnitude
(although I do not rule out that this may be due to some numerical
weirdness in CPython).

Let's look at a typical best-fitness individual from a random search of
100,000 programs:
"""
def look_at_best():
    curr_best = generate()
    curr_fitn = fitness(curr_best)
    for i in range(100000):
        if i % 1000 == 0:
            print(i, curr_fitn)
        new = generate()
        fit = fitness(new)
        if fit < curr_fitn:
            curr_best = new
            curr_fitn = fit
    print(curr_fitn)
    print(curr_best)
    for item in dataset:
        registers = [item[0], item[1], 0, 0, 0, 0]
        execute(curr_best, registers)
        answer = registers[0]
        print(item, answer)

#look_at_best()
"""
Output (formatted) from the above function:

Lowest error: 18.3
Program with lowest error:
[(' < ', 3, 1, 1),
 (' < ', 0, 5, 4),
 ('add', 3, 1, 1),
 ('add', 1, 0, 2),
 ('add', 0, 3, 3),
 ('inc', 2, 1, 3),
 ('add', 2, 0, 3),
 (' < ', 4, 1, 3),
 (' < ', 1, 1, 2)]
Output of this program over the dataset (each line shows the dataset
item (x, y, x*y) followed by the program's output):
(3, 10, 30) 40
(-2, 0, 0) -2
(9, 0, 0) 0
(-9, -1, 9) -9
(8, 8, 64) 32
(-5, -2, 10) -5
(7, 10, 70) 40
(-4, 8, -32) -4
(7, 4, 28) 16
(1, -1, -1) -4
(-3, 1, -3) -3
(-1, -9, 9) -36
(-1, -4, 4) -16
(-10, 8, -80) -10
(-6, -2, 12) -6
(9, -8, -72) -32
(9, -8, -72) -32
(3, -5, -15) -20
(-2, 3, -6) -2
(5, 10, 50) 40
(-10, -5, 50) -10
(-6, -2, 12) -6
(6, 1, 6) 4
(-9, -1, 9) -9
(2, 5, 10) 20
(5, -9, -45) -36
(8, -2, -16) -8
(-2, -1, 2) -2
(-2, -8, 16) -2
(4, 6, 24) 24

This program appears to output 4*y when x is positive, and to simply
output x when x is negative. To be honest, not very impressive.

We have seen the efficiency of our random search algorithm on this
problem: even after approximately 100,000 fitness evaluations a
multiplication algorithm is not found (to be sure, one is possible given
the instruction set: a valid program would repeatedly add one of the
inputs to the output register while decrementing the other input until
it equals 0). Is there a way to make this algorithm more
efficient/powerful?
To make random search more efficient we must endow it with assumptions.
Assumptions are the essence of all learning algorithms; indeed, we might
understand every stochastic learning algorithm as the composition
(random search + constraining assumptions). The power of a set of
assumptions depends on the nature of the problem, but it turns out that
many problems that humans are interested in are amenable to a similar
set of assumptions broadly used in genetic algorithms / genetic
programming. These assumptions can be summarized as stating that highly
fit individuals for a given problem tend to be "close" to one another in
the search space.

The operation of this assumption is illustrated in the simplest of
non-random stochastic learning algorithms: greedy search. In greedy
search we essentially bet that we are more likely to find fit
individuals by varying current high-fitness individuals than by randomly
grabbing individuals out of the cloud of all possible programs.

If we are /very/ lucky the optimal programs act as attractors, and for
the majority of initial randomly chosen programs there exists a chain of
increasingly-fit programs that leads to the optimal solution. Obviously
most problem-program clouds do not possess this convenient structure, or
AI would have been here a long time ago.

The following two functions implement a version of greedy search on our
multiplication problem. The first is the vary operator, by which we
derive new programs from current programs (rather than generating them
from whole cloth as in random search). The vary operator is actually a
random switch between three different methods of varying an extant
program: adding a random instruction, deleting a random instruction, or
manipulating an existing instruction.

The second function is the actual greedy search, which keeps track of
the fittest program thus far generated and applies the variation
operator to it until it finds a fitter program, then iterates.
"""
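"""
As an aside, the repeated-addition multiplier described above can be
written directly in this instruction set. The sketch below is mine, not
the post's, and it only handles y >= 2 (y <= 1 and negative y would need
extra case logic, since the backward jump's loop body always runs once);
x may be any sign.
"""

```python
def execute(program, regs):
    # restatement of the interpreter defined earlier
    steps = 0
    index = 0
    while index < len(program) and steps < 100:
        op = program[index][0]
        args = program[index][1:]
        if op == 'add':
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == 'cpy':
            regs[args[0]] = regs[args[1]]
        elif op == ' < ':
            if regs[args[0]] < regs[args[1]] - 1:
                index -= 3
                if index < 0:
                    index = 0
        elif op == 'inc':
            regs[args[0]] -= 1
        steps += 1
        index += 1

multiplier = [('cpy', 3, 0, 0),   # r3 = x (save the multiplicand)
              ('add', 0, 0, 3),   # r0 += x
              ('inc', 1, 0, 0),   # r1 -= 1
              (' < ', 5, 1, 0)]   # while 0 < r1 - 1: jump back to index 1

# e.g. x = -3, y = 4: r0 steps -6, -9, -12, giving -3 * 4
```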
def vary(program):
    new = program[:]
    if len(new) == 0 or r.random() < 0.60:
        if len(new) == 0 or r.random() <= 0.50:
            # insert a random instruction at a random position
            # (always insert if the program has shrunk to nothing)
            new.insert(r.randint(0, len(new)),
                       (r.choice(ops),
                        r.choice(args),
                        r.choice(args),
                        r.choice(args)))
        else:
            # delete a random instruction
            index = r.randint(0, len(new) - 1)
            new = new[:index] + new[index+1:]
    else:
        # mutate one field of a random instruction
        iindex = r.randint(0, len(new) - 1)
        copy = list(new[iindex])
        index = r.randint(0, 3)
        if index == 0:
            copy[index] = r.choice(ops)
        else:
            copy[index] = r.choice(args)
        new[iindex] = tuple(copy)
    return new
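"""
Whatever branch fires, vary should return a well-formed program at most
one instruction away in size from its parent. The check below restates
the operator (with an empty-program guard of my own added) so it is
self-contained:
"""

```python
import random as r

ops = ('add', 'cpy', ' < ', 'inc')
args = (0, 1, 2, 3, 4, 5)

def vary(program):
    # same three-way variation operator as above
    new = program[:]
    if len(new) == 0 or r.random() < 0.60:
        if len(new) == 0 or r.random() <= 0.50:
            new.insert(r.randint(0, len(new)),
                       (r.choice(ops), r.choice(args),
                        r.choice(args), r.choice(args)))
        else:
            index = r.randint(0, len(new) - 1)
            new = new[:index] + new[index + 1:]
    else:
        iindex = r.randint(0, len(new) - 1)
        copy = list(new[iindex])
        index = r.randint(0, 3)
        copy[index] = r.choice(ops) if index == 0 else r.choice(args)
        new[iindex] = tuple(copy)
    return new

# every child is still a list of valid 4-field instructions,
# one insertion, deletion, or point mutation away from the parent
prog = [('add', 0, 1, 2), ('inc', 1, 0, 0)]
children = [vary(prog) for _ in range(1000)]
```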
def greedy_search():
    curr_prog = generate()
    curr_fitn = fitness(curr_prog)
    for i in range(50000):
        test_prog = vary(curr_prog)
        test_fitn = fitness(test_prog)
        if test_fitn < curr_fitn:
            print(i, test_fitn, len(test_prog))
        if test_fitn <= curr_fitn:
            # accept ties too, allowing neutral drift through the search space
            curr_prog = test_prog
            curr_fitn = test_fitn
    print(curr_fitn)
    print(curr_prog)
    for item in dataset:
        registers = [item[0], item[1], 0, 0, 0, 0]
        execute(curr_prog, registers)
        answer = registers[0]
        print(item, answer)

greedy_search()
"""
Here is the summary output of a typical greedy search run. The first
number is the number of programs searched so far, the second the current
best fitness (lowest error), and the third the size of the fittest
program:

...
6142 14.1625 82
6221 14.1125 81
6230 14.1 79
7015 14.075 70
7276 13.925 73
7405 13.9 83
7711 13.875 80
7772 13.85 75
7774 13.825 73
7831 13.775 71
7906 13.75 68
8118 13.7 70
8137 13.675 70
8254 13.6625 67
8301 13.65 72
8342 13.6125 75
8362 13.5875 76
8430 13.575 79
8520 13.5375 74
9711 13.375 79
10636 13.35 82
10787 13.3 83
...

We see that after 1/10 the number of programs searched, we are already
at 13.3/18.3 = 73% of the error of our fittest random search program.
Thus it appears that, for this particular problem, our assumption that
"fitter programs are close to other fit programs" is valid, and using
the assumption to constrain random search yields an increase in
efficiency.
"""