iandanforth · August 29, 2015 13:57
diff --git a/GoalDirectedBehaviorInCLA b/GoalDirectedBehaviorInCLA
 The answer is to allow cells that are both *predicted* and *desired* to become
 active, not just put into a predictive state.

 Let me illustrate.

 Let's say you have a world with three rooms in a row with letter labels:

 ABC

 In this world you have a creature that can go Left, go Right, or Stay where it
 is. 

 In this world there is also something that the creature wants at B. (Let's say
 cheese.) So we say that being at B is its Goal.

 There are only three possible ways to achieve this Goal in this world.

 Go Right from A to B.
 Go Left from C to B.
 Or Stay at B while at B.

 If we say that this creature's CLA always knows where the creature is and what
 it's about to do, bottom up input might look like this:

 In room A about to move Right

 AR

 In room B about to move Left

 BL

 In room C and about to Stay

 CS

 etc.
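 The world and its inputs are small enough to write down directly. This is only
 a sketch: the names ROOMS, ACTIONS, encode, step, and ways_to_reach are made up
 for illustration and are not part of any CLA implementation. It assumes walls
 simply block movement, as described for room A.

```python
# Hypothetical toy model of the three-room world (names are illustrative only).
ROOMS = "ABC"
ACTIONS = {"L": -1, "S": 0, "R": +1}  # Left, Stay, Right


def encode(room, action):
    """Bottom-up input: the current room plus the action about to be taken."""
    return room + action


def step(room, action):
    """Return the room the creature ends up in; walls block movement."""
    i = ROOMS.index(room) + ACTIONS[action]
    return ROOMS[max(0, min(len(ROOMS) - 1, i))]


def ways_to_reach(goal):
    """All room+action inputs whose next room is the goal."""
    return sorted(
        encode(r, a) for r in ROOMS for a in ACTIONS if step(r, a) == goal
    )


print(ways_to_reach("B"))  # the three ways to reach B listed above
```

 Enumerating the inputs this way recovers the same three routes to B: AR, BS,
 and CL.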

 Now let's walk through learning.

 Initially the creature wants the cheese (it's born this way) but moves randomly
 around. It may go Left, Right, or Stay with equal likelihood. It doesn't yet
 understand how to get to the cheese.

 Now we know something about transitions.

 If I'm at A, I can try to go Left (but hit a wall) and remain at A, I can stay
 where I am and end up at A, or I can go Right and end up at B. Each is equally
 likely.

             ->   33% ->  AS -> A      
 [AL/AS]   -> ->   33% ->  AL -> A
             ->   33% ->  AR -> B 
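 The same table can be computed rather than drawn. A minimal sketch, assuming a
 uniform random policy over the three actions; transition_table and the policy
 dictionary are hypothetical helpers, not something the CLA provides.

```python
from fractions import Fraction

# Hypothetical toy model; names are illustrative only.
ROOMS = "ABC"
ACTIONS = {"L": -1, "S": 0, "R": +1}


def step(room, action):
    """Return the resulting room; walls block movement."""
    i = ROOMS.index(room) + ACTIONS[action]
    return ROOMS[max(0, min(len(ROOMS) - 1, i))]


def transition_table(room, policy):
    """Map each room+action input to (probability, resulting room)."""
    return {room + a: (p, step(room, a)) for a, p in policy.items()}


# Before any learning, every action is equally likely.
uniform = {a: Fraction(1, 3) for a in ACTIONS}
table = transition_table("A", uniform)

for token, (p, next_room) in table.items():
    print(f"{float(p):.0%} -> {token} -> {next_room}")
```

 Using exact fractions keeps the three probabilities summing to one, which the
 rounded 33% figures in the diagram only approximate.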

 Now AR -> B is what the creature needs to learn. If its Goal is B, then whenever
 it's at A it should be *much* more likely to move Right than to move Left or
 Stay.

 We need to *reinforce* (make more likely to occur) AR.

 When our Goal is B we want the statistics to end up looking closer to this:

             ->   01% ->  AS -> A      
 [AL/AS]   -> ->   01% ->  AL -> A
             ->   98% ->  AR -> B 

 So how do we change those statistics?

 Let's look at what the sequence learner is doing when the creature is at A.

 Moving Left, Moving Right, and Staying are all possible transitions, so the
 sequence learner will be predicting all of them. It will *not* be predicting C
 (or any variant of C, like CL, CR, or CS) because that *never* follows
 immediately from A. 

 Those three options will always be possible, so we always want to predict them,
 but when our Goal is B we almost always want AR to be the active set of cells.

 To do that we first need to powerfully link AR with the Goal. 

 Every time the set of cells that represents AR is on (this is in layer 4 and
 copied into layer 5), the next thing we'll see is B (the Goal).

 At that point we strengthen the connections between the Goal and AR.

 After we've seen the AR -> B transition several times, we will have reinforced
 the Goal -> AR synapses enough for them to become connected.
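 One way to picture that reinforcement is with a permanence value per
 Goal -> input synapse that becomes "connected" once it crosses a threshold.
 This is a sketch only: the threshold, the increment, the starting values, and
 the function names are all assumptions chosen for illustration.

```python
# Assumed values, picked only to make the example concrete.
CONNECTED_THRESHOLD = 0.5
PERMANENCE_INCREMENT = 0.1


def reinforce(permanences, previous_input, current_room, goal):
    """Strengthen the Goal -> previous_input synapse if the goal was reached."""
    if current_room == goal:
        p = permanences.get(previous_input, 0.0) + PERMANENCE_INCREMENT
        permanences[previous_input] = min(1.0, p)


def connected(permanences):
    """Inputs whose Goal synapse has reached the connected threshold."""
    return {k for k, p in permanences.items() if p >= CONNECTED_THRESHOLD}


# All Goal synapses start weak (hypothetical starting permanence).
goal_synapses = {"AR": 0.2, "AS": 0.2, "AL": 0.2}

# AR is followed by B four times: reinforced each time.
for _ in range(4):
    reinforce(goal_synapses, "AR", "B", goal="B")

# AS is followed by A: the Goal was not reached, so nothing changes.
reinforce(goal_synapses, "AS", "A", goal="B")

print(connected(goal_synapses))
```

 Only the input that was actually followed by the Goal crosses the threshold, so
 the Goal ends up connected to AR and not to AS or AL.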

 So now the cool stuff happens. 

 Let's say our creature is just sitting at A. That is a transition from A plus
 Stay, over and over.

 AS -> AS -> AS -> etc.

 Recall that at each timestep (because they are all possible) the sequence
 learner is still predicting AS, AR, and AL. 

 Now we activate the Goal. 

 The Goal is strongly connected to the representation of AR thanks to previous
 reinforcement. So when it turns on, you have cells that are in the predicted
 state thanks to the sequence learner AND cells that are being prodded (for lack
 of a better term) by the Goal. 

 It is that *intersection of cells* which must then become active. AR is driven
 by the combination of being possible (predicted) and desired (previously
 reinforced).

 So instead of AS -> AS, layer 5 cells (which would normally just be a copy of
 layer 4 activation) are driven to AS -> AR!
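 The "predicted AND desired" rule is just a set intersection. Here is a minimal
 sketch with made-up cell indices; CELLS, cells_for, and layer5_activation are
 hypothetical names, and falling back to a plain copy of layer 4 when the Goal
 is off follows the description above.

```python
# Hypothetical cell indices for each representation.
CELLS = {
    "AS": {0, 1, 2},
    "AL": {3, 4, 5},
    "AR": {6, 7, 8},
}


def cells_for(tokens):
    """Union of the cells representing each of the given inputs."""
    return set().union(*(CELLS[t] for t in tokens))


def layer5_activation(layer4_active, predicted, goal_driven, goal_on):
    """Copy layer 4 unless the Goal selects a predicted set of cells."""
    if goal_on:
        selected = predicted & goal_driven  # possible AND desired
        if selected:
            return selected
    return set(layer4_active)


predicted = cells_for(["AS", "AL", "AR"])  # everything possible from A
goal_driven = cells_for(["AR"])            # what the Goal is prodding

without_goal = layer5_activation(CELLS["AS"], predicted, goal_driven, False)
with_goal = layer5_activation(CELLS["AS"], predicted, goal_driven, True)
print(without_goal == CELLS["AS"], with_goal == CELLS["AR"])
```

 With the Goal off, layer 5 mirrors the AS activity in layer 4; with the Goal on,
 the AR cells are the only ones that are both predicted and prodded.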

 That layer 5 representation of AR can then be used to override the actual
 behavior of the creature, causing it to move Right and get the cheese.

 How cool is that?!

 The other important thing to notice is that the Goal will be connected
 simultaneously to all the ways to achieve the Goal in this world.

 It is simultaneously "prodding" AR, BS, and CL, but because only AR is currently
 in the predictive state, it causes those and only those cells to become active.
 (Otherwise the creature might move Left at an inappropriate time.)
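 To see why the prediction acts as a gate, here is the same idea at the level of
 whole inputs instead of cells. It is a sketch under the assumption that the
 sequence learner predicts exactly the three inputs for the current room;
 GOAL_LINKED, predicted_inputs, and driven_input are hypothetical names.

```python
# Inputs the Goal has become connected to (every way of reaching B).
GOAL_LINKED = {"AR", "BS", "CL"}


def predicted_inputs(room):
    """Inputs the sequence learner predicts while the creature is in room."""
    return {room + action for action in "LSR"}


def driven_input(room):
    """Inputs that are both predicted in this room and prodded by the Goal."""
    return predicted_inputs(room) & GOAL_LINKED


for room in "ABC":
    print(room, driven_input(room))
```

 The Goal prods all three inputs everywhere, yet in each room only the one that
 is currently predicted survives the intersection, so CL is never driven while
 the creature is at A.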