@iandanforth
Last active August 29, 2015 13:57
A detailed example of how a Goal can influence behavior in CLA
The answer is to allow cells that are both *predicted* and *desired* to become
active, not just to be put into a predictive state.
Let me illustrate.
Let's say you have a world with three rooms in a row with letter labels:
ABC
In this world you have a creature that can go Left, go Right, or Stay where it
is.
In this world there is also something that the creature wants at B. (Let's say
cheese.) So we say that being at B is its Goal.
There are only three possible ways to achieve this Goal in this world.
Go Right from A to B.
Go Left from C to B.
Or Stay at B while at B.
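To check that list, here is a minimal sketch of the three-room world. The names `ROOMS`, `ACTIONS`, `step`, and `ways_to_reach` are made up for illustration, and the walls are assumed to simply block movement.

```python
ROOMS = "ABC"
ACTIONS = "LRS"  # Left, Right, Stay


def step(room, action):
    """Return the room reached by taking `action` in `room`.

    Assumption: walking into a wall leaves the creature where it is.
    """
    i = ROOMS.index(room)
    if action == "L":
        i = max(i - 1, 0)
    elif action == "R":
        i = min(i + 1, len(ROOMS) - 1)
    return ROOMS[i]


def ways_to_reach(goal):
    """List every room+action pair whose next room is `goal`."""
    return [room + action
            for room in ROOMS
            for action in ACTIONS
            if step(room, action) == goal]


print(ways_to_reach("B"))  # ['AR', 'BS', 'CL']
```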
If we say that this creature's CLA always knows where the creature is and what
it's about to do, bottom up input might look like this:
In room A about to move Right
AR
In room B about to move Left
BL
In room C and about to Stay
CS
etc.
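As a rough sketch, each bottom up input can be treated as a two-letter token: room first, then the action about to be taken. The helper `describe` and the token list below are illustrative only; a real CLA would see a sparse encoding rather than a string.

```python
from itertools import product

ROOMS = "ABC"
# Hypothetical mapping from action letter to the wording used in the text.
ACTIONS = {"L": "move Left", "R": "move Right", "S": "Stay"}


def describe(token):
    """Spell out a token such as 'AR' in plain words."""
    room, action = token
    return f"In room {room} about to {ACTIONS[action]}"


# Every input the creature can ever present to the CLA in this world.
all_inputs = [room + action for room, action in product(ROOMS, ACTIONS)]

print(all_inputs)
print(describe("AR"))
```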
Now let's walk through learning.
Initially the creature wants the cheese (it's born this way) but moves randomly
around. It may go Left, Right, or Stay with equal likelihood. It doesn't yet
understand how to get to the cheese.
Now we know something about transitions.
If I'm at A, I can try to go Left (but hit a wall) and remain at A, I can stay
where I am and end up at A, or I can go Right and end up at B. Each is equally
likely.
           -> 33% -> AS -> A
[AL/AS] -> -> 33% -> AL -> A
           -> 33% -> AR -> B
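Those numbers can be worked out exactly rather than sampled. The sketch below assumes a uniformly random choice among the three actions and walls that block movement; `step` and `outcome_distribution` are names invented for this example.

```python
from collections import Counter
from fractions import Fraction

ROOMS = "ABC"
ACTIONS = "LRS"  # Left, Right, Stay


def step(room, action):
    """Next room; a wall leaves the creature in place (assumption)."""
    i = ROOMS.index(room)
    if action == "L":
        i = max(i - 1, 0)
    elif action == "R":
        i = min(i + 1, len(ROOMS) - 1)
    return ROOMS[i]


def outcome_distribution(room):
    """Probability of each next room when every action is equally likely."""
    p = Fraction(1, len(ACTIONS))
    dist = Counter()
    for action in ACTIONS:
        dist[step(room, action)] += p
    return dict(dist)


print(outcome_distribution("A"))
```

From A, two of the three actions (AL and AS) end at A and only AR ends at B, so a randomly moving creature reaches the cheese from A one time in three.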
Now AR -> B is what the creature needs to learn. If its goal is B then if it's
ever at A it should be *much* more likely to move Right than to move Left or
Stay.
We need to *reinforce* (make more likely to occur) AR.
When our Goal is B we want the statistics to end up looking closer to this:
           -> 01% -> AS -> A
[AL/AS] -> -> 01% -> AL -> A
           -> 98% -> AR -> B
So how do we change those statistics?
Let's look at what the sequence learner looks like when the creature is at A.
Moving Left, Moving Right, and Staying are all possible transitions, so the
sequence learner will be predicting all of them. It will *not* be predicting C
(or any variant of C, like CL, CR, or CS) because that *never* follows
immediately from A.
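A very reduced stand-in for the sequence learner makes this concrete. It is only a first-order table of "which inputs can follow which", not CLA sequence memory, and all names in it are hypothetical.

```python
ROOMS = "ABC"
ACTIONS = "LRS"  # Left, Right, Stay


def step(room, action):
    """Next room; a wall leaves the creature in place (assumption)."""
    i = ROOMS.index(room)
    if action == "L":
        i = max(i - 1, 0)
    elif action == "R":
        i = min(i + 1, len(ROOMS) - 1)
    return ROOMS[i]


def learn_transitions():
    """Record, for every input token, the set of tokens that can follow it.

    Assumption: the creature has explored long enough to have seen
    every possible transition at least once.
    """
    follows = {}
    for room in ROOMS:
        for action in ACTIONS:
            next_room = step(room, action)
            follows[room + action] = {next_room + a for a in ACTIONS}
    return follows


follows = learn_transitions()
# The creature stayed at A, so it is still at A: all three A options
# are predicted and no variant of C is.
print(sorted(follows["AS"]))  # ['AL', 'AR', 'AS']
```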
Those three options will always be possible, so we always want to predict them,
but when our Goal is B we almost always want AR to be the active set of cells.
To do that we first need to powerfully link AR with the Goal.
Every time the set of cells that represents AR is on (this is in layer 4 and
copied into layer 5), the next thing we'll see is B (the Goal).
At that point we strengthen the connections between the Goal and AR.
After we've seen the AR -> B transition several times, we will have reinforced
the Goal -> AR synapses enough for them to become connected.
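One way to picture that reinforcement is a permanence value per Goal -> token link that grows each time the token is followed by the Goal. The class `GoalLinks`, the increment of 10, and the threshold of 50 are all assumptions chosen for illustration; the text does not specify any of them.

```python
CONNECTED_THRESHOLD = 50   # hypothetical, in percent
PERMANENCE_INCREMENT = 10  # hypothetical, in percent


class GoalLinks:
    """Tracks how strongly the Goal is linked to each input token."""

    def __init__(self, goal):
        self.goal = goal
        self.permanence = {}  # token -> permanence in percent

    def observe(self, token, next_room):
        """Strengthen Goal -> token whenever token was followed by the Goal."""
        if next_room == self.goal:
            current = self.permanence.get(token, 0)
            self.permanence[token] = min(100, current + PERMANENCE_INCREMENT)

    def connected(self):
        """Tokens whose link to the Goal has crossed the threshold."""
        return {t for t, p in self.permanence.items()
                if p >= CONNECTED_THRESHOLD}


links = GoalLinks("B")
links.observe("AS", "A")      # did not lead to the Goal: no change
for _ in range(5):
    links.observe("AR", "B")  # led to the Goal: reinforced each time

print(links.connected())  # {'AR'}
```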
So now the cool stuff happens.
Let's say our creature is just sitting at A. That is a transition from A plus
Stay, over and over.
AS -> AS -> AS -> etc.
Recall that at each timestep (because they are all possible) the sequence
learner is still predicting AS, AR, and AL.
Now we activate the Goal.
The Goal is strongly connected to the representation of AR thanks to previous
reinforcement. So when it turns on, you have cells that are in the predicted
state thanks to the sequence learner AND cells that are being prodded (for lack
of a better term) by the Goal.
It is that *intersection of cells* which must then become active. AR is driven
by the combination of being possible (predicted) and desired (previously
reinforced).
So instead of AS -> AS, layer 5 cells (which would normally just be a copy of
layer 4 activation) are driven to AS -> AR!
That layer 5 representation of AR can then be used to override the actual
behavior of the creature, causing it to move Right and get the cheese.
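The rule itself is just a set intersection. In this sketch whole tokens stand in for sets of cells, and `layer5_activation` is a hypothetical function name; falling back to a copy of layer 4 when the Goal is off follows the description above.

```python
def layer5_activation(layer4_active, predicted, goal_prodded, goal_on):
    """Return what layer 5 represents at this step.

    With the Goal on, anything that is both predicted and prodded by the
    Goal becomes active. Otherwise layer 5 just copies layer 4.
    """
    if goal_on:
        driven = predicted & goal_prodded
        if driven:
            return driven
    return set(layer4_active)


layer4 = {"AS"}                    # the creature has been sitting at A
predicted = {"AS", "AR", "AL"}     # everything possible from A
goal_prodded = {"AR", "BS", "CL"}  # everything previously linked to the Goal

print(layer5_activation(layer4, predicted, goal_prodded, goal_on=False))  # {'AS'}
print(layer5_activation(layer4, predicted, goal_prodded, goal_on=True))   # {'AR'}
```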
How cool is that?!
The other important thing to notice is that the Goal will be connected
simultaneously to all the ways to achieve the Goal in this world.
It is simultaneously "prodding" AR, BS, and CL, but because only AR is currently
in the predictive state, it causes those and only those cells to become active.
(Otherwise the creature might move Left at an inappropriate time.)
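Putting the pieces together as a closed loop shows why this matters. The sketch assumes the Goal is already linked to AR, BS, and CL, that the Goal stays on, and that the layer 5 result directly sets the creature's action; `choose_token` and `run` are illustrative names.

```python
ROOMS = "ABC"
ACTIONS = "LRS"  # Left, Right, Stay
GOAL_PRODDED = {"AR", "BS", "CL"}  # assumed already reinforced for Goal B


def step(room, action):
    """Next room; a wall leaves the creature in place (assumption)."""
    i = ROOMS.index(room)
    if action == "L":
        i = max(i - 1, 0)
    elif action == "R":
        i = min(i + 1, len(ROOMS) - 1)
    return ROOMS[i]


def choose_token(room):
    """Pick the token that is both predicted here and prodded by the Goal."""
    predicted = {room + a for a in ACTIONS}
    driven = predicted & GOAL_PRODDED
    # In this world exactly one token survives the intersection per room.
    return next(iter(driven))


def run(start, steps=3):
    """Let the Goal drive behavior for a few timesteps from `start`."""
    room, trace = start, []
    for _ in range(steps):
        token = choose_token(room)
        trace.append(token)
        room = step(room, token[1])
    return trace, room


print(run("A"))  # (['AR', 'BS', 'BS'], 'B')
print(run("C"))  # (['CL', 'BS', 'BS'], 'B')
```

CL is prodded the whole time, but it only ever fires when the creature is actually at C.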