A detailed example of how a Goal can influence behavior in CLA
The answer is to allow cells that are both *predicted* and *desired* to become active, not just put into a predictive state.

Let me illustrate.
Let's say you have a world with three rooms in a row with letter labels:

ABC

In this world you have a creature that can go Left, go Right, or Stay where it is.

In this world there is also something that the creature wants at B (let's say cheese). So we say that being at B is its Goal.
There are only three possible ways to achieve this Goal in this world:

Go Right from A to B.
Go Left from C to B.
Or Stay at B while at B.
If we say that this creature's CLA always knows where the creature is and what it's about to do, bottom-up input might look like this:

In room A, about to move Right: AR
In room B, about to move Left: BL
In room C, about to Stay: CS

etc.
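The setup above can be sketched in a few lines of Python. This is a hedged illustration, not code from any CLA implementation; the names (`ROOMS`, `step`, `bottom_up_input`) are my own.

```python
ROOMS = ["A", "B", "C"]     # three rooms in a row
ACTIONS = ["L", "S", "R"]   # go Left, Stay, go Right
GOAL = "B"                  # the cheese is at B

def step(room, action):
    """Return the room the creature ends up in (walls block movement)."""
    i = ROOMS.index(room)
    if action == "L":
        i = max(i - 1, 0)
    elif action == "R":
        i = min(i + 1, len(ROOMS) - 1)
    return ROOMS[i]

def bottom_up_input(room, action):
    """Encode 'in room X about to do Y' as a two-letter pattern, e.g. 'AR'."""
    return room + action

# In room A, about to move Right: pattern "AR", and the next room is B.
print(bottom_up_input("A", "R"), "->", step("A", "R"))
```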
Now let's walk through learning.

Initially the creature wants the cheese (it's born this way) but moves randomly around. It may go Left, Right, or Stay with equal likelihood. It doesn't yet understand how to get to the cheese.
Now we know something about transitions.

If I'm at A, I can try to go Left (but hit a wall) and remain at A, I can stay where I am and end up at A, or I can go Right and end up at B. Each is equally likely.

      -> 33% -> AS -> A
[A] ---> 33% -> AL -> A
      -> 33% -> AR -> B
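We can sanity-check those 33% figures by simulating a creature at A that picks its action uniformly at random. This is an illustrative sketch; the world logic in `step` is my own encoding of the three-room setup, not CLA code.

```python
import random
from collections import Counter

def step(room, action):
    """Move within rooms 'ABC'; walls block movement past the ends."""
    rooms = "ABC"
    i = rooms.index(room)
    if action == "L":
        i = max(i - 1, 0)
    elif action == "R":
        i = min(i + 1, len(rooms) - 1)
    return rooms[i]

random.seed(0)
counts = Counter()
for _ in range(10_000):
    action = random.choice("LSR")   # uniform random policy
    counts["A" + action] += 1

# Each pattern occurs about a third of the time.
for pattern in ("AS", "AL", "AR"):
    print(pattern, "->", step("A", pattern[1]), f"{counts[pattern] / 10_000:.0%}")
```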
Now AR -> B is what the creature needs to learn. If its Goal is B, then if it's ever at A it should be *much* more likely to move Right than to move Left or Stay.

We need to *reinforce* (make more likely to occur) AR.

When our Goal is B we want the statistics to end up looking closer to this:

      -> 01% -> AS -> A
[A] ---> 01% -> AL -> A
      -> 98% -> AR -> B
So how do we change those statistics?

Let's look at what the sequence learner looks like when the creature is at A. Moving Left, Moving Right, and Staying are all possible transitions, so the sequence learner will be predicting all of them. It will *not* be predicting C (or any variant of C, like CL, CR, or CS) because that *never* follows immediately from A.

Those three options will always be possible, so we always want to predict them, but when our Goal is B we almost always want AR to be the active set of cells.
To do that we first need to powerfully link AR with the Goal.

Every time we have the set of cells that represent AR on (this is in layer 4 and copied into 5), the next thing we'll see is B (the Goal). At that point we strengthen the connections between the Goal and AR.

After we've seen the AR -> B transition several times, we will have reinforced the Goal -> AR synapses to connected.
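The reinforcement step above can be sketched as a toy permanence update: each observed AR -> B transition bumps the Goal -> AR synapse until it crosses a "connected" threshold. The numeric values (0.1 increment, 0.5 threshold) loosely echo HTM-style permanences but are illustrative assumptions, not taken from any implementation.

```python
CONNECTED_THRESHOLD = 0.5   # assumed threshold for a 'connected' synapse
INCREMENT = 0.1             # assumed per-observation permanence bump

permanence = {}  # Goal -> pattern synapse permanences

def reinforce(pattern, next_room, goal="B"):
    """Strengthen the Goal -> pattern synapse when pattern led to the Goal."""
    if next_room == goal:
        permanence[pattern] = min(1.0, permanence.get(pattern, 0.0) + INCREMENT)

def connected(pattern):
    return permanence.get(pattern, 0.0) >= CONNECTED_THRESHOLD

# Seeing AR -> B several times drives the Goal -> AR synapse to connected.
for _ in range(5):
    reinforce("AR", "B")

print(connected("AR"))  # reinforced: AR repeatedly led to the Goal
print(connected("AL"))  # never reinforced: AL never leads to the Goal
```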
So now the cool stuff happens.

Let's say our creature is just sitting at A. That is a transition from A plus Stay, over and over:

AS -> AS -> AS -> etc.

Recall that at each timestep (because they are all possible) the sequence learner is still predicting AS, AR, and AL.
Now we activate the Goal.

The Goal is strongly connected to the representation of AR thanks to previous reinforcement. So when it turns on, you have cells that are in the predicted state thanks to the sequence learner AND cells that are being prodded (for lack of a better term) by the Goal.

It is that *intersection of cells* which must then become active. AR is driven by the combination of being possible (predicted) and desired (previously reinforced).
So instead of AS -> AS, layer 5 cells (which would normally just be a copy of layer 4 activation) are driven to AS -> AR!

That layer 5 representation of AR can then be used to override the actual behavior of the creature, causing it to move Right and get the cheese.

How cool is that?!
The other important thing to notice is that the Goal will be connected simultaneously to all the ways to achieve the Goal in this world.

It is simultaneously "prodding" AR, BS, and CL, but because only AR is currently in the predictive state, it causes those and only those cells to become active. (Otherwise the creature might move left at an inappropriate time.)
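The whole selection rule above boils down to a set intersection. A minimal sketch, with the cell sets written as Python sets of pattern names (my own simplification of layer 4/5 cell populations):

```python
# Sitting at A, the sequence learner predicts every possible next pattern:
predicted = {"AS", "AL", "AR"}

# The Goal prods every reinforced way of achieving it in this world:
goal_prodded = {"AR", "BS", "CL"}

# Only cells that are both predicted and desired become active; BS and CL
# are prodded but not currently possible, so they stay silent.
active = predicted & goal_prodded
print(active)  # only AR survives the intersection
```

In layer 5 that single surviving pattern, AR, is what overrides the default AS -> AS behavior and moves the creature Right.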