Last active March 7, 2019 14:59
Attention mechanism
| https://machinelearningmastery.com/attention-long-short-term-memory-recurrent-neural-networks/ | |
| https://stackoverflow.com/questions/42918446/how-to-add-an-attention-mechanism-in-keras | |
| # --- Attention is all you need --- # | |
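| # Imports assume Keras with the TensorFlow backend | |
| from keras import backend as K | |
| from keras.layers import Activation, Dense, Flatten, Lambda, Permute, RepeatVector, multiply | |
| # `layer` is assumed to be a 3D tensor (batch, timesteps, units), e.g. an RNN output with return_sequences=True | |
| _, _, units = layer.shape.as_list() | |
| # One tanh score per time step, normalized over the time axis with softmax | |
| attention = Dense(1, activation='tanh')(layer) | |
| attention = Flatten()(attention) | |
| attention = Activation('softmax')(attention) | |
| # Repeat the weights to shape (batch, timesteps, units) so they can scale `layer` | |
| attention = RepeatVector(units)(attention) | |
| attention = Permute([2, 1])(attention) | |
| sent_representation = multiply([layer, attention]) | |
| # Weighted sum over the time axis -> (batch, units) | |
| sent_representation = Lambda(lambda x: K.sum(x, axis=-2), | |
|                              output_shape=(units,))(sent_representation) | |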
| # ---------------------------------- # | |
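To make the tensor shapes in the snippet above easier to follow, here is a minimal NumPy sketch of the same computation: one tanh score per time step, a softmax over time, then a weighted sum of the time steps. The function name `attention_pool` and the parameters `w` and `b` (standing in for the kernel and bias of the `Dense(1)` layer) are illustrative assumptions, not part of the original snippet.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(layer, w, b):
    """Attention-weighted sum over time steps.

    layer: (batch, timesteps, units) sequence of hidden states
    w:     (units,) hypothetical weights of the Dense(1) scoring layer
    b:     scalar bias of that layer
    """
    scores = np.tanh(layer @ w + b)                      # (batch, timesteps)
    weights = softmax(scores, axis=1)                    # sums to 1 over time
    pooled = (layer * weights[:, :, None]).sum(axis=1)   # (batch, units)
    return pooled, weights

rng = np.random.default_rng(0)
layer = rng.normal(size=(2, 5, 4))   # batch=2, timesteps=5, units=4
w = rng.normal(size=4)
pooled, weights = attention_pool(layer, w, 0.0)
print(pooled.shape)   # (2, 4)
print(weights.shape)  # (2, 5)
```

In the Keras version the `RepeatVector` and `Permute` layers only exist to reshape the weights so they line up with `layer`; NumPy broadcasting (`weights[:, :, None]`) plays that role here.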
| Attention mechanisms have seen wide adoption in neural NLP models. | |
| We consider exemplar NLP tasks for which attention mechanisms are commonly used: classification, | |
| natural language inference (NLI), and question answering. | |
| Natural language inference (NLI): human-written English sentence pairs, manually labeled for balanced classification as neutral, contradiction, or entailment. | |
| Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. | |
| Attention mechanisms in neural networks serve to orient perception as well as memory access (you might even say perception is just a very short-term subset of all memory). Attention filters the perceptions that can be stored in memory, and filters them again on a second pass when they are to be retrieved from memory. Attention can be aimed at the present and the past. | |
| Attention matters because, combined with neural word embeddings, it has produced state-of-the-art results in machine translation and other natural language processing tasks. It is one component of breakthrough models such as Transformer and BERT, which are setting new accuracy records in NLP. So attention is part of our best effort to date to create real natural-language understanding in machines. If that succeeds, it will have an enormous impact on society and almost every form of business. | |
| One of the natural language processing problems that researchers have struggled with is how to link pronouns to antecedents. | |
| Also, there are many shades of meaning for a given word that only emerge due to its situation in a passage and its inter-relations with other words (but which words?). | |
| Algorithms can allocate attention, and they can learn how to do so by adjusting the weights they assign to various inputs. | |
| A recurrent neural network such as an LSTM is often used, since it takes into account information in the present time step as well as the context of past time steps. One way to think about how a recurrent network operates: at each time step, it combines input from the present moment with input from the memory layer to make a decision about the data. | |
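The recurrence described above can be sketched in a few lines of NumPy. This is a plain (Elman-style) RNN cell, not an LSTM, chosen only because it shows the "present input plus memory" combination most directly; the names `rnn_step`, `W_x`, `W_h`, and `b` and all sizes are assumptions made for illustration.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # Combine the current input with the previous hidden state (the "memory").
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim, timesteps = 3, 4, 6

# Randomly initialized parameters; a real network would learn these.
W_x = rng.normal(scale=0.5, size=(input_dim, hidden_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

xs = rng.normal(size=(timesteps, input_dim))  # one toy input sequence
h = np.zeros(hidden_dim)                      # initial memory
hidden_states = []
for x_t in xs:
    h = rnn_step(x_t, h, W_x, W_h, b)
    hidden_states.append(h)
hidden_states = np.stack(hidden_states)

print(hidden_states.shape)  # (6, 4)
```

After the loop, `h` is the final hidden state, while `hidden_states` keeps one state per time step.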
| RNNs cram everything they know about a sequence of data elements into the final hidden state of the network. An attention mechanism instead takes into account the input from several time steps to make one prediction: it distributes attention over the hidden states of those time steps. | |
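As a rough illustration of distributing attention over hidden states, the sketch below scores each hidden state against a query vector with an additive (tanh) scoring function in the spirit of Bahdanau et al. (2014), then forms a context vector as the weighted average. The parameter names (`W_h`, `W_q`, `v`), the sizes, and the random inputs are assumptions for the example, not values from any trained model.

```python
import numpy as np

def additive_attention(hidden_states, query, W_h, W_q, v):
    """Additive attention over a sequence of hidden states.

    hidden_states: (T, H) one hidden state per time step
    query:         (Q,)   e.g. the current decoder state
    W_h: (H, A), W_q: (Q, A), v: (A,) hypothetical learned parameters
    """
    # One scalar score per time step.
    scores = np.tanh(hidden_states @ W_h + query @ W_q) @ v   # (T,)
    # Softmax over time steps (shifted by the max for stability).
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()                    # (T,)
    # Context vector: weighted average of all hidden states.
    context = weights @ hidden_states                          # (H,)
    return context, weights

rng = np.random.default_rng(1)
T, H, Q, A = 5, 4, 3, 6
hidden_states = rng.normal(size=(T, H))
query = rng.normal(size=Q)
W_h = rng.normal(size=(H, A))
W_q = rng.normal(size=(Q, A))
v = rng.normal(size=A)

context, weights = additive_attention(hidden_states, query, W_h, W_q, v)
print(context.shape)  # (4,)
print(weights.shape)  # (5,)
```

Unlike using only `hidden_states[-1]`, the context vector can draw on any time step, and the weights show which steps contributed most to a given prediction.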
| In 2017, Google researchers separated the attention mechanism from recurrent networks and showed, with an architecture called Transformer, that attention alone could outperform RNN-based models. | |
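The core operation of the Transformer is scaled dot-product attention, in which every position attends to every other position without any recurrence. Below is a minimal single-head NumPy sketch; the projection matrices `W_q`, `W_k`, `W_v` and the toy sizes are illustrative assumptions, and a full Transformer adds multiple heads, positional encodings, residual connections, and feed-forward layers on top of this.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (T_q, d_k), K: (T_k, d_k), V: (T_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (T_q, T_k)
    # Row-wise softmax: each query distributes weight over all keys.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights                      # (T_q, d_v), (T_q, T_k)

rng = np.random.default_rng(2)
T, d_model, d_k = 5, 8, 4
X = rng.normal(size=(T, d_model))        # toy sequence of 5 token vectors

# Hypothetical projections; in self-attention Q, K, V all come from X.
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

output, attn = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(output.shape)  # (5, 4)
print(attn.shape)    # (5, 5)
```

Because every row of `attn` is computed independently, all positions can be processed in parallel, which is one practical difference from the step-by-step recurrence of an RNN.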
| Until recently, most CNN models worked directly on entire images or video frames, with equal priority given to all image pixels at the earliest stage of processing. The primate visual system works differently. Rather than processing all input in parallel, visual attention shifts strategically among locations and objects, centering processing resources and representational coordinates on a series of regions in turn. | |
| One such network was able to use this selective attentional mechanism to ignore irrelevant objects in a scene, allowing it to perform well in challenging object classification tasks in the presence of clutter. |