- The paper explains how to correctly apply dropout to LSTMs and shows that it substantially reduces overfitting in tasks like language modelling, speech recognition, image caption generation and machine translation.
- [Link to the paper](https://arxiv.org/abs/1409.2329)
- Dropout is a regularisation method that temporarily removes (drops out) units from the network, along with all their incoming and outgoing connections (a sketch of the operator follows this list).
- Conventional dropout does not work well with RNNs as the recurrence amplifies the noise and hurts learning.
- The paper proposes applying dropout only to the non-recurrent connections (between layers at the same timestep), leaving the recurrent hidden-to-hidden connections intact (see the second sketch after this list).
- The dropout operator corrupts the information carried by some units (and not all), forcing them to perform their intermediate computations more robustly.
- The information is corrupted exactly L + 1 times, where L is the number of layers, and this count is independent of the number of timesteps the information traverses. For example, a prediction made after 100 timesteps by a 2-layer LSTM still passes through only 3 dropout applications.
- In language modelling, image caption generation, speech recognition and machine translation, this form of dropout enables training larger networks and improves test performance (e.g. lower word-level perplexity in language modelling, higher frame accuracy in speech recognition).
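
A minimal sketch of the standard (inverted) dropout operator referenced above, written in NumPy; the function name and the inverted-scaling convention are my own illustration, not taken from the paper:

```python
import numpy as np

def dropout(x, p=0.5, training=True):
    # Zero each unit independently with probability p; rescale survivors
    # by 1/(1-p) ("inverted" dropout) so no rescaling is needed at test time.
    if not training or p == 0.0:
        return x
    mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
    return x * mask
```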
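And a hedged PyTorch sketch of the non-recurrent dropout scheme described above (class name and structure are my own illustration of the idea, not the authors' implementation). Note how dropout touches each of the L layer inputs plus the final output, giving the L + 1 corruptions mentioned earlier, while the recurrent state flows through time undropped:

```python
import torch
import torch.nn as nn

class NonRecurrentDropoutLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers=2, p=0.5):
        super().__init__()
        sizes = [input_size] + [hidden_size] * num_layers
        self.cells = nn.ModuleList(
            nn.LSTMCell(sizes[l], sizes[l + 1]) for l in range(num_layers)
        )
        self.drop = nn.Dropout(p)

    def forward(self, x):                    # x: (seq_len, batch, input_size)
        states = [None] * len(self.cells)    # per-layer (h, c); None -> zeros inside LSTMCell
        outputs = []
        for x_t in x:                        # step through time
            h = x_t
            for l, cell in enumerate(self.cells):
                # Dropout on the vertical (layer-to-layer) input only;
                # the recurrent h_{t-1}, c_{t-1} pass through intact.
                states[l] = cell(self.drop(h), states[l])
                h = states[l][0]
            outputs.append(self.drop(h))     # (L+1)-th application, before the decoder
        return torch.stack(outputs)
```

For comparison, PyTorch's built-in `nn.LSTM(dropout=p)` applies the same idea internally, dropping the output of every layer except the last; the final dropout before the decoder is left to the caller.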