Transformers in NLP

What are Transformers in NLP?

Concepts

Attention

  1. Encoder-decoder attention: attention between the input sequence and the output sequence.
  2. Self-attention in the input sequence: each word attends to all the words in the input sequence.
  3. Self-attention in the output sequence: here the scope of self-attention is limited to the words that occur before a given word. This prevents information leaks during training and is enforced by masking, at each step, the words that come after it: at step 1 only the first word of the output sequence is unmasked, at step 2 the first two words are unmasked, and so on (see the masking sketch after this list).
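
A minimal sketch of this look-ahead masking, assuming PyTorch; the function names and tensor shapes are made up for illustration. Positions after the current step get a score of -inf before the softmax, so they receive zero attention weight.

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # Lower-triangular boolean matrix: position i may attend to positions 0..i only.
    return torch.tril(torch.ones(seq_len, seq_len)).bool()

def masked_attention_weights(scores: torch.Tensor) -> torch.Tensor:
    # scores: (seq_len, seq_len) raw attention scores over the output sequence.
    mask = causal_mask(scores.size(-1))
    # Future positions are set to -inf, so softmax assigns them zero probability.
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1)

# Row i has non-zero weights only for columns 0..i:
# step 1 sees only word 1, step 2 sees words 1-2, and so on.
weights = masked_attention_weights(torch.randn(4, 4))
print(weights)
```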

Videos

Papers

Attention is All You Need

The Transformer was proposed in the paper Attention is All You Need (2017).

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
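
The core operation the paper builds everything on is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal single-head sketch in PyTorch (no batching, masking, or dropout); the toy shapes are chosen only for illustration.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k: (seq_len, d_k); v: (seq_len, d_v)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                                  # (seq_len, d_v)

# Toy usage: random queries, keys, and values for a 5-token sequence.
q = torch.randn(5, 64)
k = torch.randn(5, 64)
v = torch.randn(5, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([5, 64])
```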

IndoNLU

https://arxiv.org/abs/2009.05387, 2020

Although Indonesian is known to be the fourth most frequently used language over the internet, the research progress on this language in the natural language processing (NLP) is slow-moving due to a lack of available resources. In response, we introduce the first-ever vast resource for the training, evaluating, and benchmarking on Indonesian natural language understanding (IndoNLU) tasks. IndoNLU includes twelve tasks, ranging from single sentence classification to pair-sentences sequence labeling with different levels of complexity. The datasets for the tasks lie in different domains and styles to ensure task diversity. We also provide a set of Indonesian pre-trained models (IndoBERT) trained from a large and clean Indonesian dataset Indo4B collected from publicly available sources such as social media texts, blogs, news, and websites. We release baseline models for all twelve tasks, as well as the framework for benchmark evaluation, and thus it enables everyone to benchmark their system performances.

Tutorial: https://indobenchmark.github.io/tutorials/pytorch/deep%20learning/nlp/2020/10/18/basic-pytorch-en.html
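
A hedged sketch of loading one of the released IndoBERT checkpoints with the Hugging Face transformers library; the checkpoint name indobenchmark/indobert-base-p1 comes from the IndoNLU project and is assumed here to be available on the Hub.

```python
# Sketch only: assumes `transformers` is installed and that the
# indobenchmark/indobert-base-p1 checkpoint is available on the Hugging Face Hub.
from transformers import AutoTokenizer, AutoModel

model_name = "indobenchmark/indobert-base-p1"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode an Indonesian sentence and take the [CLS] token representation.
inputs = tokenizer("Aku suka belajar NLP.", return_tensors="pt")
outputs = model(**inputs)
cls_embedding = outputs.last_hidden_state[:, 0]  # (1, hidden_size)
print(cls_embedding.shape)
```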
