- Introduces fastText, a simple and highly efficient approach for text classification.
- On par with deep learning classifiers in terms of accuracy, while orders of magnitude faster for training and evaluation.
- [Link to the paper](https://arxiv.org/abs/1607.01759)
- [Link to code](https://github.com/facebookresearch/fastText)
- Built on top of linear models with a rank constraint and a fast loss approximation.
- Starts with word representations that are averaged into a text representation, which is fed to a linear classifier.
- The text representation can be thought of as a hidden state that is shared among features and classes.
- A softmax layer produces a probability distribution over the pre-defined classes (see the sketch after this list).
- This has a high computational complexity of O(kh), where k is the number of classes and h is the dimension of the text representation.
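A minimal sketch of this pipeline, with illustrative names and dimensions rather than the actual fastText API: average the word vectors into a text representation, apply the linear classifier, and take a softmax. The full softmax makes the O(kh) cost explicit.

```python
import numpy as np

vocab_size, h, k = 50_000, 10, 4                 # h: text-repr dim, k: classes
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(vocab_size, h))  # word embedding lookup table
W = rng.normal(scale=0.1, size=(h, k))           # linear classifier weights

def predict_proba(token_ids):
    """Average word vectors into a text representation (the shared hidden
    state), then apply the linear layer + softmax: O(kh) per document."""
    text_repr = E[token_ids].mean(axis=0)
    logits = text_repr @ W
    exps = np.exp(logits - logits.max())         # numerically stable softmax
    return exps / exps.sum()

print(predict_proba([12, 345, 6789]))            # distribution over k classes
```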
- Uses hierarchical softmax, based on a Huffman coding tree, to reduce the complexity to O(h log(k)).
- The top T results (from the tree) can be computed efficiently, in O(log(T)), using a binary heap (sketched below).
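A hedged sketch of the idea, assuming a toy Huffman tree and made-up node vectors (the real fastText internals differ in detail): each class is reached by a sequence of binary decisions, so scoring one class touches only O(log k) node vectors instead of all k output weights.

```python
import heapq
import numpy as np

def huffman_paths(freqs):
    """Return {class: [(internal_node_id, bit), ...]} paths. Frequent
    classes get short codes, so the expected path length is O(log k)."""
    heap = [(f, c, ("leaf", c)) for c, f in enumerate(freqs)]
    heapq.heapify(heap)
    uid = len(freqs)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, uid, ("node", uid, t1, t2)))
        uid += 1
    paths = {}

    def walk(tree, path):
        if tree[0] == "leaf":
            paths[tree[1]] = path
        else:
            _, nid, left, right = tree
            walk(left, path + [(nid, 0)])
            walk(right, path + [(nid, 1)])

    walk(heap[0][2], [])
    return paths

h = 10
rng = np.random.default_rng(0)
paths = huffman_paths([50, 30, 12, 5, 3])           # toy class frequencies
node_vecs = {nid: rng.normal(size=h)                # one vector per inner node
             for p in paths.values() for nid, _ in p}

def log_prob(class_id, text_repr):
    """P(class) is a product of sigmoid decisions along the Huffman path,
    so scoring one class costs O(h log k) instead of O(hk)."""
    total = 0.0
    for nid, bit in paths[class_id]:
        s = node_vecs[nid] @ text_repr
        total += -np.logaddexp(0.0, -s if bit else s)  # log sigmoid(+/- s)
    return total

x = rng.normal(size=h)                               # a text representation
print({c: float(np.exp(log_prob(c, x))) for c in paths})  # sums to 1 over classes
```

Finding the top T classes without scoring all k is then a best-first traversal of the same tree with a binary heap (e.g. `heapq` in Python), which is where the O(log(T)) figure comes from.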
- Instead of explicitly modeling word order, uses a bag of n-grams to maintain efficiency without losing accuracy.
- Uses the hashing trick for a fast and memory-efficient mapping of the n-grams (a sketch follows this list).
- fastText benefits from using bigrams.
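A small sketch of the hashing trick for bigrams; the bucket count is an illustrative hyperparameter, and the FNV-1a hash is used here for determinism rather than as a claim about the library's exact hash. Each bigram maps into a fixed number of buckets, so memory stays bounded no matter how many distinct bigrams the corpus contains.

```python
NUM_BUCKETS = 2_000_000   # illustrative; a hyperparameter of the model

def fnv1a(s: str) -> int:
    """32-bit FNV-1a hash, used here so bucket ids are deterministic
    across runs (unlike Python's built-in hash of strings)."""
    h = 2166136261
    for byte in s.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h

def bigram_ids(tokens):
    """Map each bigram to a fixed bucket; hash collisions are simply
    tolerated, which keeps the embedding table size bounded."""
    return [fnv1a(a + " " + b) % NUM_BUCKETS
            for a, b in zip(tokens, tokens[1:])]

print(bigram_ids("the cat sat on the mat".split()))
```

The resulting bucket ids index extra rows of the same input embedding matrix used for the words, so bigram features add no machinery beyond the hash.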
- Outperforms char-CNN and char-CRNN, and performs slightly worse than VDCNN.
- Orders of magnitude faster in terms of training time.
- Note: fastText does not use pre-trained word embeddings.
- fastText with bigrams outperforms Tagspace.
- fastText is up to 600 times faster at test time.