- Introduces fastText, a simple and highly efficient approach for text classification.
- On par with deep learning classifiers in terms of accuracy, while orders of magnitude faster for training and evaluation.
- [Link to the paper](https://arxiv.org/abs/1607.01759)
- [Link to code](https://github.com/facebookresearch/fastText)
- Built on top of linear models with a rank constraint and a fast loss approximation.
- Starts with word representations that are averaged into a text representation, which is fed to a linear classifier.
- The text representation can be thought of as a hidden state that is shared among features and classes.
- A softmax layer produces a probability distribution over the pre-defined classes (see the sketch after this list).
- This has a high computational complexity of O(kh), where k is the number of classes and h is the dimension of the text representation.
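A minimal sketch of this pipeline, with illustrative names and dimensions rather than the actual fastText API: average the word vectors into a text representation, apply the linear classifier, and take a softmax. The full softmax makes the O(kh) cost explicit.

```python
import numpy as np

vocab_size, h, k = 50_000, 10, 4                 # h: text-repr dim, k: classes
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(vocab_size, h))  # word embedding lookup table
W = rng.normal(scale=0.1, size=(h, k))           # linear classifier weights

def predict_proba(token_ids):
    """Average word vectors into a text representation (the shared hidden
    state), then apply the linear layer + softmax: O(kh) per document."""
    text_repr = E[token_ids].mean(axis=0)
    logits = text_repr @ W
    exps = np.exp(logits - logits.max())         # numerically stable softmax
    return exps / exps.sum()

print(predict_proba([12, 345, 6789]))            # distribution over k classes
```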
- Uses hierarchical softmax, based on a Huffman coding tree, to reduce the complexity to O(h log(k)).
- The top T results (from the tree) can be computed efficiently, in O(log(T)), using a binary heap (sketched below).
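A hedged sketch of the idea, assuming a toy Huffman tree and made-up node vectors (the real fastText internals differ in detail): each class is reached by a sequence of binary decisions, so scoring one class touches only O(log k) node vectors instead of all k output weights.

```python
import heapq
import numpy as np

def huffman_paths(freqs):
    """Return {class: [(internal_node_id, bit), ...]} paths. Frequent
    classes get short codes, so the expected path length is O(log k)."""
    heap = [(f, c, ("leaf", c)) for c, f in enumerate(freqs)]
    heapq.heapify(heap)
    uid = len(freqs)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, uid, ("node", uid, t1, t2)))
        uid += 1
    paths = {}

    def walk(tree, path):
        if tree[0] == "leaf":
            paths[tree[1]] = path
        else:
            _, nid, left, right = tree
            walk(left, path + [(nid, 0)])
            walk(right, path + [(nid, 1)])

    walk(heap[0][2], [])
    return paths

h = 10
rng = np.random.default_rng(0)
paths = huffman_paths([50, 30, 12, 5, 3])           # toy class frequencies
node_vecs = {nid: rng.normal(size=h)                # one vector per inner node
             for p in paths.values() for nid, _ in p}

def log_prob(class_id, text_repr):
    """P(class) is a product of sigmoid decisions along the Huffman path,
    so scoring one class costs O(h log k) instead of O(hk)."""
    total = 0.0
    for nid, bit in paths[class_id]:
        s = node_vecs[nid] @ text_repr
        total += -np.logaddexp(0.0, -s if bit else s)  # log sigmoid(+/- s)
    return total

x = rng.normal(size=h)                               # a text representation
print({c: float(np.exp(log_prob(c, x))) for c in paths})  # sums to 1 over classes
```

Finding the top T classes without scoring all k is then a best-first traversal of the same tree with a binary heap (e.g. `heapq` in Python), which is where the O(log(T)) figure comes from.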
- Instead of explicitly modeling word order, uses a bag of n-grams to maintain efficiency without losing accuracy.
- Uses the hashing trick for a fast and memory-efficient mapping of the n-grams (a sketch follows this list).
- fastText benefits from using bigrams.
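A small sketch of the hashing trick for bigrams; the bucket count is an illustrative hyperparameter, and the FNV-1a hash is used here for determinism rather than as a claim about the library's exact hash. Each bigram maps into a fixed number of buckets, so memory stays bounded no matter how many distinct bigrams the corpus contains.

```python
NUM_BUCKETS = 2_000_000   # illustrative; a hyperparameter of the model

def fnv1a(s: str) -> int:
    """32-bit FNV-1a hash, used here so bucket ids are deterministic
    across runs (unlike Python's built-in hash of strings)."""
    h = 2166136261
    for byte in s.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h

def bigram_ids(tokens):
    """Map each bigram to a fixed bucket; hash collisions are simply
    tolerated, which keeps the embedding table size bounded."""
    return [fnv1a(a + " " + b) % NUM_BUCKETS
            for a, b in zip(tokens, tokens[1:])]

print(bigram_ids("the cat sat on the mat".split()))
```

The resulting bucket ids index extra rows of the same input embedding matrix used for the words, so bigram features add no machinery beyond the hash.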
- Outperforms char-CNN and char-CRNN, and performs slightly worse than VDCNN.
- Orders of magnitude faster in terms of training time.
- Note: fastText does not use pre-trained word embeddings.
- fastText with bigrams outperforms Tagspace.
- fastText is up to 600 times faster at test time.