- FaceNet (Google)
- They use a triplet loss with the goal of keeping the L2 intra-class distances low and inter-class distances high
- DeepID (Hong Kong University)
- They use verification and identification signals to train the network. After each convolutional layer there is an identity layer connected to the supervisory signals so that each layer is supervised directly (on top of normal backprop); see the sketch after this list
- DeepFace (Facebook)
- Convolutional layers followed by locally connected layers, then fully connected layers
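A minimal PyTorch sketch of the joint identification + verification training signal mentioned for DeepID above. This assumes a softmax identification loss plus a contrastive verification loss on feature pairs; the margin, weighting, and function names are illustrative, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def deepid_style_loss(feat_a, feat_b, logits_a, logits_b,
                      labels_a, labels_b, margin=1.0, lam=0.05):
    # Identification signal: standard softmax cross-entropy on both faces.
    ident = F.cross_entropy(logits_a, labels_a) + F.cross_entropy(logits_b, labels_b)

    # Verification signal (contrastive form, an assumption here): pull features
    # of the same identity together, push different identities apart.
    same = (labels_a == labels_b).float()
    dist = F.pairwise_distance(feat_a, feat_b)
    verif = same * dist.pow(2) + (1 - same) * F.relu(margin - dist).pow(2)

    return ident + lam * verif.mean()
```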
- Learning Deep Representation for Imbalanced Classification CVPR 2016
- Latent Factor Guided Convolutional Neural Networks for Age-Invariant Face Recognition CVPR 2016
- Sparsifying Neural Network Connections for Face Recognition CVPR 2016
- Pose-Aware Face Recognition in the Wild CVPR 2016
- Do We Really Need to Collect Millions of Faces for Effective Face Recognition? (2016)
- A method for training data augmentation is proposed as an alternative to the manual harvesting and labeling of millions (up to 200M!) of face images recently used to achieve the top results on LFW (Google, Facebook, etc.)
- Their data augmentation goes beyond traditional techniques known to work well for deep learning, such as oversampling by cropping and shifting each original image multiple times, mirroring, rotating, etc. Instead they use domain-specific data augmentation and generate new samples by varying pose, shape, and expression.
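For contrast, the "traditional" augmentations mentioned above are easy to express with torchvision transforms (a generic sketch, not the paper's pipeline; the pose/shape/expression synthesis itself relies on 3D face modelling and is not shown here):

```python
from torchvision import transforms

# Generic geometric augmentation: random crops, mirroring, small rotations.
# The paper argues these are not enough for faces and adds rendered
# pose / shape / expression variations on top.
traditional_aug = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.ToTensor(),
])
```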
- Baidu's Targeting Ultimate Accuracy: Face Recognition via Deep Embedding (2015)
- Similar approach to FaceNet
- Multi-patch deep CNN followed by deep metric learning using triplet loss
- Increasing the number of images from 150K to 1.2M reduces the error rate from 3.1% to 0.87%
- Increasing the number of patches from 1 to 7 reduces the error rate from 0.87% to 0.32%. Increasing the number of patches further does not improve the results and actually makes the error rate slightly worse.
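A rough sketch of the multi-patch idea, assuming each patch (e.g. crops around the eyes, nose, mouth) gets its own CNN and the per-patch embeddings are concatenated before the triplet-loss metric-learning stage; the class layout and normalisation are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPatchEmbedder(nn.Module):
    """One CNN per facial patch; the final descriptor is the concatenation
    of the L2-normalised per-patch embeddings (hypothetical layout)."""
    def __init__(self, patch_cnns):
        super().__init__()
        self.patch_cnns = nn.ModuleList(patch_cnns)

    def forward(self, patches):            # patches: list of tensors, one per patch
        embs = [cnn(p) for cnn, p in zip(self.patch_cnns, patches)]
        embs = [F.normalize(e, dim=1) for e in embs]
        return torch.cat(embs, dim=1)      # fed to triplet-loss metric learning
```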
- Google's FaceNet: A Unified Embedding for Face Recognition and Clustering (2015)
- They use a triplet loss with the goal of keeping the L2 intra-class distances low and inter-class distances high
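A minimal triplet-loss sketch on L2-normalised embeddings (the margin value is illustrative; FaceNet additionally relies on careful online triplet mining, which is omitted here):

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Embeddings are L2-normalised so squared distances are comparable.
    anchor, positive, negative = (F.normalize(x, dim=1)
                                  for x in (anchor, positive, negative))
    d_pos = (anchor - positive).pow(2).sum(dim=1)   # intra-class distance
    d_neg = (anchor - negative).pow(2).sum(dim=1)   # inter-class distance
    return F.relu(d_pos - d_neg + margin).mean()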
- Network In Network (2014)
- "In NIN, the GLM is replaced with a ”micro network” structure which is a general nonlinear function approximator"
- The fully connected layers (classifiers) at the end of the convolution layers (feature extractors) are replaced by a global average pooling layer, i.e., the last mlpconv layer produces one feature map per class which is then followed by a softmax layer
- This approach has two advantages: 1) it generalizes better than the traditional fully connected layer, which often suffers from overfitting (normally addressed with dropout); 2) each output feature map is basically a confidence map for one class, which is very intuitive and meaningful
- The micro network chosen in this paper is a multilayer perceptron
- Has advantages over the traditional fully connected layers at the output
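A compact sketch of the idea: the "micro network" can be implemented with 1x1 convolutions (a per-pixel MLP), and the classifier is just global average pooling over one feature map per class. Channel counts and the number of classes below are arbitrary:

```python
import torch.nn as nn

num_classes = 10   # arbitrary

nin_block = nn.Sequential(
    nn.Conv2d(3, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv2d(192, 160, kernel_size=1), nn.ReLU(),          # mlpconv: 1x1 convs act as a per-pixel MLP
    nn.Conv2d(160, num_classes, kernel_size=1), nn.ReLU(),  # one feature map per class
)

classifier = nn.Sequential(
    nin_block,
    nn.AdaptiveAvgPool2d(1),   # global average pooling replaces the fully connected layers
    nn.Flatten(),              # -> (batch, num_classes) logits, then softmax / cross-entropy
)
```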
- CP-mtML: Coupled Projection multi-task Metric Learning for Large Scale Face Retrieval CVPR 2016