- FaceNet (Google)
- They use a triplet loss with the goal of keeping the L2 intra-class distances low and inter-class distances high
- DeepID (Hong Kong University)
- They use verification and identification signals to train the network. After each convolutional layer there is an identity layer connected to the supervisory signals so that each layer is supervised directly (on top of normal backprop); see the sketch after this list
- DeepFace (Facebook)
- Convolutional layers followed by locally connected layers, then fully connected layers
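A minimal PyTorch sketch of the joint identification + verification training signal mentioned for DeepID above. This assumes a softmax identification loss plus a contrastive verification loss on feature pairs; the margin, weighting, and function names are illustrative, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def deepid_style_loss(feat_a, feat_b, logits_a, logits_b,
                      labels_a, labels_b, margin=1.0, lam=0.05):
    # Identification signal: standard softmax cross-entropy on both faces.
    ident = F.cross_entropy(logits_a, labels_a) + F.cross_entropy(logits_b, labels_b)

    # Verification signal (contrastive form, an assumption here): pull features
    # of the same identity together, push different identities apart.
    same = (labels_a == labels_b).float()
    dist = F.pairwise_distance(feat_a, feat_b)
    verif = same * dist.pow(2) + (1 - same) * F.relu(margin - dist).pow(2)

    return ident + lam * verif.mean()
```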
- Learning Deep Representation for Imbalanced Classification CVPR 2016
- Latent Factor Guided Convolutional Neural Networks for Age-Invariant Face Recognition CVPR 2016
- Sparsifying Neural Network Connections for Face Recognition CVPR 2016
- Pose-Aware Face Recognition in the Wild CVPR 2016
- Do We Really Need to Collect Millions of Faces for Effective Face Recognition? (2016)
- A method for training data augmentation is proposed as an alternative to the manual harvesting and labeling of millions (up to 200M!) of face images recently used to achieve the top results on LFW (Google, Facebook, etc.)
- Their data augmentation goes beyond traditional techniques known to work well for deep learning, such as oversampling by cropping and shifting each original image multiple times, mirroring, rotating, etc. Instead they use domain-specific data augmentation and generate new samples by varying pose, shape, and expression.
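For contrast, the "traditional" augmentations mentioned above are easy to express with torchvision transforms (a generic sketch, not the paper's pipeline; the pose/shape/expression synthesis itself relies on 3D face modelling and is not shown here):

```python
from torchvision import transforms

# Generic geometric augmentation: random crops, mirroring, small rotations.
# The paper argues these are not enough for faces and adds rendered
# pose / shape / expression variations on top.
traditional_aug = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.ToTensor(),
])
```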
- Baidu's Targeting Ultimate Accuracy: Face Recognition via Deep Embedding (2015)
- Similar approach to FaceNet
- Multi-patch deep CNN followed by deep metric learning using triplet loss
- Increasing the number of images from 150K to 1.2M reduces the error rate from 3.1% to 0.87%
- Increasing the number of patches from 1 to 7 reduces the error rate from 0.87% to 0.32%. Increasing the number of patches further does not improve the results and actually makes the error rate slightly worse.
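A rough sketch of the multi-patch idea, assuming each patch (e.g. crops around the eyes, nose, mouth) gets its own CNN and the per-patch embeddings are concatenated before the triplet-loss metric-learning stage; the class layout and normalisation are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPatchEmbedder(nn.Module):
    """One CNN per facial patch; the final descriptor is the concatenation
    of the L2-normalised per-patch embeddings (hypothetical layout)."""
    def __init__(self, patch_cnns):
        super().__init__()
        self.patch_cnns = nn.ModuleList(patch_cnns)

    def forward(self, patches):            # patches: list of tensors, one per patch
        embs = [cnn(p) for cnn, p in zip(self.patch_cnns, patches)]
        embs = [F.normalize(e, dim=1) for e in embs]
        return torch.cat(embs, dim=1)      # fed to triplet-loss metric learning
```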
- Google's FaceNet: A Unified Embedding for Face Recognition and Clustering (2015)
- They use a triplet loss with the goal of keeping the L2 intra-class distances low and inter-class distances high
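A minimal triplet-loss sketch on L2-normalised embeddings (the margin value is illustrative; FaceNet additionally relies on careful online triplet mining, which is omitted here):

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Embeddings are L2-normalised so squared distances are comparable.
    anchor, positive, negative = (F.normalize(x, dim=1)
                                  for x in (anchor, positive, negative))
    d_pos = (anchor - positive).pow(2).sum(dim=1)   # intra-class distance
    d_neg = (anchor - negative).pow(2).sum(dim=1)   # inter-class distance
    return F.relu(d_pos - d_neg + margin).mean()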
- Network In Network (2014)
- "In NIN, the GLM is replaced with a ”micro network” structure which is a general nonlinear function approximator"
- The fully connected layers (classifiers) at the end of the convolution layers (feature extractors) are replaced by a global average pooling layer, i.e., the last mlpconv layer produces one feature map per class which is then followed by a softmax layer
- This approach has two advantages: 1) it generalizes better than the traditional fully connected layer, which often suffers from overfitting (normally addressed with dropout); 2) each output feature map is basically a confidence map for one class, which is very intuitive and meaningful
- The micro network chosen in this paper is a multilayer perceptron
- Has advantages over the traditional fully connected layers at the output
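A compact sketch of the idea: the "micro network" can be implemented with 1x1 convolutions (a per-pixel MLP), and the classifier is just global average pooling over one feature map per class. Channel counts and the number of classes below are arbitrary:

```python
import torch.nn as nn

num_classes = 10   # arbitrary

nin_block = nn.Sequential(
    nn.Conv2d(3, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv2d(192, 160, kernel_size=1), nn.ReLU(),          # mlpconv: 1x1 convs act as a per-pixel MLP
    nn.Conv2d(160, num_classes, kernel_size=1), nn.ReLU(),  # one feature map per class
)

classifier = nn.Sequential(
    nin_block,
    nn.AdaptiveAvgPool2d(1),   # global average pooling replaces the fully connected layers
    nn.Flatten(),              # -> (batch, num_classes) logits, then softmax / cross-entropy
)
```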
- CP-mtML: Coupled Projection multi-task Metric Learning for Large Scale Face Retrieval CVPR 2016