## Information
name: 16-layer model from the arXiv paper: "Very Deep Convolutional Networks for Large-Scale Image Recognition"
caffemodel: VGG_ILSVRC_16_layers
caffemodel_url: http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel
license: see http://www.robots.ox.ac.uk/~vgg/research/very_deep/
caffe_version: trained using a custom Caffe-based framework
gist_id: 211839e770f7b538e2d8
The model is an improved version of the 16-layer model used by the VGG team in the ILSVRC-2014 competition. The details can be found in the following arXiv paper:
Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman
arXiv:1409.1556
Please cite the paper if you use the model.
In the paper, the model is denoted as the configuration D trained with scale jittering. The input images should be zero-centered by mean pixel (rather than mean image) subtraction. Namely, the following BGR values should be subtracted: [103.939, 116.779, 123.68].
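The mean-pixel subtraction described above can be sketched as follows. This is an illustrative helper, not part of the released model files; the function name `preprocess` and the RGB-input assumption are mine.

```python
import numpy as np

# BGR mean pixel values quoted above (ILSVRC training-set means).
VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def preprocess(image_rgb):
    """image_rgb: HxWx3 uint8 array in RGB order.

    Returns a float32 array in BGR order, zero-centered by the
    mean pixel (not the mean image).
    """
    img = image_rgb.astype(np.float32)
    img = img[:, :, ::-1]      # RGB -> BGR channel order
    img -= VGG_MEAN_BGR        # subtract per-channel mean pixel
    return img
```

Note that only three scalars are subtracted, one per channel, so no mean-image file is needed at deployment time.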
The models are currently supported by the dev branch of Caffe, but are not yet compatible with master.
An example of how to use the models in Matlab can be found in matlab/caffe/matcaffe_demo_vgg.m
Using dense single-scale evaluation (the smallest image side rescaled to 384), the top-5 classification error on the validation set of ILSVRC-2012 is 8.1% (see Table 3 in the arXiv paper).
Using dense multi-scale evaluation (the smallest image side rescaled to 256, 384, and 512), the top-5 classification error is 7.5% on the validation set and 7.4% on the test set of ILSVRC-2012 (see Table 4 in the arXiv paper).
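Multi-scale evaluation averages the class posteriors obtained at each rescaled input. A minimal sketch of that averaging step, assuming a hypothetical `predict_at_scale` callable that runs the net at one scale and returns a softmax vector:

```python
import numpy as np

def multi_scale_posterior(predict_at_scale, scales=(256, 384, 512)):
    """Average class posteriors over several input scales.

    predict_at_scale: hypothetical callable mapping a scale (the
    smallest image side after rescaling) to a 1-D softmax vector.
    """
    probs = [predict_at_scale(s) for s in scales]
    return np.mean(probs, axis=0)
```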
I also came looking for the train_val file for convenience, which doesn't seem to be available. I wrote this one for ImageNet, in case it's helpful to others: http://cs.stanford.edu/people/karpathy/vgg_train_val.prototxt
Note that I've zeroed out all blobs_lr everywhere; presumably you want to build on this.
Set up this way, running the test I get, on the validation set:
```
Test net output #0: accuracy@1 = 0.683579
Test net output #1: accuracy@5 = 0.884442
Test net output #2: loss/loss = 1.30089 (* 1 = 1.30089 loss)
```
(i.e. the top-5 validation error is ~11.6% with no bells and whistles and a single forward pass).
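For reference, the accuracy@5 figure above is the fraction of examples whose true label is among the five highest-scoring classes; top-5 error is one minus that. A generic sketch of the computation, not tied to any Caffe API:

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """scores: NxC array of class scores; labels: length-N int array."""
    # Indices of the k highest-scoring classes for each example.
    topk = np.argsort(scores, axis=1)[:, -k:]
    # A hit if the true label appears among those k indices.
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()
```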
Swapping the path from the val LMDB to the train LMDB, we can also evaluate on the training set:
```
Test net output #0: accuracy@1 = 0.77418
Test net output #1: accuracy@5 = 0.938281
Test net output #2: loss/loss = 0.852522 (* 1 = 0.852522 loss)
```