SNIPER

Object Detection - SNIPER: Efficient Multi-Scale Training - Paper Notes

SNIPER:

adaptively samples chips from multiple scales of an image pyramid, conditioned on the image content.
We sample positive chips conditioned on the ground-truth instances and negative chips based on proposals generated by a region proposal network.
- Surprisingly, the negative chips are generated from RPN proposals
R-CNN is scale invariant (with the assumption that CNNs can classify images of a fixed resolution)
- Presumably because every proposal is resized to a canonical 224x224 image
Fast-RCNN, by contrast, is not scale invariant
- However, convolution for objects of different sizes is performed at a single scale, which destroys the scale invariance properties of R-CNN
- R-CNN resizes every object proposal to the same 224x224 size regardless of its original size, which forces all objects to a single resolution and scale; that is where its scale invariance comes from. Fast-RCNN does not do this
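
To make the resizing argument concrete, here is a minimal sketch of the per-proposal zoom that a warp to a canonical size implies. The function name `rcnn_zoom` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration; only the 224x224 target comes from the notes above.

```python
CANONICAL = 224  # canonical side length mentioned in the notes


def rcnn_zoom(box, canonical=CANONICAL):
    """Return the (x, y) zoom factors applied when a proposal is warped
    to a canonical x canonical crop. `box` is (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    if width <= 0 or height <= 0:
        raise ValueError("box must have positive width and height")
    return canonical / width, canonical / height


# A small proposal is magnified, a large one is shrunk;
# both end up at the same 224x224 resolution.
small = rcnn_zoom((0, 0, 32, 32))    # zoomed in
large = rcnn_zoom((0, 0, 448, 448))  # zoomed out
```

Because the zoom depends on the proposal itself, the classifier always sees objects at one scale, which is the scale invariance the notes refer to.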

Drawbacks of Fast-RCNN:

In multi-scale training, Fast-RCNN upsamples and downsamples every proposal (whether small or big) in the image. As a result, objects that are already large still get upsampled into extremely large objects, and objects that are already small still get downsampled into extremely small objects.
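
The effect can be illustrated with simple arithmetic. The sketch below assumes the whole image is rescaled so that its shorter side matches each training scale; the image size, object sizes, and the scale values 480, 800, and 1400 are illustrative assumptions, not settings taken from these notes.

```python
def resized_object_sides(object_side, image_short_side, target_short_sides):
    """Side length of an object after the whole image is rescaled so that
    its shorter side equals each target value. Every object in the image
    is scaled by the same factor, whatever its original size."""
    return {
        target: object_side * target / image_short_side
        for target in target_short_sides
    }


scales = (480, 800, 1400)  # hypothetical training scales
small_object = resized_object_sides(32, 800, scales)
large_object = resized_object_sides(400, 800, scales)

for target in scales:
    print(target, small_object[target], large_object[target])
```

Under these assumptions the 400-pixel object grows to 700 pixels at the largest scale, while the 32-pixel object shrinks to about 19 pixels at the smallest one. Both are the extreme cases described above.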

Advantages of R-CNN:

each proposal is resized to a canonical size of 224x224 pixels. Large objects are not upsampled and small objects are not downsampled in R-CNN.

SNIPER wants both benefits: we propose SNIPER, which retains the benefits of both these approaches by generating scale specific context-regions (chips) that cover maximum proposals at a particular scale.
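
One way to read "cover maximum proposals" is as a greedy covering problem at a single scale. The sketch below is an assumption-laden illustration, not the paper's implementation: the chip size of 512, the stride of 32, the grid of candidate chips, and all function names are hypothetical.

```python
def covers(chip, box):
    """True if `box` lies entirely inside `chip`; both are (x1, y1, x2, y2)."""
    cx1, cy1, cx2, cy2 = chip
    x1, y1, x2, y2 = box
    return cx1 <= x1 and cy1 <= y1 and x2 <= cx2 and y2 <= cy2


def candidate_chips(width, height, chip_size, stride):
    """Square chips placed on a regular grid over the image (assumed layout)."""
    xs = range(0, max(width - chip_size, 0) + 1, stride)
    ys = range(0, max(height - chip_size, 0) + 1, stride)
    return [(x, y, x + chip_size, y + chip_size) for y in ys for x in xs]


def greedy_chips(boxes, width, height, chip_size=512, stride=32):
    """Repeatedly pick the chip that covers the most still-uncovered boxes."""
    candidates = candidate_chips(width, height, chip_size, stride)
    remaining = set(range(len(boxes)))
    chosen = []
    while remaining:
        best_chip, best_cover = None, set()
        for chip in candidates:
            cover = {i for i in remaining if covers(chip, boxes[i])}
            if len(cover) > len(best_cover):
                best_chip, best_cover = chip, cover
        if best_chip is None:
            break  # the remaining boxes fit inside no candidate chip
        chosen.append(best_chip)
        remaining -= best_cover
    return chosen


boxes = [(10, 10, 60, 60), (100, 100, 200, 200), (900, 900, 1000, 1000)]
chips = greedy_chips(boxes, width=1024, height=1024)
```

In this example the two nearby boxes share one chip and the distant box gets its own, so only two chips are processed instead of the full image.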

R-CNN more appropriately does not up/downsample every pixel in the image but only in those regions which are likely to contain objects to an appropriate resolution. However, R-CNN does not share the convolutional features for nearby proposals like Fast-RCNN, which makes it slow.

The two benefits are R-CNN's resizing of object regions to an appropriate resolution, and Fast-RCNN's sharing of convolutional features for nearby proposals, which makes it fast.

YimianDai/SNIPER.md