Object Detection - SNIPER: Efficient Multi-Scale Training - Paper Notes
SNIPER:
- adaptively samples chips from multiple scales of an image pyramid, conditioned on the image content.
- We sample positive chips conditioned on the ground-truth instances and negative chips based on proposals generated by a region proposal network.
  - Interesting: the negative chips are actually generated from RPN proposals.
- R-CNN is scale invariant (with the assumption that CNNs can classify images of a fixed resolution).
  - Presumably this is because every proposal is resized to a canonical 224x224 image.
- Fast-RCNN, by contrast, is not scale invariant.
  - However, convolution for objects of different sizes is performed at a single scale, which destroys the scale invariance properties of R-CNN.
  - R-CNN resizes every object proposal to the same 224x224 size regardless of its original size, which forces all objects to a single resolution and scale. That is where its scale invariance comes from, and Fast-RCNN does not do this.
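The R-CNN behaviour described above (every proposal warped to one canonical size) can be sketched as follows. This is a minimal illustration, not R-CNN's actual preprocessing: the function name `crop_and_warp`, the nested-list image representation, and the nearest-neighbour sampling are all assumptions made to keep the example dependency-free.

```python
def crop_and_warp(image, box, out_size=224):
    """Crop `box` from `image` and warp it to out_size x out_size.

    `image` is a list of rows (hypothetical representation), `box` is
    (x1, y1, x2, y2) with x2/y2 exclusive. Nearest-neighbour sampling is
    an assumption chosen for brevity.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    warped = []
    for oy in range(out_size):
        sy = y1 + oy * h // out_size      # source row for this output row
        row = []
        for ox in range(out_size):
            sx = x1 + ox * w // out_size  # source column for this output column
            row.append(image[sy][sx])
        warped.append(row)
    return warped


# Toy 8x8 "image" whose pixel value encodes its position.
image = [[y * 8 + x for x in range(8)] for y in range(8)]

small_obj = crop_and_warp(image, (0, 0, 2, 2), out_size=4)  # 2x2 proposal, upsampled
large_obj = crop_and_warp(image, (0, 0, 8, 8), out_size=4)  # 8x8 proposal, downsampled
print(len(small_obj), len(small_obj[0]))
print(len(large_obj), len(large_obj[0]))
```

Whatever the original proposal size, the classifier always sees the same resolution, which is the sense in which the notes call R-CNN scale invariant.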
Drawbacks of Fast-RCNN:
- In multi-scale training, Fast-RCNN upsamples and downsamples every proposal (whether small or big) in the image. As a result, objects that are already large are still upsampled into extremely large objects, and objects that are already small are still downsampled into extremely small ones.
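A small numeric sketch of this drawback: when the whole image is rescaled, every object is rescaled by the same factor. The scale factors and object sizes below are hypothetical values chosen for illustration, not the settings used in the paper.

```python
def object_sizes_across_pyramid(object_side, scales):
    """Return the object's side length (in pixels) at each pyramid scale.

    Rescaling the whole image by `s` rescales every object in it by `s`.
    """
    return {s: object_side * s for s in scales}


PYRAMID_SCALES = (0.5, 1.0, 2.0)  # hypothetical image pyramid

small = object_sizes_across_pyramid(16, PYRAMID_SCALES)
large = object_sizes_across_pyramid(400, PYRAMID_SCALES)

for name, sizes in (("small (16 px)", small), ("large (400 px)", large)):
    for scale, side in sizes.items():
        print(f"{name} object at scale {scale}: {side:.0f} px")
```

Under these assumed numbers the large object is still processed at 800 px on the upsampled image and the small one at 8 px on the downsampled image, which are exactly the extreme cases the note above complains about.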
Advantages of R-CNN:
- each proposal is resized to a canonical size of 224x224 pixels. Large objects are not upsampled and small objects are not downsampled in R-CNN.
SNIPER wants both benefits: we propose SNIPER, which retains the benefits of both these approaches by generating scale specific context-regions (chips) that cover maximum proposals at a particular scale.
R-CNN more appropriately does not up/downsample every pixel in the image but only in those regions which are likely to contain objects to an appropriate resolution. However, R-CNN does not share the convolutional features for nearby proposals like Fast-RCNN, which makes it slow.
The two benefits are: R-CNN's resampling of only the regions likely to contain objects to an appropriate resolution, and Fast-RCNN's sharing of convolutional features for nearby proposals, which makes it fast.
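The idea of chips that "cover maximum proposals at a particular scale" can be sketched as a greedy cover over candidate chips at one scale. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the 512-pixel chip size, the 32-pixel stride, and the "box fully inside chip" coverage rule are all assumptions, and boxes are assumed to be given in the coordinates of the already-rescaled image.

```python
def fully_inside(box, chip):
    """True if box (x1, y1, x2, y2) lies completely inside chip."""
    bx1, by1, bx2, by2 = box
    cx1, cy1, cx2, cy2 = chip
    return cx1 <= bx1 and cy1 <= by1 and bx2 <= cx2 and by2 <= cy2


def candidate_chips(img_w, img_h, chip=512, stride=32):
    """Enumerate chip windows on a regular grid (assumed layout)."""
    xs = range(0, max(img_w - chip, 0) + 1, stride)
    ys = range(0, max(img_h - chip, 0) + 1, stride)
    return [(x, y, x + chip, y + chip) for y in ys for x in xs]


def greedy_chips(boxes, img_w, img_h, chip=512, stride=32):
    """Repeatedly pick the chip that covers the most still-uncovered boxes."""
    candidates = candidate_chips(img_w, img_h, chip, stride)
    uncovered = set(range(len(boxes)))
    chosen = []
    while uncovered:
        best, best_cov = None, set()
        for c in candidates:
            cov = {i for i in uncovered if fully_inside(boxes[i], c)}
            if len(cov) > len(best_cov):
                best, best_cov = c, cov
        if not best_cov:
            break  # the remaining boxes fit in no chip at this scale
        chosen.append(best)
        uncovered -= best_cov
    return chosen, uncovered


# Hypothetical boxes on a 1024x1024 image at one pyramid scale.
boxes = [(10, 10, 100, 100), (200, 200, 300, 300), (900, 900, 1000, 1000)]
chips, leftover = greedy_chips(boxes, 1024, 1024)
print(chips, leftover)
```

Only the selected chips would then be fed to the network, so convolutional features are shared among the boxes inside each chip while regions without objects are skipped. Boxes left in `leftover` would, under this sketch, have to be handled at another scale.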