These notes rely on the longer arXiv version:
Szegedy, Christian, et al. "Scalable, high-quality object detection." arXiv preprint arXiv:1412.1441 (2014).
Conference version:
Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2147-2154
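MultiBox is only a bbox generator: the bboxes it produces are the ones believed to contain an object. In effect it is a Region Proposal Network, i.e. learning to propose regions.
It is a proposal generation model.
Haha, the authors say so themselves in the paper: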
Our work builds upon the MultiBox approach presented in [4], which was an earlier attempt to learn a proposal generation model but was never directly competitive with the best expert-engineered alternatives.
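So it seems the proposal generation in the earlier conference version did not work well. Why not?
In the new version, the authors use the latest Inception-style architecture and multi-scale convolutional predictors of bounding box shape and confidence, and that fixed it. No wonder the abstract calls it multi-scale convolutional MultiBox (MSC-MultiBox): the point is to stress the multi-scale part.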
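To make "predictors of bounding box shape and confidence" concrete, here is a minimal sketch of how such outputs could be turned into proposals. It is not the authors' code. The additive offset parameterisation, the `(cx, cy, w, h)` box format, and the names `decode` and `top_k_proposals` are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Squash a raw confidence logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode(prior, offsets):
    """Shift a prior box by predicted offsets.

    Boxes are (cx, cy, w, h) in normalised image coordinates.
    Additive offsets are an assumption, not the paper's exact form.
    """
    cx, cy, w, h = prior
    dx, dy, dw, dh = offsets
    return (cx + dx, cy + dy, w + dw, h + dh)


def top_k_proposals(priors, offsets, logits, k):
    """Return the k (confidence, box) pairs with the highest confidence.

    The confidence is a single objectness value per prior, with no class label.
    """
    scored = [
        (sigmoid(logit), decode(prior, offset))
        for prior, offset, logit in zip(priors, offsets, logits)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical priors and network outputs, one entry per prior.
priors = [(0.25, 0.25, 0.5, 0.5), (0.75, 0.75, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
offsets = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
logits = [-2.0, 3.0, 0.5]
proposals = top_k_proposals(priors, offsets, logits, k=2)
```

With these made-up numbers, the second prior ranks first because it has the largest logit. A real pipeline would typically also apply non-maximum suppression, which is left out here.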
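RPN uses an anchor mechanism, and so does MultiBox. So what is the difference? The paper says it itself: The biggest similarity is the usage of priors (called “anchors” in the Fast R-CNN work [8]). The differences are:
- MultiBox uses multiple tapering layers (this refers to Fig. 2), whereas RPN is predicting boxes of many scales from a single feature map
- MultiBox's confidences are class-agnostic. But does RPN predict class labels? Probably not: its objectness score only separates object from background, so this does not look like a real difference.
- The regression and classification losses differ. Do they really?
- The network architectures differ.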
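The first difference in the list is easier to see by laying out the boxes. The sketch below contrasts the two layouts under made-up settings: the grid sizes, scales, and aspect ratios are hypothetical and match neither paper, and the function names are illustrative only.

```python
from itertools import product


def single_map_anchors(map_size, scales, ratios):
    """RPN-style layout: every cell of ONE feature map carries all scale/ratio combinations.

    Boxes are (cx, cy, w, h) in normalised coordinates; ratio is width / height.
    """
    step = 1.0 / map_size
    anchors = []
    for row, col in product(range(map_size), repeat=2):
        cx, cy = (col + 0.5) * step, (row + 0.5) * step
        for scale, ratio in product(scales, ratios):
            # Keep the area at scale * scale while changing the aspect ratio.
            w = scale * ratio ** 0.5
            h = scale / ratio ** 0.5
            anchors.append((cx, cy, w, h))
    return anchors


def multi_map_priors(maps):
    """MultiBox-style layout: each map in a tapering stack handles its own scale.

    `maps` is a list of (map_size, box_side) pairs, coarser maps getting larger boxes.
    """
    priors = []
    for map_size, side in maps:
        step = 1.0 / map_size
        for row, col in product(range(map_size), repeat=2):
            priors.append(((col + 0.5) * step, (row + 0.5) * step, side, side))
    return priors


# Hypothetical settings, chosen only to make the counts easy to check.
rpn_like = single_map_anchors(4, scales=[0.25, 0.5, 1.0], ratios=[0.5, 1.0, 2.0])
multibox_like = multi_map_priors([(8, 0.15), (4, 0.3), (2, 0.6), (1, 0.9)])
```

Here `rpn_like` holds 4 × 4 × 9 = 144 boxes, all tied to the same 4 × 4 map, while `multibox_like` holds 64 + 16 + 4 + 1 = 85 boxes spread over four maps of decreasing resolution. In both layouts each box would still get a single class-agnostic confidence.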
class agnostic