OverFeat

CNNs could be extended from their original use in classification to tasks such as detection and segmentation because of the belief that most of the features learned in convolutional layers are general purpose.

Bounding-box regression (BBR) was first introduced in DPM and was later adopted in R-CNN.

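This paper is one of the earliest to explore how a CNN originally built for classification can be used for object detection. It contains many inspiring ideas, but I personally found it hard to follow, so I only record here a few points I took away from it.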

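It was among the first to recognize that convolution is an inherently efficient sliding-window approach. It is efficient because "they naturally share computations common to overlapping regions": within one sliding window, every pixel other than the center one is itself the center pixel of some other sliding window and has been or will be computed anyway. So with a 7 × 7 sliding window, a naive sliding-window approach would in theory be up to 49 times slower than a convolution that shares the computation.
It was among the first methods to explore using a classification CNN for object detection. The view is that the features extracted during classification can also be used for localization, detection, and other tasks; only the last few layers need to change, so the whole network does not have to be trained from scratch (the idea of pretraining).
It presents an exhaustive pooling scheme called offset max-pooling, which "ensures that we can obtain fine alignment between the classifier and the representation of the object in the feature map." Alignment is already being emphasized here.
OverFeat uses the IoU between the field of view and the GT BBox to decide whether a sample is positive or negative; there was no notion of anchors yet. The field of view here is simply the receptive field.
When OverFeat predicts a BBox, "The regression then predicts the location scale of the object with respect to each window". Like YOLO, it predicts relative to the current cell/window.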
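
To make two of the points above concrete, here is a minimal pure-Python sketch, not OverFeat's actual implementation. The first part compares a toy two-layer "network" (two 3 × 3 valid convolutions, so a 5 × 5 receptive field) run once over the whole image against the same network run separately on every 5 × 5 crop, and counts multiplications in both cases. The second part labels a window by its IoU with a ground-truth box. The names `conv2d_valid`, `tiny_net`, `sliding_window_scores`, `label_window`, the kernels, the 8 × 8 input, and the 0.5 threshold are all assumptions made up for illustration.

```python
def conv2d_valid(img, kernel, counter):
    """Valid cross-correlation on a 2D list; counter[0] tallies multiplications."""
    k = len(kernel)
    out = []
    for i in range(len(img) - k + 1):
        row = []
        for j in range(len(img[0]) - k + 1):
            s = 0
            for di in range(k):
                for dj in range(k):
                    s += img[i + di][j + dj] * kernel[di][dj]
                    counter[0] += 1
            row.append(s)
        out.append(row)
    return out


def tiny_net(img, k1, k2, counter):
    """Toy network: two stacked 3x3 valid convolutions (5x5 receptive field)."""
    return conv2d_valid(conv2d_valid(img, k1, counter), k2, counter)


def sliding_window_scores(img, k1, k2, window, counter):
    """Naive approach: crop every window and run the network on each crop."""
    scores = []
    for i in range(len(img) - window + 1):
        row = []
        for j in range(len(img[0]) - window + 1):
            crop = [r[j:j + window] for r in img[i:i + window]]
            row.append(tiny_net(crop, k1, k2, counter)[0][0])
        scores.append(row)
    return scores


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def label_window(window_box, gt_box, threshold=0.5):
    """Label a window by IoU with the GT box (the threshold is an assumption)."""
    return "positive" if iou(window_box, gt_box) >= threshold else "negative"


# Integer inputs keep the comparison between the two approaches exact.
img = [[(3 * i + 5 * j) % 7 for j in range(8)] for i in range(8)]
k1 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
k2 = [[1, 1, 1], [1, -8, 1], [1, 1, 1]]

dense_mults = [0]
dense = tiny_net(img, k1, k2, dense_mults)  # one pass over the whole image

naive_mults = [0]
naive = sliding_window_scores(img, k1, k2, 5, naive_mults)  # one pass per crop

print(dense == naive, dense_mults[0], naive_mults[0])
print(label_window((0, 0, 10, 10), (2, 0, 12, 10)))
print(label_window((0, 0, 10, 10), (5, 0, 15, 10)))
```

Both approaches produce the same 4 × 4 score map, but for this 8 × 8 input the dense pass needs 468 multiplications while the per-crop version needs 1440, because overlapping crops recompute the same first-layer outputs. The gap widens as the receptive field and the image grow.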

YimianDai/OverFeat.md