layer | resnet18 | resnet34 | resnet50 | resnet101 |
---|---|---|---|---|
conv1 | 7 | 7 | 7 | 7 |
maxpool | 11 | 11 | 11 | 11 |
layer1 | 43 | 59 | 35 | 35 |
layer2 | 99 | 179 | 91 | 91 |
layer3 | 211 | 547 | 267 | 811 |
layer4 | 435 | 899 | 427 | 971 |
These values are calculated for ResNet-V2, which performs the stride-2 convolution with the 3x3 kernel.
Could you elaborate on how you got to these numbers?
I tried calculating (some of) them myself and found slightly different numbers.
Specifically, for resnet50 I agree on layer1 features having a receptive field of 35. But for layer2 I already get a different number based on formulas provided by https://distill.pub/2019/computing-receptive-fields/.
layer2 (which I will call stage 3 or C3 from now on) contains 4 convolutional layers with a kernel of 3x3. Each of them increases the receptive field of a single feature point by 2, so it grows from 1 to (1 + (4*2)) = 9. The receptive field of a feature point in C3 is therefore 9 pixels (in a single dimension) with respect to the first feature layer in stage 3.
Between C2 and C3 there is a dimensionality reduction, which is accomplished by a 1x1 kernel with stride 2. This increases the receptive field from 9 to (9*2 - 1) = 17.
Stage 2 (C2) then has 3 convolutional layers with a kernel of 3x3, which increase the receptive field from 17 to (17 + (3*2)) = 23.
From C2 to C1 we have to account for a dimensionality reduction again, but this time with a kernel of 3x3, which makes the receptive field go from 23 to ((23*2 - 1) + 2) = 47.
Finally, we have to account for a kernel of 7x7 with stride 2 again. This makes the receptive field go from 47 to ((47*2 - 1) + 6) = 99.
Analogously, I found C5, the final convolutional layer, to have a receptive field of 483, which is in agreement with https://github.com/google-research/receptive_field.
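
To make the arithmetic above easier to check, here is a minimal Python sketch of the same backward walk. It applies the step `r_in = stride * (r_out - 1) + kernel` for every layer, which reduces to the `r*2 - 1`, `(r*2 - 1) + 2` and `(r*2 - 1) + 6` steps used above. The function names `receptive_field` and `resnet50_layers` are hypothetical, and the layer list encodes my assumptions: the stride-2 reduction between stages sits in a 1x1 kernel, each bottleneck block contributes one 3x3 kernel, and C4 and C5 have 6 and 3 blocks (the usual resnet50 configuration, not stated above).

```python
def receptive_field(layers):
    """1D receptive field of one output feature point.

    `layers` is a list of (kernel_size, stride) pairs in forward order
    (input to output); the loop walks them from output back to input.
    """
    r = 1
    for kernel, stride in reversed(layers):
        r = stride * (r - 1) + kernel
    return r


def resnet50_layers(num_stages):
    """(kernel, stride) pairs up to stage C(num_stages + 1), under my assumptions.

    1x1 kernels with stride 1 are left out because they do not change the result.
    """
    layers = [(7, 2), (3, 2)]      # 7x7 stride-2 kernel, then 3x3 stride-2 reduction
    blocks = [3, 4, 6, 3]          # blocks per stage C2..C5 (assumed for resnet50)
    for i, n in enumerate(blocks[:num_stages]):
        if i > 0:
            layers.append((1, 2))  # ASSUMPTION: stride 2 sits in a 1x1 kernel
        layers += [(3, 1)] * n     # one 3x3 kernel per block
    return layers


print(receptive_field(resnet50_layers(1)))  # C2 (layer1)
print(receptive_field(resnet50_layers(2)))  # C3 (layer2)
print(receptive_field(resnet50_layers(4)))  # C5 (layer4)
```

Under these assumptions the sketch gives 35 for C2, 99 for C3 and 483 for C5, matching the numbers I worked out by hand.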
Would you/anyone be able to point out where I made a mistake in my calculations or made a false assumption?