| layer | resnet18 | resnet34 | resnet50 | resnet101 |
|---|---|---|---|---|
| conv1 | 7 | 7 | 7 | 7 |
| maxpool | 11 | 11 | 11 | 11 |
| layer1 | 43 | 59 | 35 | 35 |
| layer2 | 99 | 179 | 91 | 91 |
| layer3 | 211 | 547 | 267 | 811 |
| layer4 | 435 | 899 | 427 | 971 |
These values are calculated for ResNet-V2, which places the stride-2 convolution on the 3x3 kernel.
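For anyone wanting to reproduce the table: under the stated V2 convention (stride 2 on the first 3x3 conv of each downsampling stage), the standard backward recurrence r ← r·s + (k − s) over (kernel, stride) pairs reproduces the resnet50 column. A minimal sketch (my reconstruction, not the author's script; stride-1 1x1 convs are omitted since they don't grow the receptive field):

```python
def receptive_field(layers):
    """Walk from the output back to the input over (kernel, stride)
    pairs: r = r * s + (k - s)."""
    r = 1
    for k, s in reversed(layers):
        r = r * s + (k - s)
    return r

def stage(blocks, downsample):
    """One 3x3 conv per bottleneck block; stride 2 on the first 3x3
    when the stage downsamples (the V2 convention)."""
    convs = [(3, 1)] * blocks
    if downsample:
        convs[0] = (3, 2)
    return convs

# ResNet-50: conv1 7x7/2, maxpool 3x3/2, then stages of 3/4/6/3 blocks.
resnet50 = [
    ("conv1",   [(7, 2)]),
    ("maxpool", [(3, 2)]),
    ("layer1",  stage(3, downsample=False)),  # follows the strided maxpool
    ("layer2",  stage(4, downsample=True)),
    ("layer3",  stage(6, downsample=True)),
    ("layer4",  stage(3, downsample=True)),
]

prefix = []
for name, convs in resnet50:
    prefix += convs
    print(name, receptive_field(prefix))
# conv1 7, maxpool 11, layer1 35, layer2 91, layer3 267, layer4 427
```

The printed values match the resnet50 column of the table above.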
Thanks for this!
Thank you. Would love to know how you calculate these! Please let me know if you have any references.
Could you elaborate on how you got to these numbers?
I tried calculating (some of) them myself and found slightly different numbers.
Specifically, for resnet50 I agree on layer1 features having a receptive field of 35. But for layer2 I already get a different number based on formulas provided by https://distill.pub/2019/computing-receptive-fields/.
layer2 (which I will call stage 3 or C3 from now on) contains 4 convolutional layers with a 3x3 kernel. Each of these increases the receptive field by 2, so a feature point in C3 has a receptive field of 9 pixels with respect to the first feature map of stage 3 (in a single dimension).
Between C2 and C3 there is a spatial downsampling, accomplished by a 1x1 kernel with stride 2. This increases the receptive field from 9 to (9*2 - 1) = 17.
We then have 3 convolutional layers in stage 2 with a 3x3 kernel, which increase the receptive field from 17 to (17 + 3*2) = 23.
From C2 to C1 we have to account for downsampling again, but this time with a 3x3 kernel (the stride-2 max-pool), which takes the receptive field from 23 to ((23*2 - 1) + 2) = 47.
Finally, we have to account for the 7x7 kernel with stride 2, which takes the receptive field from 47 to ((47*2 - 1) + 6) = 99.
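The step-by-step count above is an instance of the backward recurrence r ← r·s + (k − s), applied from the output back to the input. A quick sketch of the same calculation (the layer list is my reconstruction of the stride-on-1x1 assumption used above):

```python
def receptive_field(layers):
    # Walk from the output back to the input: r = r * s + (k - s).
    r = 1
    for k, s in reversed(layers):
        r = r * s + (k - s)
    return r

c3 = (
    [(7, 2), (3, 2)]   # conv1 7x7/2, maxpool 3x3/2
    + [(3, 1)] * 3     # stage 2: three 3x3 convs (stride-1 1x1s don't grow the RF)
    + [(1, 2)]         # C2 -> C3 downsample: 1x1 conv with stride 2
    + [(3, 1)] * 4     # stage 3: four 3x3 convs
)
print(receptive_field(c3))  # 99
```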
Analogously, I found C5, the final convolutional layer, to have a receptive field of 483, which agrees with https://github.com/google-research/receptive_field.
Would you/anyone be able to point out where I made a mistake in my calculations or made a false assumption?
@kriskorrel-cw In my computation, the stride 2 is on the 3x3 conv. That may be the reason for the difference, if you computed the RF on a network that puts the stride on the 1x1 conv.
Thanks for your response! I think that this difference in interpretation of the architecture indeed explains the difference between our calculated RFs. With your assumption, I indeed get 91 and 267 for resnet50 layer2 and layer3 respectively, and assume that this will hold for all other networks/layers.
Though I do think that the stride-2 convolution is performed on the first 1x1 convolution of each stage, rather than on the first 3x3 convolution of each stage. This is based on code inspection of the keras-applications implementation.
In lines L236-L254 we see that each stage starts with a conv_block followed by multiple identity_blocks.
And in L114 we see that the first layer in conv_block is a 1x1 convolution with stride (2, 2). The first (or second, depending on indexing :)) stage is the only exception to this, as this already follows 3x3 max-pooling with stride 2.
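The two stride placements can be compared directly with the same backward recurrence r ← r·s + (k − s): moving the stride from the 1x1 downsample onto the first 3x3 of the stage is exactly what turns 99 into 91 for layer2. A sketch (layer lists are my reconstruction of the two conventions discussed in this thread):

```python
def receptive_field(layers):
    # Backward recurrence over (kernel, stride) pairs: r = r * s + (k - s).
    r = 1
    for k, s in reversed(layers):
        r = r * s + (k - s)
    return r

head = [(7, 2), (3, 2)] + [(3, 1)] * 3   # conv1, maxpool, stage-2 3x3s

# Stride 2 on the 1x1 downsample vs. on the first 3x3 of the stage.
layer2_stride_on_1x1 = head + [(1, 2)] + [(3, 1)] * 4
layer2_stride_on_3x3 = head + [(3, 2)] + [(3, 1)] * 3
layer3_stride_on_3x3 = layer2_stride_on_3x3 + [(3, 2)] + [(3, 1)] * 5

print(receptive_field(layer2_stride_on_1x1))  # 99
print(receptive_field(layer2_stride_on_3x3))  # 91
print(receptive_field(layer3_stride_on_3x3))  # 267
```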
@kriskorrel-cw Stride 2 on the 1x1 conv corresponds to ResNet-v1, and stride 2 on the 3x3 conv to ResNet-v2, which performs better than v1.
Actually, there are plenty of versions of ResNet, e.g. https://arxiv.org/abs/1812.01187 FYI.
Ah I see. Yeah I was basing my calculations on the original version. I don't know what the RF is for other versions.
Thanks, this is exactly what I needed.