layer | resnet18 | resnet34 | resnet50 | resnet101 |
---|---|---|---|---|
conv1 | 7 | 7 | 7 | 7 |
maxpool | 11 | 11 | 11 | 11 |
layer1 | 43 | 59 | 35 | 35 |
layer2 | 99 | 179 | 91 | 91 |
layer3 | 211 | 547 | 267 | 811 |
layer4 | 435 | 899 | 427 | 971 |
These values are calculated for ResNet-V2, which applies the stride-2 convolution with the 3x3 kernel.
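For anyone who wants to check the table, here is a minimal sketch of the calculation, not the script that produced the numbers above. It assumes the usual receptive-field recurrence (`rf += (kernel - 1) * jump`, `jump *= stride`), the standard block counts per stage, the stride on the first 3x3 convolution of layer2 to layer4, and it ignores the shortcut branch. The names `receptive_field`, `basic_stage`, `bottleneck_stage`, `CONFIGS`, and `stage_rfs` are made up for illustration.

```python
def receptive_field(layers):
    """Return the RF size after a chain of (kernel, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the RF by (k - 1) input steps
        jump *= stride              # strides compound the step between output pixels
    return rf


def basic_stage(n_blocks, stride):
    """Basic blocks: two 3x3 convs each; the first conv of the stage is strided."""
    layers = []
    for b in range(n_blocks):
        layers += [(3, stride if b == 0 else 1), (3, 1)]
    return layers


def bottleneck_stage(n_blocks, stride):
    """Bottleneck blocks: 1x1, 3x3, 1x1; assumes the stride sits on the 3x3 conv."""
    layers = []
    for b in range(n_blocks):
        layers += [(1, 1), (3, stride if b == 0 else 1), (1, 1)]
    return layers


STEM = [(7, 2), (3, 2)]  # conv1 (7x7, stride 2) and maxpool (3x3, stride 2)

# Assumed block counts per stage (layer1..layer4).
CONFIGS = {
    "resnet18": (basic_stage, [2, 2, 2, 2]),
    "resnet34": (basic_stage, [3, 4, 6, 3]),
    "resnet50": (bottleneck_stage, [3, 4, 6, 3]),
    "resnet101": (bottleneck_stage, [3, 4, 23, 3]),
}


def stage_rfs(name):
    """RF size at the output of layer1..layer4 for the named network."""
    make_stage, depths = CONFIGS[name]
    layers = list(STEM)
    out = []
    for i, depth in enumerate(depths):
        layers += make_stage(depth, 1 if i == 0 else 2)  # layer1 is not strided
        out.append(receptive_field(layers))
    return out


for name in CONFIGS:
    print(name, stage_rfs(name))
```

Under these assumptions the sketch gives the same layer1 to layer4 values as the table for all four networks.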
Thanks for your response! I think that this difference in interpretation of the architecture indeed explains the difference between our calculated RFs. With your assumption, I indeed get 91 and 267 for resnet50 layer2 and layer3 respectively, and assume that this will hold for all other networks/layers.
Though I do think that stride 2 convolution is performed on the first 1x1 convolution of each stage, rather than on the first 3x3 convolution of each stage. This is based on code inspection of the keras-applications implementation. In lines L236-L254 we see that each stage starts with a `conv_block` followed by multiple `identity_block`s. And in L114 we see that the first layer in `conv_block` is a 1x1 convolution with stride (2, 2). The first (or second, depending on indexing :)) stage is the only exception to this, as this already follows 3x3 max-pooling with stride 2.
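To make the effect of the two interpretations concrete, here is a rough sketch for the resnet50 bottleneck layout. It is only an illustration: it assumes block counts of 3, 4, 6, 3, the usual receptive-field recurrence, and it ignores the shortcut branch. The function name `bottleneck_rfs` and its `stride_on` argument are hypothetical and come from neither implementation.

```python
def bottleneck_rfs(depths, stride_on):
    """RF size after each stage of a bottleneck ResNet.

    stride_on: "1x1" puts the stage's stride 2 on the first 1x1 conv,
               "3x3" puts it on the first 3x3 conv.
    """
    if stride_on not in ("1x1", "3x3"):
        raise ValueError("stride_on must be '1x1' or '3x3'")

    rf, jump = 1, 1
    # Stem: 7x7 conv with stride 2, then 3x3 max-pooling with stride 2.
    for kernel, stride in [(7, 2), (3, 2)]:
        rf += (kernel - 1) * jump
        jump *= stride

    out = []
    for i, depth in enumerate(depths):
        for b in range(depth):
            # Only the first block of stages after the first one is strided.
            s = 2 if (i > 0 and b == 0) else 1
            if stride_on == "1x1":
                block = [(1, s), (3, 1), (1, 1)]
            else:
                block = [(1, 1), (3, s), (1, 1)]
            for kernel, stride in block:
                rf += (kernel - 1) * jump
                jump *= stride
        out.append(rf)
    return out


resnet50 = [3, 4, 6, 3]
print("stride on 3x3:", bottleneck_rfs(resnet50, "3x3"))
print("stride on 1x1:", bottleneck_rfs(resnet50, "1x1"))
```

With the stride on the 3x3 convolution this gives 35, 91, 267, 427. Moving it to the 1x1 convolution gives 35, 99, 291, 483, because the strided 1x1 doubles the step size before the stage's first 3x3 is applied.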