| layer | resnet18 | resnet34 | resnet50 | resnet101 |
|---|---|---|---|---|
| conv1 | 7 | 7 | 7 | 7 |
| maxpool | 11 | 11 | 11 | 11 |
| layer1 | 43 | 59 | 35 | 35 |
| layer2 | 99 | 179 | 91 | 91 |
| layer3 | 211 | 547 | 267 | 811 |
| layer4 | 435 | 899 | 427 | 971 |
These values are calculated for ResNet-V2, which applies the stride-2 convolution to the 3x3 kernel.
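For reference, here is a minimal sketch that reproduces the resnet50 column of the table with the backward recurrence `r_in = s * r_out + (k - s)`, assuming the stride-2 convolution sits on the 3x3 kernel. The layer lists are my reading of the architecture; stride-1 1x1 convolutions are omitted since they do not change the receptive field.

```python
# Minimal sketch: backward recurrence r_in = s * r_out + (k - s),
# assuming the stride-2 convolution is the 3x3 one in each stage.
# Stride-1 1x1 convolutions are omitted: they do not change the RF.

def receptive_field(layers):
    """layers: list of (kernel, stride) pairs, ordered from input to output."""
    r = 1
    for k, s in reversed(layers):
        r = s * r + (k - s)
    return r

def stage(n_blocks, stride):
    # One 3x3 conv per bottleneck block; the first one carries the stride.
    return [(3, stride)] + [(3, 1)] * (n_blocks - 1)

stem = [(7, 2), (3, 2)]  # conv1 (7x7/2), maxpool (3x3/2)

resnet50 = {
    'conv1':   [(7, 2)],
    'maxpool': stem,
    'layer1':  stem + stage(3, 1),
    'layer2':  stem + stage(3, 1) + stage(4, 2),
    'layer3':  stem + stage(3, 1) + stage(4, 2) + stage(6, 2),
    'layer4':  stem + stage(3, 1) + stage(4, 2) + stage(6, 2) + stage(3, 2),
}

for name, layers in resnet50.items():
    print(name, receptive_field(layers))
# conv1 7, maxpool 11, layer1 35, layer2 91, layer3 267, layer4 427
```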
@kriskorrel-cw In my computation, the stride 2 is on the 3x3 conv. That is probably the source of the difference if you compute the RF for a network that applies the stride to the 1x1 conv.
Thanks for your response! I think this difference in interpretation of the architecture indeed explains the difference between our calculated RFs. With your assumption, I indeed get 91 and 267 for resnet50 layer2 and layer3 respectively, and I assume this will hold for all other networks/layers.
Though I do think that the stride-2 convolution is the first 1x1 convolution of each stage, rather than the first 3x3 convolution. This is based on inspection of the keras-applications implementation.
In lines L236-L254 we see that each stage starts with a conv_block followed by multiple identity_blocks.
And in L114 we see that the first layer in conv_block is a 1x1 convolution with stride (2, 2). The first (or second, depending on indexing :)) stage is the only exception, as it already follows a 3x3 max-pooling with stride 2.
@kriskorrel-cw Stride 2 on the 1x1 conv corresponds to ResNet-v1, and stride 2 on the 3x3 conv is v2, which performs better than v1.
Actually, there are plenty of versions of ResNet, e.g. https://arxiv.org/abs/1812.01187 FYI.
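To make the difference concrete, here is a minimal Keras-style sketch of the two stride placements (batch norm, activations, and the shortcut branch are omitted for brevity; the function names are mine, not keras-applications'):

```python
from tensorflow.keras import layers

def bottleneck_v1(x, filters, stride=2):
    """v1 (keras-applications conv_block): the first 1x1 conv downsamples."""
    x = layers.Conv2D(filters, 1, strides=stride)(x)  # stride 2 on the 1x1
    x = layers.Conv2D(filters, 3, padding='same')(x)
    x = layers.Conv2D(4 * filters, 1)(x)
    return x

def bottleneck_v2(x, filters, stride=2):
    """The variant discussed above: the stride moves to the 3x3 conv."""
    x = layers.Conv2D(filters, 1)(x)
    x = layers.Conv2D(filters, 3, strides=stride, padding='same')(x)  # stride 2 on the 3x3
    x = layers.Conv2D(4 * filters, 1)(x)
    return x
```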
Ah I see. Yeah I was basing my calculations on the original version. I don't know what the RF is for other versions.
Could you elaborate on how you got to these numbers?
I tried calculating (some of) them myself and found slightly different numbers.
Specifically, for resnet50 I agree that layer1 features have a receptive field of 35. But for layer2 I already get a different number, based on the formulas from https://distill.pub/2019/computing-receptive-fields/.
layer2 (which I will call stage 3 or C3 from now on) contains four 3x3 convolutional layers, one per bottleneck block. Thus a single feature point has its receptive field increased by 2, four times. This makes the receptive field of a feature point in C3 9 pixels with respect to the input of stage 3 (in a single dimension).
Between C2 and C3 there is a spatial downsampling, accomplished by a 1x1 kernel with stride 2. This increases the receptive field from 9 to (9*2 - 1) = 17.
We then have 3 convolutional layers in stage 2 with a 3x3 kernel, which increases the receptive field from 17 to (17 + 3*2) = 23.
From C2 to C1 we have to account for downsampling again, this time by the 3x3 max-pooling with stride 2, which makes the receptive field go from 23 to ((23*2 - 1) + 2) = 47.
Finally, we have to account for the 7x7 kernel with stride 2 in the stem. This makes the receptive field go from 47 to ((47*2 - 1) + 6) = 99.
Analogously I found C5, the final convolutional stage, to have a receptive field of 483, which is in agreement with https://github.com/google-research/receptive_field.
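In code form, this is the calculation I'm doing: a sketch of the distill.pub recurrence under my stride-on-1x1 reading of the architecture (stride-1 1x1 convolutions omitted, since they don't affect the RF):

```python
# Sketch of r_in = s * r_out + (k - s) from the distill.pub article,
# assuming the downsampling stride sits on the first 1x1 conv of a stage.

def receptive_field(layers):
    """layers: list of (kernel, stride) pairs, ordered from input to output."""
    r = 1
    for k, s in reversed(layers):
        r = s * r + (k - s)
    return r

def stage(n_blocks, downsample):
    # Optional 1x1 stride-2 conv at the start of a stage, then one
    # 3x3 stride-1 conv per bottleneck block.
    return ([(1, 2)] if downsample else []) + [(3, 1)] * n_blocks

stem = [(7, 2), (3, 2)]                        # conv1 (7x7/2), maxpool (3x3/2)
c3 = stem + stage(3, False) + stage(4, True)   # C2 then C3
c5 = c3 + stage(6, True) + stage(3, True)      # C4 then C5

print(receptive_field(c3))  # 99
print(receptive_field(c5))  # 483
```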
Would you/anyone be able to point out where I made a mistake in my calculations or made a false assumption?