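Semantic segmentation of remote sensing images suffers from a small-object problem, for example cars.
The data comes from the ISPRS 2D Semantic Labeling Contest. The input has 5 channels, which I assume are RGB + NIR + DSM.
Classification is done on 65 × 65 pixel patches: the class of the center pixel is taken as the class of the patch.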
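A minimal sketch of the patch-based labeling described above, in plain Python. The helper names `extract_patch` and `patch_label` are hypothetical, and skipping pixels near the image border is an assumption, since the notes do not say how borders are handled.

```python
def extract_patch(image, row, col, size=65):
    """Return the size x size patch centred on (row, col), or None near the border."""
    half = size // 2  # 32 pixels on each side of the centre when size = 65
    top, left = row - half, col - half
    if top < 0 or left < 0 or top + size > len(image) or left + size > len(image[0]):
        return None  # assumption: border pixels are skipped rather than padded
    return [r[left:left + size] for r in image[top:top + size]]


def patch_label(labels, row, col):
    """The patch inherits the class of its centre pixel."""
    return labels[row][col]


# Toy 100 x 100 single-channel image whose pixel value encodes its position.
image = [[r * 100 + c for c in range(100)] for r in range(100)]
patch = extract_patch(image, 50, 50)
print(len(patch), len(patch[0]), patch[32][32])
```

The centre of the patch sits at index `[32][32]`, which is the pixel whose label the classifier is trained to predict.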
Conv 1: In_Channel = 5, Spatial = 5 × 5, Out_Channel = 32. Writing the kernel shape as k_w × k_h × c_in × c_out, this is 5 × 5 × 5 × 32.
Conv 2: Out_Channel = 64, 5 × 5 × 32 × 64
Conv 3: Out_Channel = 96, 5 × 5 × 64 × 96
Conv 4: Out_Channel = 128, 5 × 5 × 96 × 128
- Conv 1 (5 × 5 × 5 x 32) + ReLU + BN + 3 × 3 max-pooling layer (stride = 1)
- Conv 2 (5 × 5 × 32 x 64) + ReLU + BN + 3 × 3 max-pooling layer (stride = 1)
- Conv 3 (5 × 5 × 64 x 96) + ReLU + BN + 3 × 3 max-pooling layer (stride = 1)
- Conv 4 (5 × 5 × 96 x 128) + ReLU + BN + 3 × 3 max-pooling layer (stride = 1)
- FC (128) + Dropout (0.5)
- FC (5) + Dropout (0.5)
- Softmax (5)

Note that the max-pooling layers use stride = 1 in order to avoid down-sampling.
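As a quick check of the k_w × k_h × c_in × c_out notation, the sketch below counts the convolution weights of the four layers listed above. It is only arithmetic on the shapes given in these notes: biases and BN parameters are left out, and the FC layers are skipped because their input size depends on the padding, which the notes do not state. The helper name `conv_weights` is hypothetical.

```python
def conv_weights(k_w, k_h, c_in, c_out):
    """Number of weights in a conv kernel of shape k_w x k_h x c_in x c_out (no bias)."""
    return k_w * k_h * c_in * c_out


# Channel counts from the notes: 5 input channels, then 32, 64, 96, 128.
channels = [5, 32, 64, 96, 128]
conv_counts = [
    conv_weights(5, 5, c_in, c_out)
    for c_in, c_out in zip(channels, channels[1:])
]

for i, n in enumerate(conv_counts, start=1):
    print(f"Conv {i}: {n} weights")
print("Total conv weights:", sum(conv_counts))
```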
Input image size: 256 × 256
- Layer 1
  - Conv1_1 (3 × 3, stride = 2) + ReLU + BN
  - Conv1_2 (3 × 3, stride = 1) + ReLU + BN
  - 2 × 2 max pooling (stride = 2)
- Layer 2
  - Conv2_1 (3 × 3, stride = 1) + ReLU + BN
  - Conv2_2 (3 × 3, stride = 1) + ReLU + BN
  - 2 × 2 max pooling (stride = 2)
- Layer 3
  - Conv3_1 (3 × 3, stride = 1) + ReLU + BN
  - Conv3_2 (3 × 3, stride = 1) + ReLU + BN
  - 2 × 2 max pooling (stride = 2)
- Layer 4
  - Conv4_1 (3 × 3, stride = 1) + ReLU + BN
  - Conv4_2 (3 × 3, stride = 1) + ReLU + BN
  - Conv4_3 (1 × 1, Out = nclass) + ReLU + BN
- Trans Conv 1
- Trans Conv 2
- Softmax
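This network still downsamples by a factor of 16: the stride-2 Conv1_1 and the three stride-2 max-pooling layers each halve the resolution, giving 2 × 2 × 2 × 2 = 16.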
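The 16× figure can be checked by tracing the spatial size through the strides listed above. This is a sketch under the assumption that every layer uses "same"-style padding, so only the stride changes the resolution; the notes do not state the padding. The function name `trace_sizes` is hypothetical.

```python
# (layer name, stride) pairs taken from the layer list in these notes.
layers = [
    ("Conv1_1", 2), ("Conv1_2", 1), ("Pool1", 2),
    ("Conv2_1", 1), ("Conv2_2", 1), ("Pool2", 2),
    ("Conv3_1", 1), ("Conv3_2", 1), ("Pool3", 2),
    ("Conv4_1", 1), ("Conv4_2", 1), ("Conv4_3", 1),
]


def trace_sizes(size, layers):
    """Return (name, output size) per layer, assuming 'same'-style padding."""
    sizes = []
    for name, stride in layers:
        size = -(-size // stride)  # ceiling division
        sizes.append((name, size))
    return sizes


sizes = trace_sizes(256, layers)
final = sizes[-1][1]
factor = 256 // final
print("Output of Conv4_3:", final, "x", final)
print("Downsampling factor:", factor)
```

Under this assumption, the two transposed convolutions together have to upsample by the same factor to return to the 256 × 256 input resolution.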
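Data Augmentation: patches are extracted with 50 % overlap, flipped (left to right and up down), and rotated at 90 degree intervals. How does this give 8 augmentations? A naive count of 3 flip states (unchanged, left to right, up down) times 3 rotations (90°, 180°, 270°) suggests 9, but that double-counts. An up-down flip is the same as a left-to-right flip followed by a 180° rotation, so flips and 90° rotations of a square patch produce exactly 8 distinct orientations: the 4 rotations (0°, 90°, 180°, 270°) of the original patch and the 4 rotations of its mirror image. Presumably this is where the 8 comes from.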
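To check how flips and 90° rotations yield 8 augmentations, the sketch below enumerates every combination on a small asymmetric patch and counts the distinct results. It uses plain tuples instead of an image library, and the helper names `rot90`, `flip_lr`, and `flip_ud` are hypothetical.

```python
def rot90(p):
    """Rotate a square patch (tuple of tuples) by 90 degrees counter-clockwise."""
    return tuple(zip(*p))[::-1]


def flip_lr(p):
    """Mirror the patch left to right."""
    return tuple(row[::-1] for row in p)


def flip_ud(p):
    """Mirror the patch up to down."""
    return p[::-1]


# An asymmetric patch, so that no two distinct orientations coincide by accident.
patch = ((1, 2, 3), (4, 5, 6), (7, 8, 9))

variants = set()
for base in (patch, flip_lr(patch), flip_ud(patch)):
    current = base
    for _ in range(4):  # rotations by 0, 90, 180, 270 degrees
        variants.add(current)
        current = rot90(current)

print("Candidates generated: 12, distinct orientations:", len(variants))
print("flip_ud == flip_lr + 180 degree rotation:",
      flip_ud(patch) == rot90(rot90(flip_lr(patch))))
```

The 12 candidates collapse to 8 because the four rotations of the up-down flip repeat the four rotations of the left-to-right flip.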
The loss is simply a weighted cross-entropy loss: $$ L=-\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} l_{c}^{(n)} \log \left(\hat{p}_{c}^{(n)}\right) w_{c} $$
where the weights are