Fractional Max-Pooling (FMP)

Introduction

Link to Paper
Spatial pooling layers are building blocks for Convolutional Neural Networks (CNNs).
The input to a pooling operation is an N_in x N_in matrix and the output is a smaller N_out x N_out matrix.
The pooling operation divides the N_in x N_in square into N_out² pooling regions P_{i, j}.
P_{i, j} ⊂ {1, 2, . . . , N_in}² ∀ (i, j) ∈ {1, . . . , N_out}²

MP2

Refers to 2x2 max-pooling layer.
Popular choice for max-pooling operation.
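
As a point of reference for the rest of the notes, here is a minimal sketch of MP2 on a small grid. The function name `max_pool_2x2` and the sample values are illustrative only, and the sketch assumes a square input with an even side length.

```python
def max_pool_2x2(grid):
    """Apply disjoint 2x2 max-pooling to a square grid with even side length."""
    n = len(grid)
    return [
        [
            max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            for c in range(0, n, 2)
        ]
        for r in range(0, n, 2)
    ]


grid = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]
pooled = max_pool_2x2(grid)
print(pooled)  # [[4, 5], [6, 7]]
```

Each output cell is the maximum of one non-overlapping 2x2 block, so the side length halves.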

Advantages of MP2

Fast.
Quickly reduces the size of the hidden layer.
Encodes a degree of invariance with respect to translations and elastic distortions.

Issues with MP2

Disjoint nature of pooling regions.
Since the size decreases rapidly, stacks of back-to-back convolutional layers are needed to build deep networks.

FMP

Reduces the spatial size of the image by a factor of α, where α ∈ (1, 2).
Introduces randomness in terms of choice of pooling region.
Pooling regions can be chosen in a random or pseudorandom manner.
Pooling regions can be disjoint or overlapping.
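
To see why a smaller pooling factor leaves room for more pooling layers, the following rough sketch counts how many pooling steps it takes to shrink a side length down to 1. The helper `pooling_steps` is hypothetical, and the rounding rule (floor of N / α at each step) is an assumption made for illustration, not something stated in these notes.

```python
def pooling_steps(n_in, alpha, n_min=1):
    """Count pooling layers needed to shrink side length n_in down to n_min.

    Assumes each layer maps a side length n to floor(n / alpha); this
    rounding rule is an illustrative assumption.
    """
    steps = 0
    n = n_in
    while n > n_min:
        n = max(n_min, int(n / alpha))
        steps += 1
    return steps


mp2_steps = pooling_steps(32, 2)          # alpha = 2 corresponds to MP2
fmp_steps = pooling_steps(32, 2 ** 0.5)   # a fractional factor in (1, 2)
print(mp2_steps, fmp_steps)
```

With α = 2 a 32x32 input is exhausted after 5 pooling layers; a factor in (1, 2) takes more steps, so more pooling layers fit into the network.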

Generating Pooling Regions

Let a_i and b_i be two increasing sequences of integers, starting at 1 and ending at N_in.
Consecutive terms differ by either 1 or 2, i.e. a_i − a_{i−1} ∈ {1, 2}.
For disjoint regions, P_{i, j} = [a_{i−1}, a_i − 1] × [b_{j−1}, b_j − 1]
For overlapping regions, P_{i, j} = [a_{i−1}, a_i] × [b_{j−1}, b_j]
Pooling regions can be generated randomly by choosing the increment randomly at each step.
To generate pooling regions in a pseudorandom manner, choose a_i = ceil(α(i + u)), where α ∈ (1, 2) and u ∈ (0, 1).
Each FMP layer uses a different pair of sequences.
An FMP network can be thought of as an ensemble of similar networks, with each different pooling-region configuration defining a different member of the ensemble.
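
The region definitions above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `regions_1d` and `fractional_max_pool` are hypothetical, the sequence `[1, 2, 4, 5, 7]` is a hand-picked example for N_in = 7 and N_out = 4, and indices are 1-based to match the notation in these notes.

```python
def regions_1d(seq, overlapping):
    """Turn an increasing sequence a_0..a_N into N inclusive, 1-based intervals.

    Disjoint:    [a_{i-1}, a_i - 1]
    Overlapping: [a_{i-1}, a_i]
    """
    end_shift = 0 if overlapping else 1
    return [(seq[i - 1], seq[i] - end_shift) for i in range(1, len(seq))]


def fractional_max_pool(grid, a, b, overlapping):
    """Max-pool `grid` over the regions defined by row sequence a and column sequence b."""
    rows = regions_1d(a, overlapping)
    cols = regions_1d(b, overlapping)
    return [
        [
            max(
                grid[r - 1][c - 1]  # convert 1-based indices to 0-based
                for r in range(r0, r1 + 1)
                for c in range(c0, c1 + 1)
            )
            for (c0, c1) in cols
        ]
        for (r0, r1) in rows
    ]


seq = [1, 2, 4, 5, 7]  # increments 1, 2, 1, 2
grid = [[r * 7 + c for c in range(7)] for r in range(7)]  # 7x7 grid, values 0..48

disjoint = fractional_max_pool(grid, seq, seq, overlapping=False)
overlap = fractional_max_pool(grid, seq, seq, overlapping=True)
print(regions_1d(seq, False))  # [(1, 1), (2, 3), (4, 4), (5, 6)]
print(regions_1d(seq, True))   # [(1, 2), (2, 4), (4, 5), (5, 7)]
```

Both variants produce a 4x4 output from the 7x7 input; in the overlapping case neighbouring regions share their boundary row or column.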

Observations

Random FMP is good on its own but may underfit when combined with dropout or training data augmentation.
Pseudorandom approach generates more stable pooling regions.
Overlapping FMP performs better than disjoint FMP.

Weakness

No justification is provided for the observations mentioned above.
It remains to be seen how performance is affected if the pooling layers in architectures like GoogLeNet are replaced with FMP.

To take an example, suppose you have a 100x100 grid. You start at the coordinates (0, 0) and maintain two arrays, one for each dimension of the original grid. The values in these arrays would define our pooling regions. Since the increments can either be 1 or 2, the problem is how to generate each of these arrays (basically how to select between the two possible values of increment for each array).

One approach is to toss a coin at every step and use the results to generate a random sequence of 1's and 2's. This sequence is then used to set the values of the two arrays. Since the sequence itself was generated randomly, we call it the random case.
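
A minimal sketch of the coin-toss approach for one dimension of the 100x100 example is shown below. The function name `coin_toss_sequence` is hypothetical, the array is 1-based (starting at 1 and ending at N_in) to match the notes, and clamping the final step so the array ends exactly at N_in is an assumption made here.

```python
import random


def coin_toss_sequence(n_in, rng):
    """Build an increasing array from 1 to n_in using random steps of 1 or 2."""
    seq = [1]
    while seq[-1] < n_in:
        step = rng.choice((1, 2))         # the "coin toss"
        step = min(step, n_in - seq[-1])  # assumption: never step past n_in
        seq.append(seq[-1] + step)
    return seq


rows = coin_toss_sequence(100, random.Random(0))  # one array per dimension
cols = coin_toss_sequence(100, random.Random(1))
print(len(rows) - 1, len(cols) - 1)  # number of pooling regions along each dimension
```

With independent tosses the number of regions varies from run to run. To get a fixed output size, the number of 1's and 2's has to be fixed in advance, for example by shuffling a list with the right counts.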

In the pseudorandom case, we generate the elements (for the arrays mentioned in the first paragraph) in a pseudorandom manner using the equation:
i-th element of array = ceiling(α(i + u)), where α ∈ (1, 2) and u ∈ (0, 1). Note that α is the same as the pooling factor. Also, in this case, we directly generate the elements of the array rather than the sequence of increments.
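
The equation can be tried out directly. In this sketch the function name `pseudorandom_sequence` and the choice α = 1.5, u = 0.5 are illustrative assumptions; in practice u would be drawn at random from (0, 1).

```python
import math


def pseudorandom_sequence(alpha, u, n_out):
    """Return [ceil(alpha * (i + u)) for i = 0..n_out]."""
    return [math.ceil(alpha * (i + u)) for i in range(n_out + 1)]


seq = pseudorandom_sequence(alpha=1.5, u=0.5, n_out=4)
increments = [b - a for a, b in zip(seq, seq[1:])]
print(seq)         # [1, 3, 4, 6, 7]
print(increments)  # [2, 1, 2, 1]
```

Because α lies strictly between 1 and 2, consecutive elements always differ by 1 or 2, so the array satisfies the same constraint as in the random case while following a much more regular pattern.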

Hope that helps.

shagunsodhani/FMP.md