Inception Networks: Role of Auxiliary Loss in Training and Inference

  • Deep networks suffer from the vanishing-gradient problem: gradients are largest near the loss layer, and as they propagate backward through earlier layers their magnitude diminishes.

Training

  • Inception tackles this problem with auxiliary losses: it branches the convolution outputs of blocks 4a and 4d (see fig.) into their own mini classification heads, each ending in a standard softmax that predicts the same classes as the main task.
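  • As a concrete sketch, such an auxiliary head in PyTorch might look like the following (the pooling size, 128-channel 1x1 convolution, 1024-unit FC layer, and 0.7 dropout follow the GoogLeNet paper's description of the auxiliary classifiers; the class name and exact wiring here are illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AuxHead(nn.Module):
        """Mini classification head branched off an intermediate block (e.g. 4a or 4d)."""
        def __init__(self, in_channels, num_classes):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, 128, kernel_size=1)  # 1x1 bottleneck
            self.fc1 = nn.Linear(128 * 4 * 4, 1024)
            self.fc2 = nn.Linear(1024, num_classes)

        def forward(self, x):
            x = F.adaptive_avg_pool2d(x, (4, 4))  # shrink the intermediate feature map to 4x4
            x = F.relu(self.conv(x))
            x = torch.flatten(x, 1)
            x = F.relu(self.fc1(x))
            x = F.dropout(x, p=0.7, training=self.training)  # heavy dropout regularizes the head
            return self.fc2(x)  # logits; softmax is applied inside the loss function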

  • Hence, during training there is a strong gradient flow from loss:0 that adjusts layer 4a (and the preceding layers) more aggressively. The same holds for layers 4b, 4c, and 4d, which receive strong gradients from loss:1, and for layers 4e onwards, which sit closest to loss:2.

  • This way, during training all the convolutional layers actively learn from losses computed at nearby layers. (Note that loss:2 still influences all the preceding layers; it is just that its gradient contribution diminishes with depth.)

  • All three losses are combined in a weighted fashion before calling Loss.backward():

Loss = loss:2 + (0.3 * loss:1) + (0.3 * loss:0)
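  • A sketch of how this combination might look in PyTorch (variable and function names are illustrative; the three logits come from the two auxiliary heads and the final head):

    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()  # applies softmax internally

    def combined_loss(main_logits, aux0_logits, aux1_logits, targets):
        loss2 = criterion(main_logits, targets)  # final head
        loss1 = criterion(aux1_logits, targets)  # head branched off 4d
        loss0 = criterion(aux0_logits, targets)  # head branched off 4a
        return loss2 + 0.3 * loss1 + 0.3 * loss0

    # loss = combined_loss(...); loss.backward() then propagates
    # gradients from all three heads in a single backward pass.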

  • Such an exercise also encourages the initial layers to learn more discriminative features, since gradients arrive from multiple sources.

  • Because it resembles ensemble learning (as in random forests), this practice also acts as added regularization.

Testing

  • During prediction, one has the choice of using the softmaxes from all three heads (loss:0, loss:1, and loss:2) to ensemble the predictions and take a weighted vote. GoogLeNet opted to use only the last softmax layer.
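  • As a sketch, such a test-time ensemble could be a weighted average of the three softmax outputs (the function and its weights are illustrative; here the 0.3/0.3/1.0 training weights are reused):

    import torch.nn.functional as F

    def ensembled_predict(main_logits, aux0_logits, aux1_logits, w=(0.3, 0.3, 1.0)):
        probs = (w[0] * F.softmax(aux0_logits, dim=1)
                 + w[1] * F.softmax(aux1_logits, dim=1)
                 + w[2] * F.softmax(main_logits, dim=1))
        return probs.argmax(dim=1)  # class with the highest ensembled probability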

References:

  • https://stats.stackexchange.com/a/274623
  • https://pdfs.semanticscholar.org/0b99/d677883883584d9a328f6f2d54738363997a.pdf (Slide 33)
  • https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • Implementation code snippet: https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958/9
