- The paper presents gradient-based techniques to visualise image classification models.
- [Link to the paper](https://arxiv.org/abs/1312.6034)
- Single deep ConvNet trained on the ILSVRC-2013 dataset (1.2M training images, 1000 classes).
- Weight layer configuration is: conv64-conv256-conv256-conv256-conv256-full4096-full4096-full1000.
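A rough PyTorch rendering of this configuration, purely as a reading aid: the kernel sizes, strides, padding, and pooling choices below are placeholders (AlexNet-like guesses), since the notes do not specify them.

```python
import torch.nn as nn

# Placeholder rendering of conv64-conv256-conv256-conv256-conv256-full4096-full4096-full1000;
# only the channel/unit counts come from the notes, everything else is assumed.
convnet = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4), nn.ReLU(),      # conv64
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 256, kernel_size=5, padding=2), nn.ReLU(),    # conv256
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),   # conv256
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),   # conv256
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),   # conv256
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.LazyLinear(4096), nn.ReLU(),                             # full4096
    nn.Linear(4096, 4096), nn.ReLU(),                           # full4096
    nn.Linear(4096, 1000),                                      # full1000 (class scores)
)
```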
- Given a learnt ConvNet and a class of interest, start with the zero image and optimise the image itself by backpropagating the class score gradient with respect to the input image (keeping the ConvNet weights fixed).
- Add the training-set mean image to the resulting image (to undo the mean subtraction applied during training).
- The paper maximises the unnormalised class score rather than the softmax probability, so that optimisation focuses on increasing the score of the target class and not on decreasing the scores of the other classes (see the sketch below).
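A minimal PyTorch sketch of this procedure, assuming a torchvision AlexNet as a stand-in for the paper's ConvNet; the class index, learning rate, iteration count, and regularisation weight `lam` are illustrative choices, not the paper's settings:

```python
import torch
import torchvision.models as models

# Stand-in trained ConvNet (the paper's own network is not publicly packaged).
model = models.alexnet(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)                  # keep the ConvNet weights constant

target_class = 130                           # hypothetical class of interest
lam = 1e-4                                   # L2 regularisation weight (illustrative)

img = torch.zeros(1, 3, 224, 224, requires_grad=True)   # start from the zero image
optimizer = torch.optim.SGD([img], lr=1.0)

for _ in range(200):
    optimizer.zero_grad()
    score = model(img)[0, target_class]      # unnormalised class score (pre-softmax)
    loss = -score + lam * img.norm() ** 2    # maximise the score, penalise image norm
    loss.backward()                          # gradient flows to the image, not the weights
    optimizer.step()

# Add back a training-set mean before display (per-channel ImageNet mean here,
# as a stand-in for the paper's mean image).
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
visualisation = (img.detach() + mean).clamp(0, 1)
```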
- Given an image, a class of interest, and a trained ConvNet, rank the pixels of the input image by their influence on the class score.
- The derivative of the class score with respect to the input image gives an estimate of the importance of each pixel for the class.
- The magnitude of the derivative also indicates how much each pixel needs to be changed to increase the class score.
- Compute the derivative of the class score with respect to the input image via a single backpropagation pass.
- This results in one saliency map per colour channel.
- To obtain a single saliency map, take the maximum magnitude of the derivative across the colour channels (sketched below).
- The saliency map for an image provides a rough encoding of the location of the object of the class of interest.
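A minimal sketch of the saliency computation, reusing the assumed `model` from the sketch above; the random input `x` is a placeholder for a real preprocessed image:

```python
import torch

# `x` stands in for a preprocessed input image (random here to keep this runnable).
x = torch.rand(1, 3, 224, 224, requires_grad=True)

scores = model(x)                             # unnormalised class scores
c = scores[0].argmax().item()                 # or any class of interest

scores[0, c].backward()                       # a single backprop pass to the input

# x.grad holds one derivative map per colour channel; collapse them by taking
# the maximum magnitude across channels at each pixel.
saliency = x.grad.abs().max(dim=1).values.squeeze(0)    # (H, W) saliency map
```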
- Given an image and its saliency map, an object segmentation mask can be computed with GraphCut colour segmentation, using the saliency map to initialise the foreground and background colour models (see the sketch below).
- Colour continuity cues are needed because saliency maps might capture only the most dominant part of the object in the image.
- This weakly supervised approach achieves 46.4% top-5 error on the test set of the ILSVRC-2013 localisation task.
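A rough sketch of this segmentation step, using OpenCV's GrabCut as a stand-in for the paper's GraphCut formulation; the quantile thresholds used to seed foreground and background are illustrative assumptions, not the paper's exact values:

```python
import cv2
import numpy as np

def saliency_to_mask(image_bgr, saliency):
    """image_bgr: (H, W, 3) uint8 image; saliency: (H, W) float array."""
    # Everything starts as "probably background"; seed confident regions
    # from the high- and low-saliency tails of the distribution.
    mask = np.full(saliency.shape, cv2.GC_PR_BGD, dtype=np.uint8)
    mask[saliency >= np.quantile(saliency, 0.95)] = cv2.GC_FGD   # foreground seeds
    mask[saliency <= np.quantile(saliency, 0.30)] = cv2.GC_BGD   # background seeds

    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd_model, fgd_model, 5,
                cv2.GC_INIT_WITH_MASK)

    # Pixels labelled (probably-)foreground form the object segmentation mask.
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
```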
- DeconvNet-based reconstruction of the n-th layer input is similar to computing the gradient of the visualised neuron activity f with respect to the input layer.
- One difference is in the way the ReLU nonlinearity is treated:
- In a DeconvNet, the sign indicator (the derivative of the ReLU) is computed on the output reconstruction, while in this paper's gradient-based approach it is computed on the layer input (illustrated below).
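A toy NumPy illustration of the two ReLU backward rules; `x` (the forward input to the ReLU) and `g` (the signal arriving from the layer above) are made-up values:

```python
import numpy as np

x = np.array([-1.0, 2.0, -3.0, 4.0])    # layer input seen in the forward pass
g = np.array([0.5, -0.5, 1.5, -1.5])    # backward signal from the layer above

# Gradient backprop (this paper): gate by the sign of the forward INPUT.
grad_backprop = g * (x > 0)              # -> [ 0.  -0.5  0.  -1.5]

# DeconvNet: gate by the sign of the backward signal (output reconstruction).
deconvnet = g * (g > 0)                  # -> [0.5  0.   1.5  0. ]
```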