
@shagunsodhani
Created January 28, 2017 12:32
Summary of "Why Should I Trust You? Explaining the Predictions of Any Classifier" paper

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

Introduction

  • The paper introduces a novel technique to explain the predictions of any classifier in an interpretable and faithful manner.
  • It also proposes a method to explain models by obtaining representative individual predictions and their explanations.
  • Link to the paper
  • Demo

Desired Characteristics for Explanations

  • Interpretable

    • Take into account user limitations.
    • Since the features used by a machine learning model need not themselves be interpretable, the input to the explanation may have to differ from the input to the model.
  • Local Fidelity

    • The explanation should be locally faithful, i.e., it should correspond to how the model behaves in the vicinity of the instance being predicted.
  • Model Agnostic

    • Treat the original, given model as a black box.
  • Global Perspective

    • Select a few predictions such that they represent the entire model.

LIME

  • Local Interpretable Model-agnostic Explanations

  • Interpretable Data Representations

    • For text classification, an interpretable representation could be a binary vector indicating the presence or absence of a word (i.e., a bag of words).
    • For image classification, an interpretable representation may be a binary vector indicating the "presence" of a super-pixel.
    • x ∈ R^d is the original representation of the instance being explained, while x′ ∈ {0, 1}^d′ denotes a binary vector for its interpretable representation.
  • Fidelity-Interpretability Trade-off

    • Define an explanation as a model g ∈ G, where G is a class of potentially interpretable models and g acts over the absence/presence of the interpretable components.
    • Define Ω(g) as a measure of complexity (as opposed to interpretability) of the explanation g ∈ G.
    • Define f to be the model being explained.
    • Define πx(z) as a proximity measure between an instance z to x (to define locality around x).
    • Define L(f, g, πx) as a measure of how unfaithful g is in approximating f in the locality defined by πx.
    • To ensure both interpretability and local fidelity, we minimise L(f, g, πx) while having Ω(g) be low enough to be interpretable.
  • Sampling for Local Exploration

    • Since f is treated as a black box, the local behaviour of L(f, g, πx) is approximated by drawing samples weighted by πx.
    • Given an instance x′, generate a dataset of perturbed samples Z and optimise the LIME loss L(f, g, πx) over it.
    • The paper proposes sparse linear explanations with the locally weighted square loss as L; this could be a problem for highly non-linear models (a sketch of the procedure follows this list).
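
A minimal sketch of this fitting step for a text instance, assuming a black-box `predict_proba` function (probability of the positive class), a simple exponential kernel for πx, and Ridge regression with a top-k cut-off in place of the paper's K-LASSO; the names and parameters here are illustrative, not the reference implementation:

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_instance(text, predict_proba, num_samples=5000, kernel_width=0.75, top_k=6):
    """Locally approximate a black-box text classifier around one instance."""
    words = text.split()                      # interpretable components (word presence/absence)
    d = len(words)

    # x' is the all-ones binary vector (every word present). Perturb it by
    # randomly switching words off, then map each z' back to a raw text z.
    Z_binary = np.random.binomial(1, 0.5, size=(num_samples, d))
    Z_binary[0] = 1                           # keep the original instance itself
    Z_text = [" ".join(w for w, keep in zip(words, z) if keep) for z in Z_binary]

    # Query the black box f on the perturbed raw inputs.
    f_z = np.array([predict_proba(t) for t in Z_text])

    # pi_x(z): exponential kernel on the fraction of words removed.
    distances = 1.0 - Z_binary.mean(axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # Fit g: a linear model over the binary features under the locally weighted
    # square loss; sparsity is imposed by keeping only the top-k coefficients.
    g = Ridge(alpha=1.0).fit(Z_binary, f_z, sample_weight=weights)
    ranked = np.argsort(np.abs(g.coef_))[::-1][:top_k]
    return [(words[i], float(g.coef_[i])) for i in ranked]
```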

Submodular Pick for Explaining Models

  • Global understanding of the model by explaining a set of individual instances.
  • Define B to be the number of explanations to be generated.
  • Pick Step - the task of selecting B instances for the user to inspect.
  • Aim to obtain non-redundant explanations that represent how the model behaves globally.
  • Given an n × d explanation matrix (one explanation per instance, over d interpretable features), score each feature so that a feature which explains more instances gets a higher score.
  • When selecting instances, avoid instances with similar explanations and try to maximise coverage of the important features (a greedy sketch follows this list).
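
The pick step can be sketched as greedy maximisation of weighted feature coverage. This sketch assumes an explanation matrix W of shape (n instances × d features), where W[i, j] is the importance of feature j in the explanation of instance i; the variable names are illustrative, not the paper's reference code:

```python
import numpy as np

def submodular_pick(W, budget_B):
    """Greedily pick up to budget_B instances that maximise feature coverage."""
    W = np.abs(W)
    importance = np.sqrt(W.sum(axis=0))         # global feature importance I_j
    selected = []
    covered = np.zeros(W.shape[1], dtype=bool)  # features already covered

    while len(selected) < budget_B:
        # Marginal coverage gain of adding each remaining instance.
        gains = [
            importance[~covered & (W[i] > 0)].sum() if i not in selected else -np.inf
            for i in range(W.shape[0])
        ]
        best = int(np.argmax(gains))
        if gains[best] <= 0:                    # nothing new would be covered
            break
        selected.append(best)
        covered |= W[best] > 0
    return selected
```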

Conclusion

  • The paper evaluates its approach on a series of simulated and human-in-the-loop tasks to answer the following questions:
    • Are the explanations faithful to the model?
    • Can the predictions be trusted?
    • Can the model be trusted?
    • Can users select the best classifier given the explanations?
    • Can users (non-experts) improve a classifier by means of feature selection?
    • Can explanations lead to insights about the model itself?

Future Work

  • Need to define a way of finding (and ranking) compatible features across images for SP-LIME.
  • It could be difficult to define the relevant features for model explanation in certain cases; for example, individual words may not be good features for sentiment analysis models.