- The paper introduces a novel technique to explain the predictions of any classifier in an interpretable and faithful manner.
- It also proposes a method to explain models by obtaining representative individual predictions and their explanations.
- Link to the paper
- Demo
-
Interpretable
- Take into account user limitations.
- Since the features used by a machine learning model need not themselves be interpretable, the input to the explanation may have to be different from the input to the model.
-
Local Fidelity
- The explanation should be locally faithful, i.e. it should correspond to how the model behaves in the vicinity of the instance being predicted.
-
Model Agnostic
- Treat the original, given model as a black box.
-
Global Perspective
- Select a few predictions such that they represent the entire model.
-
Local Interpretable Model-agnostic Explanations (LIME)
-
Interpretable Data Representations
- For text classification, an interpretable representation could be a binary vector indicating the presence or absence of a word (i.e. a bag of words).
- For image classification, an interpretable representation may be a binary vector indicating the "presence" of a super-pixel.
- x ∈ R^d is the original representation of the instance being explained, while x' ∈ {0, 1}^d' is the binary vector for its interpretable representation.
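A minimal sketch (not the paper's code) of this mapping for a toy text instance, assuming a plain whitespace tokeniser; `vocab`, `x_prime` and `to_text` are illustrative names, not from the paper:

```python
import numpy as np

text = "this movie was great great fun"
vocab = sorted(set(text.split()))          # interpretable components: the unique words

x_prime = np.ones(len(vocab), dtype=int)   # x': every interpretable component is present

def to_text(z_prime):
    # Map a perturbed binary vector z' back to a text the original model f can score.
    return " ".join(w for w, keep in zip(vocab, z_prime) if keep)

z_prime = x_prime.copy()
z_prime[vocab.index("great")] = 0          # "remove" one interpretable component
print(to_text(z_prime))                    # the same text with the word 'great' dropped
```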
-
Fidelity-Interpretability Trade-off
- Define an explanation as a model g ∈ G, where G is a class of potentially interpretable models and g acts over the absence/presence of the interpretable components, i.e. its domain is {0, 1}^d'.
- Define Ω(g) as a measure of the complexity (as opposed to interpretability) of the explanation g ∈ G, e.g. the number of non-zero weights of a linear model.
- Define f to be the model being explained.
- Define πx(z) as a proximity measure between an instance z and x (used to define a locality around x).
- Define L(f, g, πx) as a measure of how unfaithful g is in approximating f in the locality defined by πx.
- To ensure both interpretability and local fidelity, minimise L(f, g, πx) while keeping Ω(g) low enough for the explanation to remain interpretable.
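- Combining the two, the explanation produced by LIME is ξ(x) = argmin over g ∈ G of L(f, g, πx) + Ω(g) (Eq. 1 in the paper).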
-
Sampling for Local Exploration
- Since f is treated as a black box, L(f, g, πx) is approximated by drawing samples weighted by πx, without making any assumptions about f.
- Given an instance x', generate a dataset Z of perturbed samples z' (by randomly switching off components of x'), obtain the black-box predictions f(z) for the recovered instances, and optimise L(f, g, πx) over this dataset.
- The paper uses sparse linear models as G with the locally weighted square loss as L (K features are selected with Lasso, then their weights are fit by least squares). This could be a problem for models that are highly non-linear even in the locality of the instance being explained.
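A minimal end-to-end sketch of this procedure on the toy text instance from the earlier sketch, assuming an exponential kernel πx(z) = exp(-D(x, z)²/σ²) on cosine distance (the choice the paper suggests for text, with σ picked here purely for illustration), a stub black-box classifier `f`, and plain weighted Lasso in place of the paper's K-LASSO feature-selection step:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics.pairwise import cosine_distances

# Interpretable representation of a toy text instance (as in the sketch above).
text = "this movie was great great fun"
vocab = sorted(set(text.split()))
def to_text(z_prime):
    return " ".join(w for w, keep in zip(vocab, z_prime) if keep)

def f(texts):
    # Stand-in black-box classifier: probability-like score for a positive class.
    return np.array(["great" in t for t in texts], dtype=float)

rng = np.random.default_rng(0)
n_samples, sigma = 500, 0.25

# Perturbed samples z': switch off random subsets of the interpretable components.
Z_prime = rng.integers(0, 2, size=(n_samples, len(vocab)))
Z_prime[0] = 1                                   # keep the original instance x'
labels = f([to_text(z) for z in Z_prime])        # black-box predictions f(z)

# Locality weights pi_x(z): exponential kernel on cosine distance to x' (all ones).
distances = cosine_distances(Z_prime, np.ones((1, len(vocab)))).ravel()
weights = np.exp(-(distances ** 2) / sigma ** 2)

# Sparse, locally weighted linear explanation g.
g = Lasso(alpha=0.01).fit(Z_prime, labels, sample_weight=weights)
print(sorted(zip(vocab, g.coef_), key=lambda t: -abs(t[1])))  # words by local importance
```

With this toy f, the fitted g puts essentially all of the local weight on the word 'great', which is exactly what an explanation should surface here.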
-
Submodular Pick (SP-LIME)
- Obtain a global understanding of the model by explaining a representative set of individual instances.
- Define B to be the number of explanations the user is willing to inspect (the budget).
- Pick Step - the task of selecting B instances for the user to inspect.
- Aim to obtain non-redundant explanations that represent how the model behaves globally.
- Given the n × d' explanation matrix W (one row of explanation weights per explained instance), compute a global importance score for each feature such that features which explain many instances receive a higher score.
- When selecting the B instances, avoid instances with similar explanations and instead maximise coverage of the important features; since the coverage function is submodular, a greedy algorithm gives a good approximation (see the sketch below).
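A sketch of this pick step, assuming W holds the absolute explanation weights, global feature importance is taken as the square root of each column sum (the choice the paper suggests for text), and instances are chosen greedily by their marginal coverage gain; `submodular_pick` is an illustrative name:

```python
import numpy as np

def submodular_pick(W, B):
    """Greedily pick B instances whose explanations cover the important features."""
    W = np.abs(W)
    importance = np.sqrt(W.sum(axis=0))        # global importance I_j of feature j
    covered = np.zeros(W.shape[1], dtype=bool)
    picked = []
    for _ in range(B):
        # Marginal coverage gain of adding each not-yet-picked instance.
        gains = [importance[(W[i] > 0) | covered].sum() - importance[covered].sum()
                 if i not in picked else -np.inf
                 for i in range(W.shape[0])]
        best = int(np.argmax(gains))
        picked.append(best)
        covered |= W[best] > 0
    return picked

# Toy explanation matrix: 3 instances explained with 3 interpretable features.
W = np.array([[1.0, 0.0, 0.5],
              [0.9, 0.1, 0.0],
              [0.0, 0.0, 0.8]])
print(submodular_pick(W, B=2))                 # -> [0, 1]: together they cover all features
```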
-
Evaluation
- The paper evaluates the approach on a series of simulated and human-in-the-loop experiments to check:
- Are the explanations faithful to the model?
- Can the predictions be trusted?
- Can the model be trusted?
- Can users select the best classifier given the explanations?
- Can non-expert users improve a classifier by removing bad features?
- Can explanations lead to insights about the model itself?
- SP-LIME needs a way of finding (and ranking) features that are compatible across images, since super-pixels do not correspond across instances.
- It can be difficult to define the relevant interpretable features in certain cases - for example, single words may not be good features for some sentiment analysis models.