---
title: "You don't know partial dependence plot"
author: "Fei Ye"
date: "July, 2015"
output: beamer_presentation
---
Conversations that start like this usually don't end quickly.
Q: I made a partial dependence plot and it looks awful and unintuitive. What did I do wrong?
A: As a (data) scientist, you probably need to define the partial dependence plot first...
Take a general function $f(X) = f(X_S, X_C)$, where $X$ is $p$-dimensional, $S \subset \{1, 2, \ldots, p\}$ is a subset of indices, and $C$ is its complement:
$$ f_S(X_S) = E_{X_C}f(X_S, X_C) \approx \frac{1}{N}\sum_{i=1}^Nf(X_S, x_{iC})$$
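The empirical average on the right can be computed directly for any fitted model. A minimal sketch (function and variable names here are illustrative, not from the text):

```{r}
# Partial dependence by direct averaging: `model` is any fitted model with a
# predict() method, `df` the training data, `var` the name of X_S.
pdp_1d <- function(model, df, var, grid) {
  vapply(grid, function(v) {
    df2 <- df
    df2[[var]] <- v              # fix X_S = v for every observation
    mean(predict(model, df2))    # average over the empirical X_C distribution
  }, numeric(1))
}

# Sanity check: for a linear model the PDP of x1 is a line with slope beta_1.
df <- data.frame(x1 = runif(50), x2 = runif(50))
df$y <- 2 * df$x1 + 3 * df$x2
fit <- lm(y ~ x1 + x2, data = df)
pdp_1d(fit, df, "x1", c(0, 0.5, 1))
```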
It's important to note that the partial dependence plot (PDP) so defined represents the effect of $X_S$ on $f(X)$ after accounting for the average effects of the other variables $X_C$ on $f(X)$. It is not the effect of $X_S$ on $f(X)$ conditioned on the effects of $X_C$, which is given by the conditional expectation:
$$ \bar{f}_S(X_S) = E\left(f(X_S, X_C) \mid X_S\right) $$
The two coincide when $X_S$ and $X_C$ are independent. For additive or multiplicative models, the PDP still recovers $h_1(X_S)$ up to an additive or multiplicative constant, even under dependence:
$$ f(X) = h_1(X_S) + h_2(X_C) $$
$$ f(X) = h_1(X_S) \cdot h_2(X_C) $$
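A quick numerical check of the additive case (a simulation of my own, not from the text): even with strongly dependent inputs, averaging an additive $f$ over $X_C$ only shifts $h_1$ by a constant.

```{r}
# Additive f(X) = h1(x1) + h2(x2) with dependent inputs: the PDP of x1
# still traces h1(x1) up to a constant shift.
set.seed(2)
n  <- 10000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)                      # x2 depends on x1
f  <- function(a, b) a^2 + 3 * b         # h1(a) = a^2, h2(b) = 3 * b
grid <- c(-1, 0, 1)
pd <- sapply(grid, function(v) mean(f(v, x2)))  # PDP by direct averaging
pd - grid^2                              # constant shift = 3 * mean(x2)
```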
What I did for the xgboost importance plot was actually an approximation of the conditional expectation:
https://github.com/FeiYeYe/xgboost/blob/master/R-package/R/plot.xgb.Booster.R#L30
Quick test: does $\beta$ in a multiple linear regression represent the effect of a variable before or after accounting for the effects of the other variables?
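(The answer is "after": with correlated predictors, the multiple-regression coefficient differs from the marginal slope. A small simulation of my own makes the contrast visible.)

```{r}
# With x2 correlated with x1, beta_1 from y ~ x1 + x2 is the effect of x1
# after accounting for x2; the slope from y ~ x1 alone is the marginal effect.
set.seed(3)
n  <- 5000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)                  # x2 correlated with x1
y  <- x1 + x2 + rnorm(n)
coef(lm(y ~ x1 + x2))["x1"]          # ~ 1: effect after accounting for x2
coef(lm(y ~ x1))["x1"]               # ~ 2: marginal slope absorbs x2's share
```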
In fact, the original work introducing the PDP argues that it can be a useful summary for the chosen subset of variables when their dependence on the remaining features is not too strong. When the dependence is strong, however (that is, when interactions are present), the PDP can be misleading. ICE plots are intended to address these issues.
Sometimes the 1-d plot does not reveal the truth. Simulate $Y = 0.2X_1 - 5X_2 + 10X_2 I_{X_3 \ge 0} + \epsilon$, where $\epsilon \sim N(0,1)$ and $X_1, X_2, X_3 \sim U(-1,1)$.
```{r}
library(gbm)

set.seed(1)                      # for reproducibility
n   <- 1000
eps <- rnorm(n)
xs  <- runif(3 * n, min = -1, max = 1)
x1  <- xs[1:n]
x2  <- xs[(n + 1):(2 * n)]
x3  <- xs[(2 * n + 1):(3 * n)]
Y   <- 0.2 * x1 - 5 * x2 + 10 * x2 * ifelse(x3 >= 0, 1, 0) + eps

plot(x2, Y)                      # marginal scatter: x2 alone looks uninformative

model <- gbm(Y ~ x1 + x2 + x3, distribution = "gaussian",
             n.trees = 10000, interaction.depth = 7)

# Partial dependence of f on x2, by direct averaging over (x1, x3)
xx <- seq(-1, 1, by = 0.01)
yy <- vapply(xx,
             function(x) mean(predict(model,
                                      data.frame(x1 = x1, x2 = rep(x, n), x3 = x3),
                                      n.trees = 10000)),
             numeric(1))
plot(xx, yy)                     # the PDP is nearly flat: the interaction averages out
```
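One way to see the interaction the flat PDP hides is to draw one curve per observation rather than their average, in the spirit of ICE plots. A sketch of my own, reusing `model`, `x1`, `x3`, `n`, and `xx` from the chunk above:

```{r}
# ICE-style curves: vary x2 along the grid while holding each observation's
# (x1, x3) fixed, one curve per sampled observation.
idx <- sample(seq_len(n), 50)          # a subsample keeps the plot readable
plot(NULL, xlim = c(-1, 1), ylim = c(-12, 12), xlab = "x2", ylab = "f")
for (i in idx) {
  yi <- predict(model,
                data.frame(x1 = x1[i], x2 = xx, x3 = x3[i]),
                n.trees = 10000)
  lines(xx, yi, col = if (x3[i] >= 0) "red" else "blue")
}
# Blue curves (x3 < 0) slope down (about -5 * x2); red curves (x3 >= 0)
# slope up (about +5 * x2) -- the two regimes the averaged PDP cancels out.
```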