---
title: "You don't know partial dependence plot"
author: "Fei Ye"
date: "July, 2015"
output: beamer_presentation
---
Conversations that start like this usually don't end quickly.

Q: I made a partial dependence plot and it looks awful and unintuitive. What did I do wrong?

A: As a (data) scientist, you probably need to define the partial dependence plot first...
Consider a general function $f(X) = f(X_S, X_C)$, where $X$ is $p$-dimensional, $S$ is a subset of $\{1, 2, \ldots, p\}$, and $C$ is its complement:

$$ f_S(X_S) = E_{X_C}\left[f(X_S, X_C)\right] \approx \frac{1}{N}\sum_{i=1}^N f(X_S, x_{iC}) $$
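The Monte Carlo average above can be computed by brute force for any fitted model. Here is a minimal sketch with a plain linear model; all variable names, the data, and the grid are illustrative, not from the original:

```{r}
# Hypothetical example: partial dependence of y on x1 from a linear fit,
# averaging predictions over the observed values of x2 (the X_C part).
set.seed(42)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- 2 * df$x1 + df$x2 + rnorm(200, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = df)

pdp_point <- function(xs) {
  grid <- df
  grid$x1 <- xs            # fix X_S = x1 at the grid value for every row
  mean(predict(fit, grid)) # average over the empirical distribution of X_C
}
grid_x <- seq(0, 1, by = 0.1)
pdp <- sapply(grid_x, pdp_point)
plot(grid_x, pdp, type = "l")  # recovers the slope-2 effect of x1
```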
It's important to note that the partial dependence plot (PDP) defined above represents the effect of $X_S$ on $f(X)$ after accounting for the average effects of the other variables $X_C$ on $f(X)$. It is not the effect of $X_S$ on $f(X)$ conditioned on the effects of $X_C$, which is given by the individual conditional expectation (ICE):

$$ \bar{f}_S(X_S) = E\left[f(X_S, X_C) \mid X_S\right] $$
They are the same only if $X_S$ and $X_C$ are independent, as in, for example, additive or multiplicative models:

$$ f(X) = h_1(X_S) + h_2(X_C) $$

$$ f(X) = h_1(X_S) \cdot h_2(X_C) $$
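In the additive case the equivalence is easy to see, since averaging over $X_C$ only shifts the curve by a constant:

$$ f_S(X_S) = E_{X_C}\left[h_1(X_S) + h_2(X_C)\right] = h_1(X_S) + E\left[h_2(X_C)\right] $$

and in the multiplicative case $f_S(X_S) = h_1(X_S) \cdot E\left[h_2(X_C)\right]$, so the shape of $h_1$ is recovered up to an additive (or multiplicative) constant either way.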
What I did for the xgboost importance plot was actually an approximation of ICE:

https://github.com/FeiYeYe/xgboost/blob/master/R-package/R/plot.xgb.Booster.R#L30
Quick test: does $\beta$ in linear regression represent the effect before or after accounting for the effects of the other variables?
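A quick way to convince yourself in R (the simulation is purely illustrative): with correlated predictors, the joint-fit coefficient and the marginal coefficient differ, because $\beta$ measures the effect with the other variables held fixed:

```{r}
set.seed(1)
m  <- 10000
z1 <- rnorm(m)
z2 <- z1 + rnorm(m)             # z2 is correlated with z1
yz <- z1 + 2 * z2 + rnorm(m)
coef(lm(yz ~ z1 + z2))["z1"]    # about 1: effect of z1 with z2 held fixed
coef(lm(yz ~ z1))["z1"]         # about 3: marginal effect, z2 not accounted for
```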
In fact, the original work introducing the PDP argues that it can be a useful summary for the chosen subset of variables if their dependence on the remaining features is not too strong. When the dependence is strong, however (that is, when interactions are present), the PDP can be misleading. ICE plots are intended to address these issues.
Sometimes the 1-d plot does not reveal the truth. Simulate $Y = 0.2X_1 - 5X_2 + 10X_2 I_{X_3 \ge 0} + \epsilon$, where $\epsilon \sim N(0,1)$ and $X_1, X_2, X_3 \sim U(-1,1)$.
```{r}
library(gbm)

set.seed(123)  # for reproducibility
n <- 1000
eps <- rnorm(n)
xs <- runif(3 * n, min = -1, max = 1)
x1 <- xs[1:n]
x2 <- xs[(n + 1):(2 * n)]
x3 <- xs[(2 * n + 1):(3 * n)]
# Y depends on x2 through an interaction with the sign of x3
Y <- 0.2 * x1 - 5 * x2 + 10 * x2 * ifelse(x3 >= 0, 1, 0) + eps
plot(x2, Y)

dat <- data.frame(x1, x2, x3, Y)
model <- gbm(Y ~ x1 + x2 + x3, distribution = "gaussian", data = dat,
             n.trees = 10000, interaction.depth = 7)

# Partial dependence of the prediction on x2, by brute force:
# fix x2 at each grid value and average predictions over the data
xx <- seq(-1, 1, by = 0.01)
yy <- vapply(xx,
             function(x) mean(predict(model,
                                      data.frame(x1 = x1, x2 = rep(x, n), x3 = x3),
                                      n.trees = 10000)),
             numeric(1))
plot(xx, yy)
```
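To see what the averaging hides, one can draw the individual curves before taking the mean. This is a sketch of ICE curves for the same model; it reuses `model`, `x1`, `x3`, and `xx` from the chunk above:

```{r}
# One curve per observation i: vary x2 over the grid while holding
# (x1_i, x3_i) fixed; the PDP is the pointwise mean of these curves.
ice <- sapply(1:50, function(i) {
  predict(model,
          data.frame(x1 = rep(x1[i], length(xx)),
                     x2 = xx,
                     x3 = rep(x3[i], length(xx))),
          n.trees = 10000)
})
matplot(xx, ice, type = "l", lty = 1, col = "grey",
        xlab = "x2", ylab = "prediction")
lines(xx, rowMeans(ice), lwd = 2)  # the averaged curve, i.e. the PDP
```

The ICE curves should split into two bundles, with slope near $-5$ when $x_3 < 0$ and near $+5$ when $x_3 \ge 0$, which the PDP averages into a deceptively flat line.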