---
title: "You don't know partial dependence plot"
author: "Fei Ye"
date: "July, 2015"
output: beamer_presentation
---
Conversations that start like this usually don't end quickly.

Q: I made a partial dependence plot and it looks awful and unintuitive. What did I do wrong?

A: As a (data) scientist, you probably need to define the partial dependence plot first...
Consider a general function $f(X) = f(X_S, X_C)$, where $X$ is $p$-dimensional, $S$ is a subset of $\{1, 2, \ldots, p\}$, and $C$ is its complement:

$$ f_S(X_S) = E_{X_C}\left[f(X_S, X_C)\right] \approx \frac{1}{N}\sum_{i=1}^N f(X_S, x_{iC}) $$
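The Monte Carlo average above can be computed by brute force for any fitted model. Here is a minimal sketch with a plain linear model; all variable names, the data, and the grid are illustrative, not from the original:

```{r}
# Hypothetical example: partial dependence of y on x1 from a linear fit,
# averaging predictions over the observed values of x2 (the X_C part).
set.seed(42)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- 2 * df$x1 + df$x2 + rnorm(200, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = df)

pdp_point <- function(xs) {
  grid <- df
  grid$x1 <- xs            # fix X_S = x1 at the grid value for every row
  mean(predict(fit, grid)) # average over the empirical distribution of X_C
}
grid_x <- seq(0, 1, by = 0.1)
pdp <- sapply(grid_x, pdp_point)
plot(grid_x, pdp, type = "l")  # recovers the slope-2 effect of x1
```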
It's important to note that the partial dependence plot (PDP) defined above represents the effect of $X_S$ on $f(X)$ after accounting for the average effects of the other variables $X_C$ on $f(X)$. It is not the effect of $X_S$ on $f(X)$ conditioned on the effects of $X_C$, which is given by the individual conditional expectation (ICE):

$$ \bar{f}_S(X_S) = E\left[f(X_S, X_C) \mid X_S\right] $$
They are the same only if $X_S$ and $X_C$ are independent, as in, for example, additive or multiplicative models:

$$ f(X) = h_1(X_S) + h_2(X_C) $$

$$ f(X) = h_1(X_S) \cdot h_2(X_C) $$
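In the additive case the equivalence is easy to see, since averaging over $X_C$ only shifts the curve by a constant:

$$ f_S(X_S) = E_{X_C}\left[h_1(X_S) + h_2(X_C)\right] = h_1(X_S) + E\left[h_2(X_C)\right] $$

and in the multiplicative case $f_S(X_S) = h_1(X_S) \cdot E\left[h_2(X_C)\right]$, so the shape of $h_1$ is recovered up to an additive (or multiplicative) constant either way.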
What I did for the xgboost importance plot was actually an approximation of ICE:

https://github.com/FeiYeYe/xgboost/blob/master/R-package/R/plot.xgb.Booster.R#L30
Quick test: does $\beta$ in linear regression represent the effect before or after accounting for the effects of the other variables?
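A quick way to convince yourself in R (the simulation is purely illustrative): with correlated predictors, the joint-fit coefficient and the marginal coefficient differ, because $\beta$ measures the effect with the other variables held fixed:

```{r}
set.seed(1)
m  <- 10000
z1 <- rnorm(m)
z2 <- z1 + rnorm(m)             # z2 is correlated with z1
yz <- z1 + 2 * z2 + rnorm(m)
coef(lm(yz ~ z1 + z2))["z1"]    # about 1: effect of z1 with z2 held fixed
coef(lm(yz ~ z1))["z1"]         # about 3: marginal effect, z2 not accounted for
```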
In fact, the original work introducing the PDP argues that it can be a useful summary for the chosen subset of variables if their dependence on the remaining features is not too strong. When the dependence is strong, however (that is, when interactions are present), the PDP can be misleading. ICE plots are intended to address these issues.
Sometimes the 1-d plot does not reveal the truth. Simulate $Y = 0.2X_1 - 5X_2 + 10X_2 I_{X_3 \ge 0} + \epsilon$, where $\epsilon \sim N(0,1)$ and $X_1, X_2, X_3 \sim U(-1,1)$.
```{r}
library(gbm)

set.seed(123)  # for reproducibility
n <- 1000
eps <- rnorm(n)
xs <- runif(3 * n, min = -1, max = 1)
x1 <- xs[1:n]
x2 <- xs[(n + 1):(2 * n)]
x3 <- xs[(2 * n + 1):(3 * n)]
# Y depends on x2 through an interaction with the sign of x3
Y <- 0.2 * x1 - 5 * x2 + 10 * x2 * ifelse(x3 >= 0, 1, 0) + eps
plot(x2, Y)

dat <- data.frame(x1, x2, x3, Y)
model <- gbm(Y ~ x1 + x2 + x3, distribution = "gaussian", data = dat,
             n.trees = 10000, interaction.depth = 7)

# Partial dependence of the prediction on x2, by brute force:
# fix x2 at each grid value and average predictions over the data
xx <- seq(-1, 1, by = 0.01)
yy <- vapply(xx,
             function(x) mean(predict(model,
                                      data.frame(x1 = x1, x2 = rep(x, n), x3 = x3),
                                      n.trees = 10000)),
             numeric(1))
plot(xx, yy)
```
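To see what the averaging hides, one can draw the individual curves before taking the mean. This is a sketch of ICE curves for the same model; it reuses `model`, `x1`, `x3`, and `xx` from the chunk above:

```{r}
# One curve per observation i: vary x2 over the grid while holding
# (x1_i, x3_i) fixed; the PDP is the pointwise mean of these curves.
ice <- sapply(1:50, function(i) {
  predict(model,
          data.frame(x1 = rep(x1[i], length(xx)),
                     x2 = xx,
                     x3 = rep(x3[i], length(xx))),
          n.trees = 10000)
})
matplot(xx, ice, type = "l", lty = 1, col = "grey",
        xlab = "x2", ylab = "prediction")
lines(xx, rowMeans(ice), lwd = 2)  # the averaged curve, i.e. the PDP
```

The ICE curves should split into two bundles, with slope near $-5$ when $x_3 < 0$ and near $+5$ when $x_3 \ge 0$, which the PDP averages into a deceptively flat line.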