> [!tldr]
> **Partial dependence** provides a way to interpret/explain black-box models like [[Bootstrap Ensemble Methods#Random Forests|random forest]] and [[Boosting|gradient boosting]].
>
> For a multivariate function $f:\mathbb{R}^{p} \to \mathbb{R}$ (i.e. a predictive model) that operates on random variables $\mathbf{X}=(X_{1},\dots, X_{p})$, we can remove its dependence on a subset of variables $\mathbf{X}_{\mathcal{C}}\subseteq \mathbf{X}$ by averaging over them, giving its partial dependence on $\mathbf{X}_{\mathcal{S}}:= \mathbf{X} \setminus \mathbf{X}_{\mathcal{C}}$:
> $$f_{\mathcal{S}}(\mathbf{x}_{\mathcal{S}})=\mathbb{E}_{\mathbf{X}_{\mathcal{C}}}[f(\mathbf{x}_{\mathcal{S}},\mathbf{X}_{\mathcal{C}})],$$
> where $\mathbf{x}_{\mathcal{S}}$ is treated as the input, while $\mathbf{X}_{\mathcal{C}}$ is averaged over.
> Taking partial dependence reduces the number of inputs (especially useful when $|\mathcal{S}| \le 2$), making the model easier to visualize with **partial dependence plots**.

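As a concrete reading of the expectation above, here is a minimal sketch of the empirical (Monte Carlo) estimate for a single feature, assuming a fitted model with a scikit-learn-style `predict` method and a NumPy feature matrix; the function and argument names are illustrative:

```python
import numpy as np

def partial_dependence(model, X, feature_idx, grid):
    """Empirical partial dependence of `model` on feature `feature_idx`.

    For each grid value v, pin the feature to v in every row of X
    (the observed sample stands in for the distribution of X_C) and
    average the predictions: f_S(v) ≈ (1/N) Σ_i f(v, x_{i,C}).
    """
    pd_values = []
    for v in grid:
        X_pinned = X.copy()
        X_pinned[:, feature_idx] = v  # fix X_S = v, keep observed X_C
        pd_values.append(model.predict(X_pinned).mean())
    return np.asarray(pd_values)
```

scikit-learn ships this same computation as `sklearn.inspection.partial_dependence`, with `PartialDependenceDisplay` for the plots.
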
> [!warning] Wrong Interpretations
> Partial dependence *averages* over the effects of the other predictors $\mathbf{X}_{\mathcal{C}}$ by taking the expectation over their marginal distribution.
>
> It does not *ignore* their effects, which would instead be the conditional expectation given $\mathbf{X}_{\mathcal{S}}$:
> $$\tilde{f}_{\mathcal{S}}(\mathbf{x}_{\mathcal{S}})=\mathbb{E}[f(\mathbf{X})\mid \mathbf{X}_{\mathcal{S}}=\mathbf{x}_{\mathcal{S}}].$$
> The two coincide when $\mathbf{X}_{\mathcal{S}}$ and $\mathbf{X}_{\mathcal{C}}$ are independent.
> Consequently, in [[Generalized Additive Models|GAMs]] with correlated predictors, the partial dependence of a feature $X_{j}$ still recovers the smoothing spline fitted for it up to an additive constant, whereas the conditional expectation $\mathbb{E}[f(\mathbf{X})\mid X_{j}]$ generally does not.
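
A toy sanity check of this distinction, with an additive function $f(x_{1},x_{2})=x_{1}+x_{2}^{2}$ standing in for a fitted GAM and strongly correlated inputs (the names and constants are illustrative): the partial dependence on $x_{1}$ comes out linear, i.e. the $x_{1}$ component up to a constant, while the conditional expectation is bent by the $x_{2}^{2}$ term riding along with $x_{1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # X2 is nearly a copy of X1

def f(a, b):
    return a + b**2  # additive: f1(x1) = x1, f2(x2) = x2^2

grid = np.linspace(-2.0, 2.0, 5)

# Partial dependence: pin x1 = v, average over the *marginal* sample of X2.
pd_vals = np.array([f(v, x2).mean() for v in grid])

# Conditional expectation: average only over rows where X1 is near v,
# so the correlated X2 gets dragged along with it.
ce_vals = []
for v in grid:
    mask = np.abs(x1 - v) < 0.1
    ce_vals.append(f(x1[mask], x2[mask]).mean())

print(pd_vals)            # ≈ grid + E[X2^2]: linear in v, like the fitted spline
print(np.array(ce_vals))  # ≈ grid + grid**2: quadratic bend leaked from x2
```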