> [!tldr] Simpson's Paradox
> **Simpson's paradox** is the phenomenon where the (linear) correlation between two variables disappears or reverses when other (potentially confounding) variables are introduced or averaged out.

### IRL Example: UCB Admission Rate by Sex

Let the response $Y$ be the admission probability, $S$ the applicant's sex ($1$ if they are women), and $D$ the department they apply to (coded numerically by its "difficulty").

[This section of the wiki page on Simpson's paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox#UC_Berkeley_gender_bias) lists data showing that women have higher admission rates than men in most individual departments at UCB ($\rho_{YS ~|~D} > 0$), but because they tend to apply to more competitive departments ($\rho_{SD}>0$), they have a lower admission rate overall, i.e. when averaging over $D$ ($\rho_{YS}<0$).

In this case $D$ is the **confounding variable** whose presence reverses the correlation between $Y$ and $S$.

### Theoretical Example: Linear Regression

In this case, the color (clustering) confounds the correlation between $X$ and $Y$: each cluster shows one trend, while the pooled regression shows the opposite trend (a numerical sketch is included at the end of this note).

![[SimpsonsParadoxOLS.gif#invert|center]]

### Theoretical Example: Linearly Correlated RVs

Let $X,Y,W$ be normalized (i.e. variance-$1$) RVs, with covariance matrix
$$\mathrm{Cov}(Y,W,X)=\begin{pmatrix} 1 & \rho_{YW} & \rho_{YX} \\ \rho_{YW} & 1 & \rho_{WX} \\ \rho_{YX} & \rho_{WX} & 1 \end{pmatrix}.$$
The residuals $E_{Y},E_{X}$ (i.e. $Y$ and $X$ with their correlation with $W$ removed) are the linear combinations
$$(E_{Y},E_{X})^{T}=\underbrace{\begin{pmatrix} 1 & -\rho_{YW} &0 \\ 0 & -\rho_{XW} & 1 \end{pmatrix}}_{A}\begin{pmatrix} Y \\ W \\ X \end{pmatrix},$$
so they have covariance matrix
$$\begin{align*} A&\mathrm{Cov}(Y,W,X)A^{T} \\[0.4em] &= \begin{pmatrix} 1 & -\rho_{YW} &0 \\ 0 & -\rho_{XW} & 1 \end{pmatrix}\begin{pmatrix} 1 & \rho_{YW} & \rho_{YX} \\ \rho_{YW} & 1 & \rho_{WX} \\ \rho_{YX} & \rho_{WX} & 1 \end{pmatrix}\begin{pmatrix} 1 & -\rho_{YW} &0 \\ 0 & -\rho_{XW} & 1 \end{pmatrix}^{T}\\ &= \begin{pmatrix} \ast & \rho_{YX}-\rho_{YW}\rho_{WX} \\ \ast& \ast \end{pmatrix}, \end{align*}$$
where *we can treat $\rho_{YX}$ as the "overall" correlation and $\rho_{YW}\rho_{WX}$ as the confounding effect of $W$.*

- This resembles the chain rule from calculus:
  $$\underset{\rho_{YX}}{\frac{ d Y }{ d X }}=\underset{\rho_{YW}}{\frac{ \partial Y }{ \partial W }}\underset{\rho_{WX}}{\frac{ \partial W }{ \partial X }}+ \underset{\rho_{YX ~|~ W}}{\frac{ \partial Y }{ \partial X }}.$$
- As a result, if we control for the effect of $W$ by orthogonalizing $Y,X$ wrt. it (cf. [[Orthogonal Projection, Confounding, and Missing Variable in OLS#OLS as Orthogonal Projection|OLS as orthogonal projection]]), *a strong confounding effect $\rho_{YW}\rho_{WX}$ can flip the sign of a weak $\rho _{YX ~|~ W}$*: the overall $\rho_{YX}$ and the controlled $\rho_{YX ~|~ W}$ then have opposite signs (see the sketch below).
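Below is a minimal NumPy sketch of the clustered-regression picture above. The cluster locations, the within-cluster slope of $+1$, and the noise scale are arbitrary illustrative choices (not taken from the GIF); the point is only that each cluster has a positive fitted slope while the pooled OLS slope comes out negative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three clusters: within each, y increases with x (slope +1),
# but the cluster means are arranged so the pooled trend is decreasing.
cluster_means = [(0.0, 6.0), (3.0, 3.0), (6.0, 0.0)]  # (mean_x, mean_y), illustrative
xs, ys = [], []
for mx, my in cluster_means:
    x = mx + rng.normal(scale=0.5, size=100)
    y = my + 1.0 * (x - mx) + rng.normal(scale=0.5, size=100)
    xs.append(x)
    ys.append(y)

# Within-cluster OLS slopes are all close to +1 ...
for x, y in zip(xs, ys):
    print("within-cluster slope:", np.polyfit(x, y, 1)[0])

# ... but the pooled OLS slope is strongly negative.
x_all, y_all = np.concatenate(xs), np.concatenate(ys)
print("pooled slope:", np.polyfit(x_all, y_all, 1)[0])
```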
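And a Monte Carlo sketch of the covariance calculation in the last section, using the arbitrary example values $\rho_{YW}=\rho_{WX}=0.7$ and $\rho_{YX}=0.2$ (chosen only so that the covariance matrix stays positive definite): the raw correlation between $Y$ and $X$ is positive, but the residual covariance $\rho_{YX}-\rho_{YW}\rho_{WX}=-0.29$ is negative, i.e. the sign flips once $W$ is controlled for.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative correlations (order of variables: Y, W, X), not from any dataset.
rho_yw, rho_wx, rho_yx = 0.7, 0.7, 0.2
cov = np.array([[1.0,    rho_yw, rho_yx],
                [rho_yw, 1.0,    rho_wx],
                [rho_yx, rho_wx, 1.0]])

# Sample (Y, W, X) jointly Gaussian with the covariance matrix from the note.
y, w, x = rng.multivariate_normal(np.zeros(3), cov, size=200_000).T

# Residuals after removing the linear effect of W.
e_y = y - rho_yw * w
e_x = x - rho_wx * w

print("raw corr(Y, X):        ", np.corrcoef(y, x)[0, 1])   # ~ +0.2
print("Cov(E_Y, E_X):         ", np.cov(e_y, e_x)[0, 1])    # ~ 0.2 - 0.49 = -0.29
print("rho_yx - rho_yw*rho_wx:", rho_yx - rho_yw * rho_wx)
```

Note that not every triple $(\rho_{YW},\rho_{WX},\rho_{YX})$ is realizable: the covariance matrix must remain positive semidefinite, which is why the example values above were chosen with that constraint in mind.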