Causal Inference for Binary Outcomes

It is common to model a binary outcome $Y \in \{ 0,1 \}$ by thresholding a **latent variable** $Z$, i.e. $Y = \mathbf{1}_{Z>z_{0}}$, with the causal graph

```mermaid
flowchart LR
    X[covariates X] --> Z[latent Z]
    T[treatment T] --> Z
    Z --> Y[response Y]
```

As a result, the causal effect of $T$ passes through a non-linear step function, which causes [[Effect Heterogeneity|effect heterogeneity]].

### Gaussian Example

For demonstration, suppose $X$ and $T$ have additive effects on $Z$, with the linear model
$$Z=\beta_{X} X+\beta_{T}T+\epsilon,$$
so applying the treatment changes the expectation of $Z$ by $\beta_{T}$. However, the response $Y$ can:
- Change (on average), because $Z$ is now more likely to cross the threshold,
- Or not change (with high probability), simply because $\beta_{X}X$ is too far from the threshold for a shift of $\beta_{T}$ to matter.

In the simplest case, let $\epsilon \sim N(0,0.1^{2})$ be independent of $X$ and $T$, and $\beta_{X}=\beta_{T}=1$, $z_{0}=0$, so
$$\mathbb{P}[Y=1~|~T=1]=\mathbb{P}[Z>0~|~T=1]=\mathbb{P}[X+\epsilon>-1],$$
and for a fixed $X=x$, this probability is $\Phi(10(x+1))$, where $\Phi$ is the standard normal CDF. By the same argument, the probability without treatment is $\mathbb{P}[Y=1~|~T=0, X=x]=\Phi(10x)$, so the CATE is
$$\mathrm{CATE}(x)=\Phi(10(x+1))-\Phi(10x).$$
As a function of $x$, it is plotted as:

![[BinaryOutcomeCATE.png]]

Therefore, *the CATE is the highest when the other covariates put $\mathbb{E}[Z]$ close to the threshold.*
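
The Gaussian example can be checked numerically. The sketch below is only an illustration under the assumptions stated above ($\beta_{X}=\beta_{T}=1$, $z_{0}=0$, noise standard deviation $0.1$, binary $T$); the names `prob_y1` and `cate` are made up for this note. It takes the CATE at $X=x$ to be the difference between the treated and untreated probabilities of $Y=1$.

```python
from statistics import NormalDist

# Parameters of the Gaussian example (assumed values from the note).
SIGMA = 0.1    # standard deviation of the noise epsilon
BETA_X = 1.0   # effect of the covariate X on the latent Z
BETA_T = 1.0   # effect of the treatment T on the latent Z
Z0 = 0.0       # threshold on Z

_STD_NORMAL = NormalDist()  # standard normal, provides the CDF Phi


def prob_y1(x: float, t: int) -> float:
    """P[Y = 1 | X = x, T = t] = Phi((beta_X * x + beta_T * t - z0) / sigma)."""
    return _STD_NORMAL.cdf((BETA_X * x + BETA_T * t - Z0) / SIGMA)


def cate(x: float) -> float:
    """Conditional average treatment effect at X = x (hypothetical helper)."""
    return prob_y1(x, 1) - prob_y1(x, 0)


if __name__ == "__main__":
    # Far from the threshold the effect vanishes; near it the effect is large.
    for x in (-2.0, -1.0, -0.5, 0.0, 1.0):
        print(f"x = {x:+.1f}  CATE = {cate(x):.4f}")
```

With these parameters the effect is essentially zero for $x$ well below $-1$ or well above $0$, and it is largest around $x=-0.5$, where the threshold sits between the untreated and treated means of $Z$.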