Causal Graphs for Inference - Random Notes Go Brrrrrrr

> [!tldr] Causal Graphs
> A **causal graph** $G=(V,E)$ is a [[Directed Acyclic Graph|directed acyclic graph (DAG)]] representing some causal model, where each node is a random variable, and $\overrightarrow{XY} \in E$ if and only if $X$ is a direct cause of $Y$.
>
> It can be used to fit a [[Directed Graphical Models|directed graphical model]].
> The graph can omit "error" terms, i.e. unobserved influences that can change a variable even if its parents $\mathrm{pa}$ are all held constant.

The rough goal is to study the causal effect of applying a treatment $T$ on a response $Y$.

## Interventions

What exactly do we mean when we talk about applying a treatment? Intervening on a treatment variable $T$ by setting $T=x_{t}$ in an experiment is not equivalent to mathematically conditioning on $\{ T = x_{t}\}$ in an observational dataset.
- Here the variable is $T$, its node on the graph is $t$, and its values are denoted $x_{t}$; similarly for $Y,y,x_{y}$.

Take the doctor example again, where we have (a simplified version)
```mermaid
flowchart LR
    X[Severity X] --> T[Treatment T]
    X --> Y[Survival Y]
    T --> Y
```
When we intervene by applying a treatment, we break the causal link $X \to T$ by artificially setting the value of $T$ regardless of $X$, giving the following graph
```mermaid
flowchart LR
    X[Severity X] --> Y[Survival Y]
    T[[Treatment T]] --> Y
```
where the extra fancy box on $T$ indicates that it has been intervened on.

> [!definition|*] Intervention
> When we **intervene** on a variable $T$ to set it to $T=x_{t}$, we denote this "event" as $\mathrm{do}(T=x_{t}):=:\mathrm{do}(x_{t})$, in contrast with $\{ T=x_{t} \}$.
>
> On the graphical model, the intervention is reflected by disconnecting $T$ from all its parents on the causal graph.
>
> Probabilistically, the intervention alters the joint distribution to be
> $$p(x_{V - \{ t \}} ~|~ \mathrm{do}(x_{t}))=\frac{p(x_{V})}{p(x_{t} ~|~x_{\mathrm{pa}(t)})}=\prod_{v \in V - \{ t \}}p(x_{v} ~|~x_{\mathrm{pa}(v)}),$$
> where we simply removed the dependence of $T$ on its parents.

But finding the marginal of $Y$ naively is expensive:
$$p(x_{y}~|~\mathrm{do}(x_{t}))=\sum_{x_{V-\{ t,y \}}}p(x_{V-\{ t,y \}},x_{y}~|~\mathrm{do}(x_{t}))$$
involves summing over all variables other than $T,Y$. Substituting the definition above, with $W:=V-\{ y,t \}-\mathrm{pa}(t)$,
$$\begin{align*}
p(x_{y} ~|~\mathrm{do}(x_{t})) &= \sum_{x_{V-\{ t,y \}}} \frac{p(x_{y},x_{t},x_{W},x_{\mathrm{pa}(t)})}{p(x_{t} ~|~ x_{\mathrm{pa}(t)})}\\
&= \sum_{x_{V-\{ t,y \}}} p(x_{\mathrm{pa}(t)})\frac{(\dots)}{p(x_{t},x_{\mathrm{pa}(t)})}\\[0.4em]
&= \sum_{x_{V-\{ t,y \}}} p(x_{\mathrm{pa}(t)})\cdot p(x_{y},x_{W} ~|~ x_{t},x_{\mathrm{pa}(t)})\\
&= \sum_{x_{\mathrm{pa}(t)}}p(x_{\mathrm{pa}(t)})\cdot p(x_{y} ~|~x_{t},x_{\mathrm{pa}(t)}),
\end{align*}$$
where the last step sums out $x_{W}$. This gives the so-called **adjustment formula**, which shows that it is actually sufficient to sum over $\mathrm{pa}(t)$.

This already makes things easier, but is it possible to find a small subset of variables $C \subset V$, ideally smaller than $\mathrm{pa}(t)$, so that it is a **valid adjustment set** satisfying
$$p(x_{y}~|~ \mathrm{do}(x_{t}))=\sum_{x_{C}}p(x_{y}~|~x_{C},x_{t})\cdot p(x_{C})?$$
- Note that $C$ has nothing to do with cliques here; $C$ stands for "control" in some sense.

### Causal Paths

To derive a criterion for a valid $C$, we need to know which associations are causal (i.e. the ones we wish to study) and which should be severed.

> [!definition|*] Causal Paths
> To go from vertex $i$ to $j$ on a DAG/causal graph $G$,
> - An $ij$ path is a path from $i$ to $j$ in the underlying undirected graph, i.e. edge directions are ignored.
> - An $ij$ path is **directed** or **causal** if all its edges point away from $i$ and towards $j$.
> - Otherwise the path is **non-causal**.
>
> All vertices on causal $TY$ paths, except $T$ itself, are called **causal nodes**, denoted $\mathrm{cn}(T \to Y)$.

To study the effect on some response $Y$ due to modifying some treatment $T$, we are interested in:
- Keeping all causal $TY$ paths: those are the good ones we want to study.
- While **blocking** or **closing** non-causal ones that can muddy the waters.
    - We do so by conditioning/controlling for certain variables.

> [!examples]
> As an example: a doctor only administers a strong medicine to severely ill patients, and the medicine has a side effect, so the causal graph is
> ```mermaid
> flowchart LR
>     X[Severity X] --> T[Treatment T]
>     X --> Y[Survival Y]
>     T --> Y
>     T --> W["Side Effect W"]
>     W --> Y
> ```
> To study the effect of $T$ on $Y$:
> - We want to keep the causal paths $T \to Y$ and $T \to W \to Y$.
> - We want to block the non-causal path $T \leftarrow X \to Y$, because patients with the treatment are more likely to be severely sick, reducing their survival chance, but we are not interested in that effect.

Therefore,

> [!idea] We want to condition on a subset $C$ that "blocks" non-causal paths while "preserving" causal ones.

This leaves two questions unanswered:
- How to block a path, and how to avoid blocking it;
- What exactly "blocking" means mathematically.

The first question depends on the graph's structure, so we study it in the following section. The second is answered in the section after that.

## Path Structures and Blocking

Since we assume $G$ is a DAG, the only possible structures on a node $X$ along the $TY$ path $\pi$ are:
- Edges with different directions, pointing away ($\leftarrow X \to$);
- Edges with different directions, pointing inwards ($\to X\leftarrow$);
- Edges with the same direction, pointing towards $Y$ ($\to X \to$), or towards $T$ ($\leftarrow X \leftarrow$) on a non-causal path.

We go over each structure, discuss whether it should be blocked or not, and how to block it.
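The classification of paths and of the nodes along them can be done mechanically from edge directions. The following is a minimal Python sketch, assuming the graph is stored as a set of directed `(parent, child)` pairs; the helper names `all_paths`, `is_causal` and `node_roles` are made up for illustration and do not come from any library.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)

# Doctor example with a side effect W (edges taken from the example graph).
EDGES: Set[Edge] = {("X", "T"), ("X", "Y"), ("T", "Y"), ("T", "W"), ("W", "Y")}


def neighbours(edges: Set[Edge], v: str) -> List[str]:
    """Adjacent nodes of v, ignoring edge direction."""
    return sorted({b for a, b in edges if a == v} | {a for a, b in edges if b == v})


def all_paths(edges: Set[Edge], start: str, end: str) -> List[List[str]]:
    """All simple paths from start to end in the undirected skeleton."""
    found: List[List[str]] = []

    def walk(path: List[str]) -> None:
        if path[-1] == end:
            found.append(path)
            return
        for n in neighbours(edges, path[-1]):
            if n not in path:
                walk(path + [n])

    walk([start])
    return found


def is_causal(edges: Set[Edge], path: List[str]) -> bool:
    """A path is causal if every edge points from its start towards its end."""
    return all((a, b) in edges for a, b in zip(path, path[1:]))


def node_roles(edges: Set[Edge], path: List[str]) -> Dict[str, str]:
    """Label each interior node as 'fork', 'collider' or 'chain'."""
    roles: Dict[str, str] = {}
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        arrows_in = ((prev, node) in edges) + ((nxt, node) in edges)
        roles[node] = {0: "fork", 1: "chain", 2: "collider"}[arrows_in]
    return roles


for p in all_paths(EDGES, "T", "Y"):
    kind = "causal" if is_causal(EDGES, p) else "non-causal"
    print(p, kind, node_roles(EDGES, p))
```

On this graph the sketch reports $T \to Y$ and $T \to W \to Y$ as causal (with $W$ a chain node) and $T \leftarrow X \to Y$ as non-causal (with $X$ a fork).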
### Confounders (← X →)

> [!definition|*] Confounders
> A vertex $X$ is a **confounder** in the $TY$ path if it looks like
> ```mermaid
> flowchart LR
>     T ~~~ J((...)) ~~~ X --- J --> T
>     X ~~~ J
>     J ~~~ T
>     X --- I((...)) --> Y
> ```
> In this case the path, sometimes called a **fork**, is non-causal, and can be blocked by conditioning on $X$.

Take the doctor example again, where we have (a simplified version)
```mermaid
flowchart LR
    X[Severity X] --> T[Treatment T]
    X --> Y[Survival Y]
    T --> Y
```
Here the non-causal path is $T \leftarrow X \to Y$. We can expect severity $X$ to have a negative direct influence on survival $Y$, but a positive indirect influence via treatment $T$, so $X$ is a confounder.

### Mediators (→ X →)

> [!definition|*] Mediators
> A vertex $X$ is a **mediator** in a causal $TY$ path if it looks like
> ```mermaid
> flowchart LR
>     T --> J((...)) --> X
>     X --- I((...)) --> Y
> ```
> We do not want to condition on $X$ since doing so blocks the causal path, which we are interested in.

For example, consider a medicine that does not target the infection but strengthens the immune system of a patient:
```mermaid
flowchart LR
    T[Treatment T]
    T --> X["Immune System X"]
    X --> Y[Survival Y]
```
Then $X$ is a **mediator**, and we should not control for it: here it is part of a causal path, and controlling for it would mask the (beneficial) effect of $T$.

On the other hand, mediators can also appear in non-causal paths, in which case they may need to be controlled for:
```mermaid
flowchart LR
    X[Wealth X] --> T[Treatment T]
    X --> H[Hospital Quality H] --> Y[Survival Y]
    T --> Y
```
Here $H$ is a mediator on the path $T \leftarrow X \to H \to Y$, so we can block this path by controlling for $H$ (or $X$).
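Returning to the simplified doctor graph $X \to T$, $X \to Y$, $T \to Y$, the gap between conditioning on $T$ and intervening on $T$ can be checked numerically. The sketch below uses made-up probabilities (they are assumptions for illustration, not data) for binary severity, treatment and survival, and compares the observational $p(Y=1~|~T=t)$ with the adjustment formula summed over $\mathrm{pa}(t)=\{X\}$.

```python
# Hypothetical numbers for the graph X -> T, X -> Y, T -> Y (all binary).
# X=1: severe, T=1: treated, Y=1: survived.
p_x = {0: 0.7, 1: 0.3}                      # p(X=x)
p_t1_given_x = {0: 0.1, 1: 0.9}             # p(T=1 | X=x): severe patients get treated
p_y1_given_tx = {                           # p(Y=1 | T=t, X=x), keyed by (t, x)
    (0, 0): 0.90, (1, 0): 0.95,
    (0, 1): 0.30, (1, 1): 0.60,
}


def p_t_given_x(t: int, x: int) -> float:
    """p(T=t | X=x) for binary T."""
    return p_t1_given_x[x] if t == 1 else 1.0 - p_t1_given_x[x]


def survival_given_t(t: int) -> float:
    """Observational p(Y=1 | T=t): X ends up weighted by p(x | t)."""
    num = sum(p_y1_given_tx[(t, x)] * p_t_given_x(t, x) * p_x[x] for x in p_x)
    den = sum(p_t_given_x(t, x) * p_x[x] for x in p_x)
    return num / den


def survival_do_t(t: int) -> float:
    """Interventional p(Y=1 | do(T=t)) via adjustment: X weighted by p(x)."""
    return sum(p_y1_given_tx[(t, x)] * p_x[x] for x in p_x)


for t in (0, 1):
    print(f"t={t}: conditioning {survival_given_t(t):.3f}, do {survival_do_t(t):.3f}")
```

With these assumed numbers, conditioning makes the treatment look harmful (about 0.67 survival for treated versus 0.87 for untreated), while the adjusted quantities show it helps (0.845 versus 0.72). The only difference between the two functions is whether $X$ is weighted by $p(x~|~t)$ or by $p(x)$.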
### Colliders (→ X ←)

Following the definition in DAGs, we have

![[Directed Acyclic Graph#^d17b82]]

For example,
```mermaid
graph LR
    W[Studying W] --> Y[Good Grades Y]
    X[Cheating X] --> Y ~~~ X
    X ~~~ Y
```
In this case, we may well have $W\perp X$, but then $W \not \perp X ~|~ Y$: if someone got good grades, knowing that they didn't study makes cheating much more likely. Therefore, *controlling for colliders opens up the (non-causal) path.*

In addition, *colliders can be controlled by controlling their descendants* as well: say in the example above we also have an "outcome" of getting good grades,
```mermaid
graph TD
    W[Studying W] --> Y[Good Grades Y]
    X[Cheating X] --> Y --> S[Scholarship S]
```
Then controlling for $S$ also (partially) controls for $Y$, which again opens the path $W \to Y \leftarrow X$.

## Conditioning and d-Separation

> [!definition|*] d-Open/Closed Paths, d-Separation
> A path $\pi$ is **open** (or **d-open**) conditioning on some $C \subset V$ if
> - All colliders are in $\mathrm{an}(C)$ (taken to include $C$ itself), so they are controlled.
> - All non-colliders are not in $C$, so they are not controlled.
>
> Otherwise the path is **closed** (or **d-closed**).
>
> Two sets $A,B \subset V$ are **d-separated** by $C$, denoted $A \perp_{d} B ~|~ C$, if all paths (directed or not) between $A,B$ are closed conditioning on $C$.

In the context of causal/non-causal paths,
- We want causal paths to remain d-open by not controlling any node on them,
- and make non-causal paths d-closed by either leaving a collider (and all its descendants) uncontrolled, or by controlling one of the non-colliders.

### Generalized Adjustment Criterion

To summarize the structures we need to deal with,
- The confounders are the baddies, because they create non-causal paths.
- The mediators are just part of whatever path they are in: they are good in causal paths and bad in non-causal ones.
- The colliders are the goodies, since they sabotage non-causal paths as long as they are left uncontrolled.

Therefore,
- We have a set of vertices that we cannot condition on, i.e. $T,Y$, mediators on causal paths, or their descendants.
- We also need to block non-causal paths by conditioning on other variables.

> [!definition|*] Forbidden Nodes, GAC
> The set of **forbidden nodes** consists of the causal nodes (including $Y$), their descendants, and $T$ itself:
> $$\mathrm{forb}(T \to Y):= \{ T \} \cup \mathrm{de}(\mathrm{cn}(T\to Y)).$$
>
> A "good" controlled subset $C \subset V$ of variables satisfies the **generalized adjustment criterion (GAC)** if:
> - It contains no forbidden nodes.
> - All non-causal $TY$ paths are blocked by $C$.

- In particular *the parents $\mathrm{pa}(t)$ satisfy the GAC*, since (1) acyclicity forces $T$ to have disjoint parents and descendants, and $\mathrm{forb}(T \to Y) \subset \mathrm{de}(t)$; and (2) every non-causal path either starts with $t \leftarrow v$ for some $v$, or starts with $t \to$ and contains a collider that is a descendant of $t$. Conditioning on $\mathrm{pa}(t)$ controls $v$ in the first case and, by acyclicity, leaves the collider and its descendants uncontrolled in the second.

As an example, consider the graph

![[CausalGraphExample.png#invert|center|w50]]

Here the forbidden nodes are $\mathrm{forb}(T \to Y)=\{T,X,Y,R,S\}$, where $R,S$ are the descendants of the causal nodes $X,Y$.

If $C$ satisfies the GAC, then so does $B:= C\cap \mathrm{nd}(t)$, so any minimal adjustment set must be a **back-door adjustment set**, i.e. an adjustment set that is a subset of $\mathrm{nd}(t)$.

> [!theorem|*] GAC is Sufficient for Valid Adjustment
> If $C$ satisfies the GAC, then it is a valid adjustment set, i.e.
> $$p(x_{y}~|~\mathrm{do}(x_{t}))=\sum_{x_{C}}p(x_{y}~|~x_{t},x_{C})\cdot p(x_{C}).$$

^79040c
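The d-open condition and the two GAC requirements can be checked by brute force on small graphs. The following Python sketch assumes the DAG is given as a set of `(parent, child)` pairs and enumerates every path explicitly, so it is only meant for toy examples; all function names are hypothetical, and $\mathrm{de}(\cdot)$ and $\mathrm{an}(\cdot)$ are taken to include the starting nodes themselves.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def descendants(edges: Set[Edge], nodes: Set[str]) -> Set[str]:
    """Nodes reachable along directed edges, including the start nodes."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for a, b in edges:
            if a == v and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def ancestors(edges: Set[Edge], nodes: Set[str]) -> Set[str]:
    """Same search on the reversed graph."""
    return descendants({(b, a) for a, b in edges}, nodes)


def all_paths(edges: Set[Edge], start: str, end: str) -> List[List[str]]:
    """All simple start-end paths, ignoring edge direction."""
    found: List[List[str]] = []

    def walk(path: List[str]) -> None:
        if path[-1] == end:
            found.append(path)
            return
        for a, b in sorted(edges):
            nxt = b if a == path[-1] else a if b == path[-1] else None
            if nxt is not None and nxt not in path:
                walk(path + [nxt])

    walk([start])
    return found


def is_causal(edges: Set[Edge], path: List[str]) -> bool:
    return all((a, b) in edges for a, b in zip(path, path[1:]))


def is_open(edges: Set[Edge], path: List[str], cond: Set[str]) -> bool:
    """d-open: colliders lie in an(C), all other interior nodes lie outside C."""
    an_c = ancestors(edges, cond)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider and node not in an_c:
            return False
        if not collider and node in cond:
            return False
    return True


def forbidden(edges: Set[Edge], t: str, y: str) -> Set[str]:
    """forb(T -> Y) = {T} together with the descendants of the causal nodes."""
    causal_nodes: Set[str] = set()
    for p in all_paths(edges, t, y):
        if is_causal(edges, p):
            causal_nodes |= set(p[1:])
    return {t} | descendants(edges, causal_nodes)


def satisfies_gac(edges: Set[Edge], t: str, y: str, cond: Set[str]) -> bool:
    """No forbidden node in C, and every non-causal T-Y path is closed given C."""
    if cond & forbidden(edges, t, y):
        return False
    return not any(
        is_open(edges, p, cond)
        for p in all_paths(edges, t, y)
        if not is_causal(edges, p)
    )


# Doctor example with a side effect W; pa(T) = {X}.
DOCTOR = {("X", "T"), ("X", "Y"), ("T", "Y"), ("T", "W"), ("W", "Y")}
print(forbidden(DOCTOR, "T", "Y"))
print(satisfies_gac(DOCTOR, "T", "Y", {"X"}))
print(satisfies_gac(DOCTOR, "T", "Y", set()))
```

On the doctor graph this reports $\{T, W, Y\}$ as forbidden, accepts $C=\mathrm{pa}(t)=\{X\}$, and rejects $C=\varnothing$ because the fork $T \leftarrow X \to Y$ stays open. Enumerating paths is exponential in general, so a larger graph would call for a reachability-based d-separation routine instead.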