> [!tldr] Causal Graphs
> A **causal graph** $G=(V,E)$ is a [[Directed Acyclic Graph|directed acyclic graph (DAG)]] representing some causality model, where each node is a random variable, and $\overrightarrow{XY} \in E$ if and only if $X$ is a direct cause of $Y$.
>
> It can be used to fit a [[Directed Graphical Models|directed graphical model]].
>
The graph can omit "error" terms, i.e. unobserved influences that can change a variable even when its parents $\mathrm{pa}$ are all held constant.
The rough goal is to study the causal effect of applying a treatment $T$ to a response $Y$.
## Interventions
What exactly do we mean by applying a treatment? Intervening on a treatment variable $T$ by setting $T=x_{t}$ in an experiment is not equivalent to mathematically conditioning on $\{ T = x_{t}\}$ in an observational dataset.
- Here the variable is $T$, its node on the graph is $t$, and its values are denoted $x_{t}$; similarly for $Y,y,x_{y}$.
Take the doctor example again, where we have (a simplified version)
```mermaid
flowchart LR
X[Severity X] --> T[Treatment T]
X --> Y[Survival Y]
T --> Y
```
When we intervene by applying a treatment, we break the causality $X \to T$ by artificially changing the value of $T$, regardless of $X$, giving the following graph
```mermaid
flowchart LR
X[Severity X] --> Y[Survival Y]
T[[Treatment T]] --> Y
```
where the extra fancy box on $T$ indicates that it has been intervened on.
> [!definition|*] Intervention
> When we **intervene** on a variable $T$ to set it to $T=x_{t}$, we denote this "event" as $\mathrm{do}(T=x_{t}):=:\mathrm{do}(x_{t})$, in contrast with $\{ T=x_{t} \}$.
>
> On the graphical model, the intervention is reflected by disconnecting $T$ from all its parents on the causal graph.
>
> Probabilistically, the intervention alters the joint distribution to be
> $p(x_{V - \{ t \}} ~|~ \mathrm{do}(x_{t}))=\frac{p(x_{V})}{p(x_{t} ~|~x_{\mathrm{pa}(t)})}=\prod_{v \in V - \{ t \}}p(x_{v} ~|~x_{\mathrm{pa}(v)}),$ where we simply removed the dependence of $T$ on its parents.
But finding the marginal of $Y$ naively is expensive: $p(x_{y}~|~\mathrm{do}(x_{t}))=\sum_{x_{V-\{ t,y \}}}p(x_{V-\{ t,y \}},x_{y}~|~\mathrm{do}(x_{t}))$ involves summing over all variables other than $T,Y$.
Substituting the definition above, with $W:=V-\{ y,t \}-\mathrm{pa}(t)$,
$\begin{align*} p(x_{y} ~|~\mathrm{do}(x_{t})) &= \sum_{x_{V-\{ t,y \}}} \frac{p(x_{y},x_{t},x_{W},x_{\mathrm{pa}(t)})}{p(x_{t} ~|~ x_{\mathrm{pa}(t)})}\\
&= \sum_{x_{V-\{ t,y \}}} p(x_{\mathrm{pa}(t)})\frac{(\dots)}{p(x_{t},x_{\mathrm{pa}(t)})}\\[0.4em]
&= \sum_{x_{V-\{ t,y \}}} p(x_{\mathrm{pa}(t)})\cdot p(x_{y},x_{W} ~|~ x_{t},x_{\mathrm{pa}(t)})\\
&= \sum_{x_{\mathrm{pa}(t)}}p(x_{\mathrm{pa}(t)})\cdot p(x_{y} ~|~x_{t},x_{\mathrm{pa}(t)}),\\
\end{align*}$
giving the so-called **adjustment formula**: it is actually sufficient to sum over $\mathrm{pa}(t)$ alone.
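As a sanity check, the adjustment formula can be compared numerically against the truncated factorization from the definition. The sketch below uses the simplified doctor graph ($X \to T$, $X \to Y$, $T \to Y$) with binary variables; the probability tables and the function names are made up for illustration and are not part of the original example.

```python
# Hypothetical probability tables for the simplified doctor graph
# X -> T, X -> Y, T -> Y. All variables are binary; the numbers are made up.
p_x = {0: 0.7, 1: 0.3}                        # p(x), with 1 = severe
p_t_given_x = {0: 0.1, 1: 0.8}                # p(T=1 | x)
p_y_given_tx = {(0, 0): 0.90, (1, 0): 0.95,   # p(Y=1 | t, x), keyed by (t, x)
                (0, 1): 0.30, (1, 1): 0.60}

def bern(p_one, value):
    """Probability that a binary variable equals value, given p(value=1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(x, t, y):
    """Observational joint p(x, t, y) from the DAG factorization."""
    return p_x[x] * bern(p_t_given_x[x], t) * bern(p_y_given_tx[(t, x)], y)

def p_y_do_t(t, y=1):
    """Truncated factorization: divide out p(t | pa(t)), then sum out x."""
    return sum(joint(x, t, y) / bern(p_t_given_x[x], t) for x in (0, 1))

def p_y_adjusted(t, y=1):
    """Adjustment formula: sum over x of p(x) * p(y | t, x)."""
    total = 0.0
    for x in (0, 1):
        p_xt = sum(joint(x, t, yy) for yy in (0, 1))   # p(x, t)
        total += p_x[x] * joint(x, t, y) / p_xt
    return total

def p_y_given_t(t, y=1):
    """Plain conditioning p(y | t) on the observational distribution."""
    num = sum(joint(x, t, y) for x in (0, 1))
    den = sum(joint(x, t, yy) for x in (0, 1) for yy in (0, 1))
    return num / den
```

With these made-up numbers, both interventional routes give $p(Y=1 ~|~ \mathrm{do}(T=1)) = 0.845$ against $0.72$ for $\mathrm{do}(T=0)$, so the treatment helps. Plain conditioning gives roughly $0.68$ against $0.85$ and would suggest the opposite, because treated patients are mostly the severe ones.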
This already makes things easier, but is it possible to find a small subset of variables $C \subset V$, ideally smaller than $\mathrm{pa}(t)$, so that it is a **valid adjustment set** satisfying
$p(x_{y}~|~ \mathrm{do}(x_{t}))=\sum_{x_{C}}p(x_{y}~|~x_{C},x_{t})\cdot p(x_{C})?$
- Note that $C$ has nothing to do with cliques here; it stands for "control" in some sense.
### Causal Paths
To derive a criterion for a valid $C$, we need to know which associations are causal (i.e. the ones we wish to study) and which should be severed.
> [!definition|*] Causal Paths
> To go from vertex $i$ to $j$ on a DAG/causal graph $G$,
> - An $ij$ path is any path between $i$ and $j$ in the underlying undirected graph, i.e. edge directions are ignored.
> - An $ij$ path is **directed** or **causal** if all its edges point away from $i$ and towards $j$.
> - Otherwise the path is **non-causal**.
>
> All vertices on causal paths from $T$ to $Y$, except $T$ itself, are called **causal nodes**, denoted $\mathrm{cn}(T \to Y)$.
To study the effect on some response $Y$ due to modifying some treatment $T$, we are interested in:
- Keeping all causal $TY$ paths -- those are the good ones we want to study.
- While **blocking** or **closing** non-causal ones that can muddy the waters.
- We do so by conditioning/controlling for certain variables.
> [!examples]
> As an example: a doctor only administers a strong medicine to severe patients, so the causal graph is
> ```mermaid
> flowchart LR
> X[Severity X] --> T[Treatment T]
> X --> Y[Survival Y]
> T --> Y
> T --> W["Side Effect W"]
> W --> Y
> ```
> To study the effect of $T$ on $Y$:
> - We want to keep the causal paths $T \to Y$ and $T \to W \to Y$.
> - We want to block the non-causal path $T \leftarrow X \to Y$, because patients with the treatment are more likely to be severely sick, reducing their survival chance, but we are not interested in that effect.
Therefore,
> [!idea] We want to condition on a subset $C$ that "blocks" non-causal paths while "preserving" causal ones.
This leaves two questions unanswered:
- How to block a path, and how to not block it;
- What exactly "blocking" means mathematically.
The first question depends on the graph's structure, so we study those in the following section. The second is defined in the section after that.
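The split into causal and non-causal paths in the doctor example with the side effect $W$ can be reproduced mechanically. The following is a minimal sketch, assuming the graph is given as a list of `(parent, child)` pairs; the helper names `simple_paths`, `is_causal` and `classify` are illustrative choices.

```python
# Edges of the doctor example with the side effect W, as (parent, child).
edges = [("X", "T"), ("X", "Y"), ("T", "Y"), ("T", "W"), ("W", "Y")]

def simple_paths(edges, start, end):
    """All simple paths from start to end, ignoring edge directions."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        for n in sorted(nbrs.get(path[-1], ())):
            if n not in path:          # keep the path simple (no repeats)
                stack.append(path + [n])
    return paths

def is_causal(edges, path):
    """A path is causal if every edge points from the start towards the end."""
    edge_set = set(edges)
    return all((a, b) in edge_set for a, b in zip(path, path[1:]))

def classify(edges, t, y):
    """Split all t-y paths into (causal, non-causal), each sorted."""
    paths = simple_paths(edges, t, y)
    causal = sorted(p for p in paths if is_causal(edges, p))
    non_causal = sorted(p for p in paths if not is_causal(edges, p))
    return causal, non_causal

causal, non_causal = classify(edges, "T", "Y")
# Causal nodes: every vertex on a causal path except T itself.
causal_nodes = {v for p in causal for v in p} - {"T"}
```

Here `causal` contains the paths $T \to Y$ and $T \to W \to Y$, `non_causal` contains only $T \leftarrow X \to Y$, and `causal_nodes` is $\{W, Y\}$. Enumerating all simple paths is exponential in general, so this is only meant for small graphs.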
## Path Structures and Blocking
Since we assume $G$ is a DAG, the only possible structures on a node $X$ along the $TY$ path $\pi$ are:
- Edges with different directions, pointing away ($\leftarrow X \to$);
- Edges with different directions, pointing inwards ($\to X\leftarrow$);
- Edges with the same direction, pointing towards $Y$ ($\to X \to$) or, on a non-causal path, towards $T$ ($\leftarrow X \leftarrow$).
We go over each structure, discuss whether it should be blocked or not, and how to block it.
### Confounders (← X →)
> [!definition|*] Confounders
> A vertex $X$ is a **confounder** in the $TY$ path if it looks like
> ```mermaid
> flowchart LR
> T ~~~ J((...)) ~~~ X --- J --> T
> X ~~~ J
> J ~~~ T
> X --- I((...)) --> Y
> ```
> In this case the path, sometimes called a **fork**, is non-causal, and can be blocked by conditioning on $X$.
Take the doctor example again, where we have (a simplified version)
```mermaid
flowchart LR
X[Severity X] --> T[Treatment T]
X --> Y[Survival Y]
T --> Y
```
Here the non-causal path is $T \leftarrow X \to Y$. We can expect severity $X$ to have a negative direct influence on survival $Y$, but a positive indirect influence via treatment $T$, and $X$ is a confounder.
### Mediators (→ X →)
> [!definition|*] Mediators
> A vertex $X$ is a **mediator** in a causal $TY$ path if it looks like
> ```mermaid
> flowchart LR
> T --> J((...)) --> X
> X --- I((...)) --> Y
> ```
> We do not want to condition on $X$, since doing so blocks the causal path, which we are interested in.
For example, consider a medicine that does not target the infection but strengthens the immune system of a patient:
```mermaid
flowchart LR
T[Treatment T]
T --> X["Immune System X"]
X --> Y[Survival Y]
```
Then $X$ is a **mediator**, and we should not control for it, because here it is part of a causal path, and doing so would hide the (beneficial) effect of $T$.
On the other hand, mediators can also appear in non-causal paths, in which case they may need to be controlled for:
```mermaid
flowchart LR
X[Wealth X] --> T[Treatment T]
X --> H[Hospital Quality H] --> Y[Survival Y]
T --> Y
```
Here $H$ is a mediator on the path $T \leftarrow X \to H \to Y$, so we can block this path by controlling for it (or for $X$).
### Colliders (→ X ←)
Following the definition in DAGs, we have ![[Directed Acyclic Graph#^d17b82]]
For example,
```mermaid
graph LR
W[Studying W] --> Y[Good Grades Y]
X[Cheating X] --> Y ~~~ X
X ~~~ Y
```
Here we may well have $W\perp X$, and yet $W \not \perp X ~|~ Y$: if someone got good grades, knowing that they didn't study makes cheating much more likely.
Therefore, *controlling for colliders opens up the (non-causal) path.*
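The effect can be checked exactly on a small binary version of the grades example. This is a sketch with made-up probabilities (`p_w`, `p_x`, `p_y` are assumptions, not data), in which studying and cheating are independent by construction:

```python
from itertools import product

# Hypothetical numbers: studying W and cheating X are independent causes
# of good grades Y. All variables are binary.
p_w = 0.5                                    # p(W=1)
p_x = 0.1                                    # p(X=1)
p_y = {(0, 0): 0.05, (1, 0): 0.70,           # p(Y=1 | w, x), keyed by (w, x)
       (0, 1): 0.80, (1, 1): 0.95}

def joint(w, x, y):
    """Joint p(w, x, y) from the collider factorization p(w) p(x) p(y | w, x)."""
    pw = p_w if w else 1 - p_w
    px = p_x if x else 1 - p_x
    py = p_y[(w, x)] if y else 1 - p_y[(w, x)]
    return pw * px * py

def prob_cheat(**given):
    """p(X=1 | given), where given may fix w and/or y, e.g. prob_cheat(y=1)."""
    num = den = 0.0
    for w, x, y in product((0, 1), repeat=3):
        vals = {"w": w, "y": y}
        if any(vals[k] != v for k, v in given.items()):
            continue                         # inconsistent with the evidence
        den += joint(w, x, y)
        if x == 1:
            num += joint(w, x, y)
    return num / den
```

Without conditioning on $Y$, `prob_cheat(w=0)` and `prob_cheat(w=1)` are both $0.1$. Given good grades, `prob_cheat(y=1, w=0)` rises to $0.64$ while `prob_cheat(y=1, w=1)` is only about $0.13$, so conditioning on the collider makes $W$ informative about $X$.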
In addition, *colliders can be controlled by controlling their descendants* as well: say in the example above we also have an "outcome" of getting good grades,
```mermaid
graph TD
W[Studying W] --> Y[Good Grades Y]
X[Cheating X] --> Y --> S[Scholarship S]
```
Then controlling for $S$ also controls for $Y$.
## Conditioning and d-Separation
> [!definition|*] d-Open/Closed Paths, d-Separation
> A path $\pi$ is **open** (or **d-open**) conditioning on some $C \subset V$ if
> - All colliders are in $\mathrm{an}(C)$, so they are controlled.
> - All non-colliders are not in $C$, so they are not controlled.
>
> Two sets $A,B \subset V$ are **d-separated** by $C$, denoted $A \perp_{d} B ~|~ C$, if all paths (directed or not) between $A,B$ are closed conditioning on $C$.
In the context of causal/non-causal paths,
- We want causal paths to remain d-open by not controlling for any node on them,
- and make non-causal paths d-closed, either by leaving a collider uncontrolled or by controlling for one of the non-colliders.
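The two conditions in the definition translate directly into a check on a single path. Below is a minimal sketch, assuming the graph is a list of `(parent, child)` pairs and that $\mathrm{an}(C)$ includes $C$ itself; `ancestors_of` and `is_open` are illustrative names. Full d-separation would additionally require running this over every path between the two sets.

```python
def ancestors_of(edges, nodes):
    """an(C): the nodes in C together with all of their ancestors."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def is_open(edges, path, C):
    """True if the path (a list of nodes) is d-open conditioning on the set C."""
    edge_set, an_C = set(edges), ancestors_of(edges, C)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edge_set and (nxt, node) in edge_set
        if collider and node not in an_C:
            return False    # an uncontrolled collider closes the path
        if not collider and node in C:
            return False    # a controlled non-collider closes the path
    return True

# Grades example with the scholarship: W -> Y <- X, Y -> S.
grades = [("W", "Y"), ("X", "Y"), ("Y", "S")]
# Simplified doctor example: X -> T, X -> Y, T -> Y.
doctor = [("X", "T"), ("X", "Y"), ("T", "Y")]
```

On the grades graph, `is_open(grades, ["W", "Y", "X"], set())` is `False`, while conditioning on `{"Y"}` or on the descendant `{"S"}` opens the path. On the doctor graph, the fork `["T", "X", "Y"]` is open for the empty set and closed once `{"X"}` is controlled.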
### Generalized Adjustment Criterion
To summarize the structures we need to deal with,
- The confounders are the baddies, because they create non-causal paths.
- The mediators are just part of whatever path they are in: they are good in causal paths and bad in non-causal ones.
- The colliders are the goodies since they sabotage non-causal paths.
Therefore,
- We have a set of vertices that we cannot condition on, i.e. $T,Y$, mediators, or their descendants.
- And also need to block non-causal paths by conditioning on other variables.
> [!definition|*] Forbidden Nodes, GAC
> The set of **forbidden nodes** consists of the causal nodes (including $Y$), their descendants, and $T$ itself: $\mathrm{forb}(T \to Y):= \{ T \} \cup \mathrm{de}(\mathrm{cn}(T\to Y)).$
>
> A "good" controlled subset $C \subset V$ of variables satisfies the **generalized adjustment criterion (GAC)** if:
> - It contains no forbidden nodes.
> - All non-causal paths between $T$ and $Y$ are blocked by $C$.
- In particular, *the parents $\mathrm{pa}(t)$ satisfy the GAC*, since (1) acyclicity forces $T$ to have disjoint parents and descendants, and $\mathrm{forb}(T \to Y) \subset \mathrm{de}(t)$; and (2) every non-causal path either contains a collider or starts with $t \leftarrow v$ for some $v$, so conditioning on $\mathrm{pa}(t)$ leaves the former alone while controlling the latter.
As an example, consider the graph
![[CausalGraphExample.png#invert|center|w50]]
Here the forbidden nodes are $\mathrm{forb}(T \to Y)=\{T,X,Y,R,S\}$, where $R,S$ are the descendants of the causal nodes $X,Y$.
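The forbidden set can also be computed mechanically. The sketch below does not reproduce the graph in the image; it uses the doctor graph with the side effect $W$, plus a hypothetical extra node $R$ (a child of $W$) added only to show a descendant that is forbidden without lying on a causal path. It relies on the observation that, in a DAG, the causal nodes are exactly the descendants of $T$ that are also ancestors of $Y$ (counting $Y$ itself, excluding $T$).

```python
def reachable(edges, start, forward=True):
    """Nodes reachable from start along edge directions (forward=True)
    or against them (forward=False), including start itself."""
    step = {}
    for a, b in edges:
        src, dst = (a, b) if forward else (b, a)
        step.setdefault(src, set()).add(dst)
    seen, stack = {start}, [start]
    while stack:
        for n in step.get(stack.pop(), ()):
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen

def forbidden_nodes(edges, t, y):
    """Return (cn(t -> y), forb(t -> y)) for a DAG given as (parent, child) pairs."""
    # Causal nodes: descendants of t that are also ancestors of y, minus t.
    causal_nodes = (reachable(edges, t) & reachable(edges, y, forward=False)) - {t}
    forb = {t}
    for v in causal_nodes:
        forb |= reachable(edges, v)      # v together with its descendants
    return causal_nodes, forb

# Doctor graph with side effect W; the edge W -> R is a hypothetical addition.
edges = [("X", "T"), ("X", "Y"), ("T", "Y"), ("T", "W"), ("W", "Y"), ("W", "R")]
cn, forb = forbidden_nodes(edges, "T", "Y")
parents_of_t = {a for a, b in edges if b == "T"}
```

For this graph `cn` is $\{W, Y\}$ and `forb` is $\{T, W, Y, R\}$, while `parents_of_t` is $\{X\}$ and shares no node with the forbidden set, in line with the remark that $\mathrm{pa}(t)$ never contains forbidden nodes.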
If $C$ satisfies the GAC, then so does $B:= C\cap \mathrm{nd}(t)$, so any minimal adjustment set must be a **back-door adjustment set**, i.e. an adjustment set that is a subset of $\mathrm{nd}(t)$.
> [!theorem|*] GAC is Sufficient for Valid Adjustment
> If $C$ satisfies the GAC, then it is a valid adjustment set, i.e. $p(x_{y}~|~\mathrm{do}(x_{t}))=\sum_{x_{C}}p(x_{y}~|~x_{t},x_{C})\cdot p(x_{C}).$
^79040c