We can state conditional independence properties of a random vector $X \sim p$ using a [[Graph Theory|graph]]. Let $A\perp B ~|~ S$ denote graphical separation of vertex sets $A,B$ by $S$ in a graph, and $X \perp Y ~|~ Z$ denote the usual conditional independence of random variables $X,Y$ given $Z$.
> [!definition|*] Pairwise, Local, and Global Markov Properties
> A distribution $X\sim p$ is **pairwise Markov** for a graph $G=(V,E)$ if
> $\forall i,j \in V, i \not \sim j \Longrightarrow X_{i} \perp X_{j} ~|~ X_{V-\{ i,j \}}.$
> That is, a missing edge $ij$ indicates that $X_{i} \perp X_{j}$, conditioning on all else.
>
> $p$ is **local Markov** if
> $\forall i \in V:~ X_{i} \perp X_{V - \mathrm{cl}(i)} ~|~ X_{\mathrm{bd}(i)},$
> where $\mathrm{bd}(i)$ is the "boundary", i.e. the set of neighbours, of $i$ in $\mathcal{G}$, and $\mathrm{cl}(i):= \{ i \}\cup \mathrm{bd}(i)$ is its closure. That is, $X_{i}$ is independent of all non-neighbours, conditioning on its neighbours.
>
> Lastly, $p$ is **global Markov** if
> $\forall A,B, S \subset V, A \perp B ~|~ S ~\Longrightarrow~X_{A} \perp X_{B} ~|~ X_{S}$
> That is, graphical separation by $S$ implies conditional independence given $X_{S}$.
Clearly global Markov implies local Markov, which in turn implies pairwise Markov.
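The premise $A \perp B ~|~ S$ of the global Markov property is a purely graphical condition: delete $S$ and search for an $A$–$B$ path. A minimal pure-Python sketch (the adjacency-dict representation and the name `separated` are illustrative assumptions, not notation from these notes):

```python
from collections import deque

def separated(adj, A, B, S):
    """Return True iff S separates A from B in the undirected graph adj
    (dict: vertex -> set of neighbours): BFS from A in the graph with S
    deleted must not reach B."""
    blocked = set(S)
    seen = set(A) - blocked
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in B:
            return False          # found an A-B path avoiding S
        for w in adj[v]:
            if w not in blocked and w not in seen:
                seen.add(w)
                queue.append(w)
    return True
```

For the path graph $1 - 2 - 3$, `separated(adj, {1}, {3}, {2})` holds while `separated(adj, {1}, {3}, set())` does not, matching $X_{1}\perp X_{3} ~|~ X_{2}$ for a Markov chain.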
> [!theorem|*] Equivalence of Markov Properties (for strictly positive distributions)
> If $p$ is strictly positive, then the pairwise Markov property implies the global Markov property, and so the Markov properties are all equivalent.
>
> > [!proof]-
> > Consider sets $A,B,S$ such that $A \perp B ~|~ S$; let $\tilde{A}$ be the union of the connected components of $\mathcal{G}_{V-S}$ that intersect $A$, and $\tilde{B}:= V-\tilde{A}-S$.
> >
> > Now induct on the number of vertices in $\tilde{B}$: removing some $b\in \tilde{B}$, the inductive hypothesis gives the global Markov relationship $\tilde{A} \perp \tilde{B}-\{ b \} ~|~ S \Rightarrow X_{\tilde{A}} \perp X_{\tilde{B}-\{ b \}} ~|~ X_{S}$.
> >
> > For each $a,a' \in \tilde{A}$, pairwise Markov gives (noting $a \not\sim b$ and $a' \not\sim b$, since they lie in different connected components of $\mathcal{G}_{V-S}$)
> >
> > $\begin{align*}
> > X_{a}\perp X_{b} ~|~ X_{V-\{ a,a',b \}}, X_{a'}\\
> > X_{a'}\perp X_{b} ~|~ X_{V-\{ a,a',b \}}, X_{a}
> > \end{align*}$
> >
> > Since $p$ is strictly positive, we can use the intersection graphoid axiom to get $X_{b} \perp X_{a, a'} ~|~ X_{V-\{ a,a',b \}}.$
> >
> > Proceeding inductively over $\tilde{A}$, we can "move" all $a''\in \tilde{A}$ from the conditioning side to the left, giving us $X_{b}\perp X_{\tilde{A}} ~|~ X_{V-\tilde{A}-\{ b \}} ~~~\Big(\!\!\!\iff X_{b }\perp X_{\tilde{A}} ~|~ X_{S}, X_{\tilde{B}-\{ b \}}\Big)$
> > (This is why we used $\tilde{A},\tilde{B}$ instead of $A,B$: otherwise we could not split off $X_{S}$ in the parentheses above.)
> >
> > Now the contraction graphoid axiom, applied to this together with the inductive assumption $X_{\tilde{B}-\{ b \}}\perp X_{\tilde{A}} ~|~ X_{S}$, gives $X_{\tilde{B}} \perp X_{\tilde{A}} ~|~ X_{S}.$
> > Lastly, using the decomposition axiom, we can subset the two variables to get $X_{B}\perp X_{A} ~|~ X_{S}.$
>
## Decomposition of Graphs
> [!definition|*] Triangulated/Chordal Graphs
> A graph $G = (V,E)$ is **triangulated** or **chordal** if every cycle $C$ of length $\ge 4$ contains at least one chord: an edge $ij \in E$ between two vertices of $C$ that is not itself an edge of $C$.
^c929b1
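Chordality can be tested mechanically by maximum cardinality search (the Tarjan–Yannakakis test): $G$ is chordal iff the reverse of an MCS ordering is a perfect elimination ordering, i.e. each vertex's earlier-ordered neighbours form a complete set. A naive sketch, assuming the same adjacency-dict representation as above (function names are illustrative):

```python
def is_chordal(adj):
    """Tarjan-Yannakakis chordality test via maximum cardinality search."""
    vertices = list(adj)
    weight = {v: 0 for v in vertices}   # number of already-ordered neighbours
    order, numbered = [], set()
    while len(order) < len(vertices):
        # MCS: greedily pick the vertex with the most ordered neighbours
        v = max((u for u in vertices if u not in numbered), key=weight.get)
        order.append(v)
        numbered.add(v)
        for w in adj[v]:
            if w not in numbered:
                weight[w] += 1
    pos = {v: i for i, v in enumerate(order)}
    # reverse(order) is a perfect elimination ordering iff every vertex's
    # earlier-ordered neighbours are pairwise adjacent
    for v in order:
        earlier = [w for w in adj[v] if pos[w] < pos[v]]
        for i in range(len(earlier)):
            for j in range(i + 1, len(earlier)):
                if earlier[j] not in adj[earlier[i]]:
                    return False
    return True
```

A $4$-cycle fails the test; adding a chord makes it pass.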
> [!theorem|*] Equivalences with Decomposability
> For a graph $G$, all of the following are equivalent:
> - (i) $G$ is decomposable.
> - (ii) $G$ is triangulated/chordal.
> - (iii) For any $a,b \in V$, every minimal $ab$-separator $S$ is complete.
> - (iv) For any clique $C_{1}\in\mathcal{C}(G)$, we can construct a running intersection sequence of cliques $(C_{1},\dots,C_{K})$ that starts with the chosen $C_{1}$.
> [!proof]- Proof that $(i) \Rightarrow (ii)$
> If $G$ is complete then it is certainly chordal. Otherwise, induct on its size: it has a decomposition $(A,S,B)$ with $G_{A \cup S},G_{B \cup S}$ decomposable (hence chordal by induction) and $S$ complete. Any $k$-cycle in $G$ ($k\ge 4$) either lies entirely within $A\cup S$ or $B \cup S$ (and hence has a chord by induction), or contains vertices from both $A$ and $B$. In the latter case, since $S$ separates $A,B$, the cycle must pass through $S$ at two distinct vertices that are non-consecutive on the cycle (vertices of $A$ and of $B$ lie between them); as $S$ is complete, these two vertices are adjacent, providing a chord.
> [!proof]- Proof that $(ii)\Rightarrow (iii)$
> Assume for contradiction that some $a,b$ have an incomplete minimal separator $S$, with $s_{1},s_{2}\in S$, $s_{1} \not \sim s_{2}$. Then by minimality, there are $ab$-paths $\pi_{1},\pi_{2}$ with $\pi_{i} \cap S=\{ s_{i} \}$ for $i=1,2$ (otherwise, removing the corresponding $s_{i}$ would still separate $a,b$, contradicting the minimality of $S$).
>
> Now consider the connected components $A,B$ of $a,b$ respectively in $G_{V-S}$. Each $s_{i}$ has a neighbour in $A$ and a neighbour in $B$ (on $\pi_{i}$). Now let $\pi_{l}$ be a shortest path joining $s_{1}$ to $s_{2}$ with all intermediate vertices in $A$, and $\pi_{r}$ one with all intermediate vertices in $B$; write $l_{1},l_{2}$ (resp. $r_{1},r_{2}$) for the neighbours of $s_{1},s_{2}$ on $\pi_{l}$ (resp. $\pi_{r}$).
> ```mermaid
> flowchart LR
> subgraph A
> l1 ---|πl| l2
> end
>
> subgraph B
> r1 ---|πr| r2
> end
> s1 --- l1
> s1 --- r1
> l2 --- s2
> r2 --- s2
> ```
>
> We claim that $s_{1}\sim l_{1}\xrightarrow{\pi_{l}}l_{2}\sim s_{2}\sim r_{2}\xrightarrow{\pi_{r}}r_{1}\sim s_{1}$
>
> is a cycle of length at least $4$ with no chords. It is a cycle since it contains no repeated vertices, and it has at least $4$ vertices since $s_{1},s_{2},l_{1},r_{1}$ are certainly distinct, even if $l_{1}=l_{2}$ and $r_{1}=r_{2}$. It contains no chords: (1) intermediate vertices of $\pi_{l}$ and $\pi_{r}$ are not neighbours, as they lie in the different connected components $A$ and $B$; (2) there are no chords between $s_{i}$ and those intermediate vertices, nor within $\pi_{l}$ or $\pi_{r}$ themselves, by the minimality of $\pi_{l},\pi_{r}$; and (3) $s_{1} \not \sim s_{2}$.
>
> This contradicts chordality.
> [!proof]- Proof that $(iii)\Rightarrow (iv)$
> If $G$ is complete, we are done; otherwise, choose $a\not \sim b$ with a minimal (hence complete) $ab$-separator $S$, and let $A$ be the connected component of $a$ in $G_{V-S}$ and $B := V-A-S$. Since $C_{1}$ is a clique, it is a subset of either $A \cup S$ or $B \cup S$, as there are no edges between $A$ and $B$. We prove the result for $C_{1}\subset A\cup S$; the proof is identical for $C_{1} \subset B \cup S$.
>
> Inducting on the number of cliques, we have a RIP $(C_{1},\dots,C_{m}), (S_{1},\dots,S_{m})$ of $G_{A \cup S}$ that starts with $C_{1}$, and some RIP $(C_{m+1},\dots,C_{m+n}), (S_{m+1},\dots,S_{m+n})$ of $G_{B \cup S}$ that starts with a clique containing $S$, say $\tilde{S}=: C_{m+1}$ (possible since $S$ is complete, hence contained in some clique of $G_{B \cup S}$, and (iv) holds inductively).
>
> We claim that concatenating the two sequences gives an RIP of $G$, i.e. that $(C_{1},\dots,C_{m},\dots,C_{m+n}), (S_{1},\dots,S_{m},\dots,S_{m+n})$
> is an RIP. Again, any clique of $G$ is contained in either $A\cup S$ or $B \cup S$, so this sequence contains all cliques. The RIP is inherited for any $j \le m$, and for $j > m+1$, $\begin{align*}
> C_{j} \cap \Big( \bigcup_{i<j} C_{i} \Big) &= \Big(C_{j} \cap \bigcup_{i=1}^{m} C_{i} \Big)\cup \Big(C_{j} \cap \bigcup _{i=m+1}^{j-1}C_{i}\Big) \\
> &\subseteq( C_{j} \cap S) \cup \Big(C_{j} \cap \bigcup _{i=m+1}^{j-1}C_{i}\Big) &&[C_{j}\subseteq B\cup S, ~ \textstyle\bigcup_{i=1}^{m} C_{i} \subseteq A \cup S]\\
> &\subseteq (C_{j} \cap C_{m+1}) \cup \Big(C_{j} \cap \bigcup _{i=m+1}^{j-1}C_{i}\Big) &&[S \subseteq \tilde{S}= C_{m+1}]\\
> &= C_{j} \cap \bigcup_{i=m+1}^{j-1}C_{i} ~\subseteq~ C_{j} \cap C_{\sigma(j)} &&[\text{RIP in }G_{B\cup S}].
> \end{align*}$
> For $j = m+1$, the intersection with the history is $\subseteq (B \cup S)\cap(A\cup S) = S$, which is contained in some clique $C_{i}$ with $i \le m$, since $S$ is complete in $G_{A \cup S}$.
>
> [!proof]- Proof that $(iv)\Rightarrow (i)$
> Given cliques $(C_{1},\dots)$ and separators $(S_{1},\dots)$ that satisfy running intersection, we can decompose $G$ recursively by taking
> $A_{k}=C_{k}-S_{k},~ S_{k}=S_{k},~ B_{k}=\Big(\bigcup_{k' < k}C_{k'}\Big)-S_{k},~~k=K, \dots , 1.$
^582a3a
## Factorisation
Let $\mathcal{C}(G)$ be the set of **cliques** in $G$, where a clique is a maximal complete subgraph.
> [!definition|*] Factorisation
> $p$ **factorises** with respect to $G$ if we can find functions $\psi_{C}$ for each clique $C \in \mathcal{C}(G)$ such that $p(x) = \prod_{C \in \mathcal{C}(G)}\psi_{C}(x_{C}).$
^c6aa89
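As a numerical sanity check: building $p$ from clique potentials on the chain $1 - 2 - 3$ (cliques $\{1,2\},\{2,3\}$; the potential values below are arbitrary hypothetical choices) yields $X_{1} \perp X_{3} ~|~ X_{2}$, as the global Markov property demands:

```python
import itertools

def check_chain_ci():
    """Max violation of p(x1,x3|x2) = p(x1|x2) p(x3|x2) for a factorised p."""
    # hypothetical positive potentials on cliques {1,2} and {2,3}
    psi12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0}
    psi23 = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}
    states = list(itertools.product([0, 1], repeat=3))
    Z = sum(psi12[a, b] * psi23[b, c] for a, b, c in states)
    p = {x: psi12[x[0], x[1]] * psi23[x[1], x[2]] / Z for x in states}
    err = 0.0
    for a, b, c in states:
        p2 = sum(p[x] for x in states if x[1] == b)
        p12 = sum(p[x] for x in states if x[:2] == (a, b))
        p23 = sum(p[x] for x in states if x[1:] == (b, c))
        err = max(err, abs(p[a, b, c] / p2 - (p12 / p2) * (p23 / p2)))
    return err
```

The returned violation is at floating-point precision.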
> [!theorem|*] Factorisation implies Global Markov
> Factorisation implies the global Markov property: if $p$ factorises wrt. $G$, then it is globally Markov wrt. $G$.
>
> > [!proof]-
> > Suppose $A \perp B ~|~ S$ in $G$. Let $\tilde{A}$ be the union of the connected components of $G-S$ that meet $A$, and define $\tilde{B}:= V-\tilde{A}-S$. Then any clique $C$ is a subset of (at least) one of $\tilde{A}\cup S$ and $\tilde{B}\cup S$. So the product $\prod_{C\in\mathcal{C}(G)}\psi_{C}$ can be split into cliques $C \subseteq \tilde{A}\cup S$ and cliques $C\subseteq \tilde{B}\cup S$ (assigning any clique contained in both, i.e. $C \subseteq S$, to just one side), and $p(x)$ can be written as a product of a function of $x_{\tilde{A}\cup S}$ and a function of $x_{\tilde{B}\cup S}$, which implies $X_{\tilde{A}} \perp X_{\tilde{B}} ~|~ X_{S}$, hence $X_{A} \perp X_{B} ~|~ X_{S}$.
If $\forall x \in \mathcal{X},~p(x) > 0$ (i.e. if $p$ has support over the entirety of $\mathcal{X}$), then pairwise Markov implies factorisation, so in fact all of the above concepts are equivalent, and we can simply say that $p$ is **Markov** wrt. $G$.
Using [[Hammersley-Clifford Theorem]], with an arbitrary $z\in\mathcal{X}$, we write $p(x)=\prod_{j=1}^{d} \frac{p(x_{j}~|~x_{:j - 1}, z_{j+1:})}{p(z_{j} ~|~ x_{: j-1},z_{j+1:})}.$
By pairwise Markov, the $j$th term does not depend on $x_{j'}$ unless $j \sim j'$ in $G$. Therefore, denoting the neighbours of $j$ indexed before it by $N_{<}(j):= \{ j'\sim j : j'<j \}$, and defining $N_{>}(j)$ similarly, pairwise Markov reduces the $j$th term to $\frac{p(x_{j}~|~x_{:j - 1}, z_{j+1:})}{p(z_{j} ~|~ x_{: j-1},z_{j+1:})}= \frac{p(x_{j} ~|~ x_{N_{<}(j)}, z_{N_{>}(j)})}{p(z_{j}~|~ x_{N_{<}(j)}, z_{N_{>}(j)})}=: \phi_{j}(x_{j}, x_{N_{<}(j)}),$treating $z$ as a constant. Therefore, the product becomes $p=\prod_{j}\phi_{j}(x_{j}, x_{N_{<}(j)}).$
Now assigning each term $\phi_{j}$ to a clique $C$ containing $\{ j \}\cup N_{<}(j)$, $p= \prod_{C\in \mathcal{C}} \underbrace{\left( \prod_{j\text{ assigned to }C} \phi_{j}(x_{j}, x_{N_{<}(j)}) \right)}_{=: \psi_{C}},$
we conclude that $p$ factorises wrt. $G$. Here $\psi_{C}$ is a function of $x_{C}$ because each assigned factor depends only on $(x_{j},x_{N_{<}(j)})$, and $\{ j \}\cup N_{<}(j) \subseteq C$.
$p$ factorises wrt. a graph $G$ with decomposition $(A,S,B)$ if and only if $p(x_{A\cup S})$ and $p(x_{B\cup S})$ factorise wrt. $G_{A \cup S}$ and $G_{B\cup S}$ respectively, and $p(x_{V})\cdot p(x_{S})=p(x_{A \cup S})\cdot p(x_{B \cup S}),$i.e. $X_{A- S} \perp X_{B- S} ~|~X_{S}$.
^464932
> [!proof]-
> Any clique $C\in \mathcal{C}$ is a subset of either $A\cup S$ or $B \cup S$ (and of both only if $C \subseteq S$, i.e. $S$ itself is a clique), since $A\perp_{s}B ~|~ S$ leaves no edges between $A$ and $B$. So the factorisation can be partitioned: $p(x_{V}) =\prod_{C\in \mathcal{C}}\psi_{C}= \prod_{C\subseteq A\cup S}\psi_{C}\prod_{C\subseteq B \cup S}\psi_{C};$
> when $S$ is a clique, just assign it to one of the products. This proves conditional independence, as the first term is a function of $x_{A \cup S}$ and the second that of $x_{B \cup S}$.
>
> Now integrating over $B-S$, we have $p(x_{A \cup S})= \prod_{C\subseteq A \cup S} \psi_{C}\cdot \underbrace{\int \prod_{C\subseteq B \cup S} \psi_{C}(x_{C})~ dx_{B-S} }_{=:\tilde{\psi}(x_{S})},$
> which is a factorisation of $p(x_{A \cup S})$ once we merge $\tilde{\psi}(x_{S})$ into the potential of the clique of $G_{A \cup S}$ that contains $S$. So $p(x_{A \cup S})$ factorises over $G_{A \cup S}$.
>
> A similar proof holds for $B \cup S$.
If $p$ factorises with respect to $G$, and $G$ is decomposable with cliques $(C_{1},\dots,C_{K})$ that satisfy running intersection with separator sets $(S_{1},\dots,S_{K})$, then $p$ can be written as
$p(x)=\prod_{k} \frac{p(x_{C_{k}})}{p(x_{S_{k}})},$
^39a55a
so in the definition of factorisability, we can take $\psi_{C_{k}}:= p(x_{C_{k}}) / p(x_{S_{k}})=p(x_{C_{k}-S_{k}} ~|~ x_{S_{k}}).$
The important consequence is that we can conduct inference separately on each of the $x_{C_{k}}$, as this theorem guarantees $X_{C_{k}-S_{k}} \perp X_{C_{i}-S_{k}}~|~X_{S_{k}}$ for any $i <k$. This reduces the inference over $\mathbb{R}^{| V |}$ (including interactions between cliques) to smaller inferences over $\mathbb{R}^{| C_{k} |}$.
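The clique-marginal form can be checked numerically on the chain $1 - 2 - 3$, with RIP cliques $C_{1}=\{1,2\}$, $C_{2}=\{2,3\}$ and separator $S_{2}=\{2\}$ (the potential values are hypothetical):

```python
import itertools

def clique_marginal_identity():
    """Max violation of p(x) = p(x_{12}) p(x_{23}) / p(x_2) for a factorised p."""
    # hypothetical positive potentials on cliques {1,2} and {2,3}
    psi12 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 0.5}
    psi23 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}
    states = list(itertools.product([0, 1], repeat=3))
    Z = sum(psi12[a, b] * psi23[b, c] for a, b, c in states)
    p = {x: psi12[x[0], x[1]] * psi23[x[1], x[2]] / Z for x in states}
    err = 0.0
    for a, b, c in states:
        p12 = sum(p[x] for x in states if x[:2] == (a, b))
        p23 = sum(p[x] for x in states if x[1:] == (b, c))
        p2 = sum(p[x] for x in states if x[1] == b)
        err = max(err, abs(p[a, b, c] - p12 * p23 / p2))
    return err
```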
If we have data $\mathbf{x}$ and want to make inference using a [[Contingency Table|contingency table]], the log-likelihood is $\begin{align*}
l(\theta;\mathbf{x})&= \sum_{x \in \mathcal{X}}n(x;\mathbf{x})\log p(x)\\
&= \sum_{x} n(x;\mathbf{x}) \sum_{k}\log p(x_{C_{k}-S_{k}} ~|~ x_{S_{k}})\\
&= \sum_{k} \sum_{x_{C_{k}} \in \mathcal{X}_{C_{k}}} n(x_{C_{k}}; \mathbf{x}) \log p(x_{C_{k}-S_{k}} ~|~ x_{S_{k}}).
\end{align*}$As a result, we can infer $p(x_{C_{k}})$ by looking at $n(x_{C_{k}}; \mathbf{x})$.
> [!help]- Notation and explanations
> $n(x;\mathbf{x})$ is the contingency table function that counts the number of times $x$ appeared in $\mathbf{x}$, and $\mathcal{X}_{C_{k}}$ is the sample space of the RVs indexed by $C_{k}$.
>
> The last equality follows from fixing an $x_{C_{k}}$ and considering all terms that contribute to the coefficient of $\log p(x_{C_{k}-S_{k}} ~|~ x_{S_{k}})$: it gains $+n(x';\mathbf{x})$ (for $x' \in \mathcal{X}$) if and only if $x'_{C_{k}}=x_{C_{k}}$, i.e. whenever $x_{C_{k}}$ appears in $\mathbf{x}$.
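For a decomposable model, the MLE therefore only needs the clique counts: $\hat p(x)=\prod_{k} \hat p(x_{C_{k}})/\hat p(x_{S_{k}})$ with $\hat p(x_{C_{k}})=n(x_{C_{k}};\mathbf{x})/n$. A sketch on the chain $1-2-3$ with a small hypothetical dataset, checking that the fitted $\hat p$ reproduces the empirical clique proportions:

```python
from collections import Counter
import itertools

def mle_from_clique_counts():
    """Decomposable MLE for the chain 1-2-3 from clique counts; returns the
    max gap between fitted and empirical marginals on clique {1,2}."""
    data = [(0, 0, 0), (0, 1, 1), (1, 1, 1), (1, 0, 0), (0, 0, 1), (1, 1, 0)]
    n = len(data)
    n12 = Counter((a, b) for a, b, c in data)   # counts on clique C1 = {1,2}
    n23 = Counter((b, c) for a, b, c in data)   # counts on clique C2 = {2,3}
    n2 = Counter(b for a, b, c in data)         # counts on separator S2 = {2}
    # MLE: p_hat(x) = p_hat(x_{C1}) p_hat(x_{C2}) / p_hat(x_{S2})
    p_hat = {(a, b, c): n12[a, b] * n23[b, c] / (n * n2[b])
             for (a, b, c) in itertools.product([0, 1], repeat=3)}
    err = 0.0
    for a, b in itertools.product([0, 1], repeat=2):
        fitted = sum(v for x, v in p_hat.items() if x[:2] == (a, b))
        err = max(err, abs(fitted - n12[a, b] / n))
    return err
```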
### Non-Decomposable Graphs
If we run the IPS algorithm (iterative proportional scaling, also called iterative proportional fitting, IPF) on a decomposable graph, it actually completes the fitting in one iteration:
> [!theorem|*] IPS on Decomposable Graph
> Let $p$ be a distribution that is Markov wrt. a decomposable graph $\mathcal{G}$ with running intersection cliques $\mathcal{C}=(C_{1},\dots)$ and separators $\mathcal{S}=(S_{1},\dots)$.
>
> If we run the IPS algorithm that starts with $p^{(0)}= \mathrm{Unif}(\mathcal{X})$ and ask it to match $p(x_{C_{i}})$ on each $i=1,\dots$, it will converge to $p$ after one iteration, i.e. after adjusting each marginal $p(x_{C_{i}})$ exactly once.
>
> > [!proof]-
> >
> >
> > Denote by $H_{j}:= \cup_{i \le j}C_{i}$ the history, i.e. all vertices that have been updated after step $j$; the vertices in $V-H_{j}$ are the ones that haven't been updated. For the rest of the proof, let $\textcolor{orange}{\text{orange}=\text{uniform distribution}}$, and $\textcolor{cyan}{\text{cyan}=\text{the target }p}$.
> >
> > We shall prove that after the $j$th step, $p^{(j)}(x_{V})= \textcolor{cyan}{\prod_{i \le j} p(x_{C_{i}-S_{i}} ~|~ x_{S_{i}})} \cdot \textcolor{orange}{\mathrm{Unif}(\mathcal{X}_{V-H_{j}})}.$So it consists of the $\textcolor{orange}{\text{un-updated uniform density}}$ and $\textcolor{cyan}{\text{the correct density}}$, and hopefully all of $\textcolor{orange}{\text{orange}}$ will change into $\textcolor{cyan}{\text{cyan}}$.
> >
> > For induction, we want to prove that on the $(j+1)$th step, the IPF moves $x_{C_{j+1}-S_{j+1}}$ from the $\textcolor{orange}{\text{orange}}$ half to the $\textcolor{cyan}{\text{cyan}}$ half, i.e. it changes $\textcolor{orange}{\mathrm{Unif}(\mathcal{X}_{C_{j+1}-S_{j+1}})}$ to $\textcolor{cyan}{p(x_{C_{j+1}-S_{j+1}} ~|~ x_{S_{j+1}})}$.
> >
> > On the $(j+1)$th step, the IPF rescales the distribution by $p(x_{C_{j+1}}) / p^{(j)}(x_{C_{j+1}}),$and the denominator can be written as $p^{(j)}(x_{C_{j+1}})=\textcolor{cyan}{p^{(j)}(x_{S_{j+1}})}\cdot \textcolor{orange}{p^{(j)}(x_{C_{j+1}-S_{j+1}}~|~x_{S_{j+1}})}.$
> > Let's figure out what the two terms are (hint: the color code).
> > - Sum/integrate out the untouched vertices to get $\textcolor{cyan}{p^{(j)}(x_{H_{j}})=\prod_{i \le j}p(x_{C_{i}-S_{i}} ~|~ x_{S_{i}})=p(x_{H_{j}})},$since the middle term is a factorization of $p(x_{H_{j}})$ over the (sub)graph $\mathcal{G}[H_{j}]$. Therefore, all updated vertices had the correct marginal $p(x_{H_{j}})$.
> > - Since $S_{j+1} = C_{j+1}\cap H_{j} \subseteq H_{j}$ (by running intersection) consists of updated vertices, it also has the correct marginal, i.e. $\textcolor{cyan}{p^{(j)}(x_{S_{j+1}})=p(x_{S_{j+1}})}$.
> > - On the other hand, all vertices in $C_{j+1}-S_{j+1}$ have not been touched, so they still have the uniform distribution with density $\textcolor{orange}{p^{(j)}(x_{C_{j+1}-S_{j+1}}~|~ x_{S_{j+1}})= \mathrm{Unif}(\mathcal{X}_{C_{j+1}-S_{j+1}} )}$.
> >
> > Therefore, the update is $\textcolor{cyan}{\frac{p(x_{C_{j+1}})}{p(x_{S_{j+1}})}}\cdot \textcolor{orange}{\frac{1}{\mathrm{Unif}(\mathcal{X}_{C_{j+1}-S_{j+1}} )}}.$When multiplied to the density $p^{(j)}(x_{V})$, it replaces the $\textcolor{orange}{\text{un-updated}}$ density of $x_{C_{j+1}-S_{j+1}}$ with $\textcolor{cyan}{\text{the correct density}}$. This completes the induction.
> >
> > Lastly, simply let $j=| \mathcal{C} |$ to get the result after updating every clique: $p^{(| \mathcal{C} |)}(x_{V})=\textcolor{cyan}{\prod_{i}p(x_{C_{i}-S_{i}}~|~x_{S_{i}})}\cdot \textcolor{orange}{\mathrm{Unif}(\mathcal{X}_{\varnothing})}=\textcolor{cyan}{\prod_{i}p(x_{C_{i}-S_{i}}~|~x_{S_{i}})}=p(x_{V}).$ That is, the IPF fits the target $p$ perfectly after one update of every clique.
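The one-sweep claim is easy to verify empirically. A sketch for the binary chain $1-2-3$ (hypothetical potentials; cliques in RIP order $\{1,2\},\{2,3\}$), starting IPS from the uniform distribution and matching each clique marginal exactly once:

```python
import itertools

def ipf_one_sweep():
    """Run one IPS sweep on the chain 1-2-3; return max gap to the target."""
    # target: p(x) proportional to psi12(x1,x2) psi23(x2,x3), Markov wrt the chain
    psi12 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 0.5}
    psi23 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}
    states = list(itertools.product([0, 1], repeat=3))
    Z = sum(psi12[a, b] * psi23[b, c] for a, b, c in states)
    target = {x: psi12[x[0], x[1]] * psi23[x[1], x[2]] / Z for x in states}

    def marginal(p, idx):
        m = {}
        for x, v in p.items():
            key = tuple(x[i] for i in idx)
            m[key] = m.get(key, 0.0) + v
        return m

    p = {x: 1 / len(states) for x in states}      # p^(0) = Unif
    for clique in [(0, 1), (1, 2)]:               # one update per clique
        tm, cm = marginal(target, clique), marginal(p, clique)
        p = {x: v * tm[tuple(x[i] for i in clique)] / cm[tuple(x[i] for i in clique)]
             for x, v in p.items()}
    return max(abs(p[x] - target[x]) for x in states)
```

After the single sweep the fitted distribution coincides with the target up to floating-point error.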