Gibbs sampling is a [[Markov Chain Monte Carlo]] (MCMC) method that simulates a multidimensional Markov chain using the target's full conditional distributions.
- The method simulates a Markov chain $\text{Markov}(P)$ that has the target $p$ as a stationary distribution.
- It assumes that the conditional distributions are readily available and are (relatively) easy to sample from.

It only works on multidimensional variables $\mathbf{X}=(X^{1},\dots,X^{d})$ with density function $p(\mathbf{X})=p(X^{1},\dots,X^{d})$. Gibbs sampling uses the conditional distributions $p(X^{i}\,|\, \mathbf{X}^{-i})$, where $\mathbf{X}^{-i} \in \mathbb{R}^{d-1}$ has all the coordinates of $\mathbf{X}$ except $X^{i}$.

Gibbs sampling simulates the transition $\mathbf{X}_{k} \to \mathbf{X}_{k+1}$ by *updating one coordinate at a time, according to its full conditional distribution*:

> [!definition|*] Gibbs Samplers
> **Systematic scan Gibbs sampler** updates all coordinates in some fixed order (for example $1,2,\dots,d$) to get the new sample: $\begin{align*} \mathbf{X}_{k}=(X_{k}^{1},\dots,X_{k}^{d}) &\xrightarrow[\text{from }p(X^{1}|\mathbf{X}^{-1})]{\text{new }X^{1}} (\underline{X_{k+1}^{1}},X_{k}^{2},\dots,X_{k}^{d})\\ &\xrightarrow[\text{from }p(X^{2}|\mathbf{X}^{-2})]{\text{new }X^{2}} (X_{k+1}^{1},\underline{X_{k+1}^{2}},\dots,X_{k}^{d})\\ &\to \dots \\[0.4em] &\xrightarrow[\text{from }p(X^{d}|\mathbf{X}^{-d})]{\text{new }X^{d}} (X_{k+1}^{1},X_{k+1}^{2},\dots,\underline{X_{k+1}^{d}}) \equiv \mathbf{X}_{k+1}. \end{align*}$
> **Random scan Gibbs sampler** updates only one random coordinate for each new sample: $\begin{align*} \mathbf{X}_{k}=(X_{k}^{1},\dots,X_{k}^{d}) &\xrightarrow[c \in \{ 1,\dots,d \}]{\text{uniformly sample }} (X_{k}^{1},\dots,\underline{X_{k}^{c}},\dots,X_{k}^{d})\\ &\xrightarrow[\text{from }p(X^{c}|\mathbf{X}^{-c})]{\text{new }X^{c}} (X_{k}^{1},\dots,\underline{X_{k+1}^{c}},\dots,X_{k}^{d}) \equiv \mathbf{X}_{k+1}. \end{align*}$

^a93b79

Given the coordinate $i$ to be updated, each step of Gibbs can be seen as a [[Metropolis Hastings]] update with proposal $q(\theta' ~|~ \theta):=p(\theta'_{i}~|~\theta_{-i})$ that leaves $\theta_{-i}$ unchanged: the acceptance probability is always $1$.

## Gibbs Sampling as a Markov Chain

For multidimensional $\mathbf{X}$, let its state space be $\mathcal{X} \subseteq \mathbb{R}^{d}$.
- In the discrete case, the transition matrix $P$ is defined on $\mathcal{X} \times \mathcal{X}$; let it be indexed by states $\mathbf{i,j}$ as in $P_{\mathbf{ij}}$.
- Let $i_{c}$ denote the $c$th element of $\mathbf{i}$, and $\mathbf{i}^{-c}$ the equivalent of $\mathbf{X}^{-c}$.

More generally, the transition kernel $K(x,x')$ can be treated as an operator on functions/densities: if $\pi(x)$ is a density, then $\pi K(y):=\int \pi(x)K(x,y) ~ dx=\int (\text{prob. of }x) \cdot (\text{prob. of }x\to y)~ dx$ is a density in $y$.
- Of course this assumes that $\int K(x,y) ~ dy=1$, i.e. that the transition always goes somewhere, starting from any $x$.
- Then stationarity means $\pi K(y)=\pi(y)$, i.e. $\pi$ is a left-eigenvector of the operator $K$ with eigenvalue $1$.

In terms of Gibbs sampling, let $K_{j}, ~j=1,\dots,d$ be the operators of the single-coordinate updates, i.e. $K_{j}(x,y)=\mathbf{1}\{x_{-j}=y_{-j}\}\cdot p(y_{j}~|~x_{-j})=\mathbf{1}\{x_{-j}=y_{-j}\}\,p(y_{j}~|~y_{-j}).$ The kernel $K$ of an entire update step is then a function of those:
- In systematic scan with coordinate ordering $\sigma$, it is the composition $K:= K_{\sigma 1} \circ \cdots \circ K_{\sigma d}$ in the sense that $\pi K=(((\pi K_{\sigma 1} )K_{\sigma 2}) \cdots K_{\sigma d}),$ i.e. applying $K_{\sigma 1},\dots, K_{\sigma d}$ on the right, in that order.
- In random scan it is the mixture $K=\frac{1}{d}\sum_{j}K_{j}$ in the sense that $\pi K:= \frac{1}{d}\sum_{j}\pi K_{j}.$
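For concreteness, here is a minimal sketch of both scan strategies in code (my own Python illustration, in the spirit of the R example linked in the references below; the function name `gibbs_bivariate_normal` and its parameters are made up for this note). The target is a standard bivariate Gaussian with correlation $\rho$, whose full conditionals are available in closed form: $X^{1}\,|\,X^{2}=x^{2}\sim \mathcal{N}(\rho x^{2},\,1-\rho^{2})$, and symmetrically for $X^{2}$.

```python
import numpy as np

def gibbs_bivariate_normal(n_steps, rho, scan="systematic", rng=None):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Both full conditionals are univariate normals:
        X1 | X2 = x2  ~  N(rho * x2, 1 - rho**2),
        X2 | X1 = x1  ~  N(rho * x1, 1 - rho**2).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(2)                      # arbitrary starting state
    sd = np.sqrt(1.0 - rho**2)           # conditional standard deviation
    samples = np.empty((n_steps, 2))
    for k in range(n_steps):
        if scan == "systematic":         # update coordinate 1, then coordinate 2
            coords = (0, 1)
        else:                            # random scan: one uniformly chosen coordinate
            coords = (rng.integers(2),)
        for c in coords:
            x[c] = rng.normal(rho * x[1 - c], sd)   # draw from p(x_c | x_{-c})
        samples[k] = x
    return samples

samples = gibbs_bivariate_normal(50_000, rho=0.8)
print(np.corrcoef(samples[2_000:].T))    # empirical correlation ~ 0.8 after burn-in
```

Each systematic-scan iteration draws from both conditionals, while each random-scan iteration draws from only one uniformly chosen conditional, exactly as in the definition above.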
We can see that the target $p(x_{1:d})$ is indeed invariant under any of the $K_{j}$: $\begin{align*} pK_{j}(y)&= \int p(x)K_{j}(x,y) ~ dx\\ &= \int p(y_{:j-1}, x_{j},y_{j+1:})\cdot p(y_{j}~|~y_{-j}) ~ dx_{j} &&[\text{the indicator function}]\\ &= p(y_{j}~|~y_{-j})\cdot p(y_{-j}) &&[\text{integrating out }x_{j}]\\ &= p(y) . \end{align*}$

In the discrete case, the same structure holds: the *one-coordinate update matrices $P^{(c)}$ can be combined to represent a full update step in Gibbs sampling*.
- They are composed in systematic scan, in the order the coordinates are updated (e.g. $1,\dots,d$, acting on distributions from the right): $P^{\text{sys}}=P^{(1)}P^{(2)}\cdots P^{(d)}.$
- They are averaged in random scan: $P^{\text{rand}}=\frac{1}{d}\sum_{c}P^{(c)}.$

### Detailed Balance for Gibbs Sampling

*Each one-coordinate update $P^{(c)}$ satisfies the [[Markov Chain Monte Carlo#Detailed Balance|detailed balance equations]] with respect to the target*: for states $\mathbf{i,j}$ with $\mathbf{i}^{-c}=\mathbf{j}^{-c}$, $\begin{align*} p(\mathbf{i})P^{(c)}_{\mathbf{ij}}&= p(\mathbf{i})\cdot p(X^{c}=j_{c}\,|\,\mathbf{X}^{-c}=\mathbf{i}^{-c})\\ &= \underbrace{p(X^{c}={i}_{c}\,|\, \mathbf{X}^{-c}=\mathbf{i}^{-c})\,\mathbb{P}[\mathbf{X}^{-c}=\mathbf{i}^{-c}]}_{p(\mathbf{i})}\cdot p(X^{c}=j_{c}\,|\,\mathbf{X}^{-c}=\mathbf{i}^{-c})\\ &= p(X^{c}={i}_{c}\,|\, \mathbf{X}^{-c}=\mathbf{i}^{-c})\cdot\underbrace{\mathbb{P}[\mathbf{X}^{-c}=\mathbf{i}^{-c}]\, p(X^{c}=j_{c}\,|\,\mathbf{X}^{-c}=\mathbf{i}^{-c})}_{p(\mathbf{j})}\\ &= p(X^{c}=i_{c}\,|\,\mathbf{X}^{-c}=\mathbf{i}^{-c}) \cdot p(\mathbf{j})\\ &= p(\mathbf{j})P^{(c)}_{\mathbf{ji}} \end{align*}$ and if $\mathbf{i}^{-c}\ne \mathbf{j}^{-c}$, detailed balance is also satisfied, with both sides being $0$.

The transition matrix of *systematic scan Gibbs has the target $p$ as a stationary distribution*; it does not necessarily satisfy detailed balance, however.
- This is because the one-coordinate updates do not commute: in general $P^{(1)}P^{(2)}\ne P^{(2)}P^{(1)}$, for example.

> [!proof] Proof of Invariance
> Since $pP^{(c)}=p$ for all $c=1,\dots,d$, $pP^{\text{sys}}=p[P^{(1)}P^{(2)}\cdots P^{(d)}]=p[P^{(2)}\cdots P^{(d)}]=\cdots=pP^{(d)}=p,$ so $p$ is stationary under $P^{\text{sys}}$.

*The transition matrix $P^{\text{rand}}$ of random scan Gibbs satisfies detailed balance*: $p(\mathbf{i})P^{\text{rand}}_{\mathbf{ij}}= p(\mathbf{j})P^{\text{rand}}_{\mathbf{ji}},$ so *random scan Gibbs is reversible and has $p$ as a stationary distribution.*

> [!proof]
> Since $P^\text{rand}$ is the average of the $P^{(c)}$'s, linearity gives $p(\mathbf{i})P^{\text{rand}}_{\mathbf{ij}}=\frac{1}{d}\sum_{c}p(\mathbf{i})P^{(c)}_{\mathbf{ij}}=\frac{1}{d}\sum_{c}p(\mathbf{j})P^{(c)}_{\mathbf{ji}}=p(\mathbf{j})P_{\mathbf{ji}}^\text{rand},$ so $(P^\text{rand},p)$ satisfies detailed balance, which is sufficient for reversibility and stationarity.

If the distribution does not satisfy a **positivity condition**, then Gibbs sampling may fail:

![[Screenshot 2025-02-24 at 17.15.28.png#invert|center|w60]]

The condition is that the $d$-dimensional joint distribution satisfies $p(x_{1:d})=0$ if and only if at least one of the marginals vanishes, i.e. $\prod_{j=1}^{d}p(x_{j})=0$.
- Equivalently, $\mathrm{support}(p(x_{1:d}))= \bigtimes_{j=1}^{d} \mathrm{support}(p(x_{j}))$, where $\bigtimes$ is the Cartesian product.
- Satisfying this condition is sufficient for the Gibbs sampler that uses the conditionals $p(x_{j}~|~x_{-j})$ to be irreducible and recurrent.
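The matrix statements above can also be checked numerically on a toy discrete target (again my own Python sketch, assuming a strictly positive joint on a $3\times 4$ grid, so the positivity condition holds). It builds the one-coordinate updates $P^{(1)},P^{(2)}$, verifies that $p$ is stationary for both $P^{\text{sys}}$ and $P^{\text{rand}}$, and checks detailed balance via the symmetry of $\operatorname{diag}(p)\,P$: it holds for $P^{\text{rand}}$ but in general not for $P^{\text{sys}}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# A strictly positive joint p(x1, x2) on a 3 x 4 grid (positivity condition holds).
joint = rng.random((3, 4))
joint /= joint.sum()
n1, n2 = joint.shape
states = [(a, b) for a in range(n1) for b in range(n2)]   # flattened state space
p = np.array([joint[s] for s in states])                  # target as a row vector

def coord_update(c):
    """One-coordinate update matrix P^(c): resample x_c from p(x_c | x_{-c})."""
    P = np.zeros((len(states), len(states)))
    for i, si in enumerate(states):
        for j, sj in enumerate(states):
            if si[1 - c] == sj[1 - c]:                     # other coordinate unchanged
                if c == 0:   # p(x1 = sj[0] | x2 = si[1])
                    P[i, j] = joint[sj[0], si[1]] / joint[:, si[1]].sum()
                else:        # p(x2 = sj[1] | x1 = si[0])
                    P[i, j] = joint[si[0], sj[1]] / joint[si[0], :].sum()
    return P

P1, P2 = coord_update(0), coord_update(1)
P_sys = P1 @ P2                      # systematic scan: update x1, then x2
P_rand = 0.5 * (P1 + P2)             # random scan: uniform mixture of the updates

assert np.allclose(p @ P_sys, p)     # target is stationary for systematic scan
assert np.allclose(p @ P_rand, p)    # ... and for random scan

D = np.diag(p)                       # detailed balance <=> diag(p) P is symmetric
print(np.allclose(D @ P_rand, P_rand.T @ D))   # True: random scan is reversible
print(np.allclose(D @ P_sys, P_sys.T @ D))     # typically False for systematic scan
```

Here `P_sys = P1 @ P2` propagates the row vector `p` through the coordinate-$1$ update first, matching the convention $pP^{(1)}P^{(2)}$ used above.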
### Reverse-Engineering Joint Densities

![[Hammersley-Clifford Theorem#^9dc92d]]

Using this theorem, we can fix any $z$ in the support and recover the joint distribution from the full conditional distributions.
- Of course, there is no guarantee that such a joint is normalizable for an arbitrary combination of conditionals -- e.g. with $d=2$, the conditionals $p(x_{i}~|~x_{-i}):= x_{-i}\exp(-x_{i}x_{-i})$ (exponentials with rate $x_{-i}$ on $x_{i}>0$) give $p(x_{1},x_{2})\propto p(x_{2}~|~x_{1})\,\frac{p(x_{1}~|~z_{2})}{p(z_{2}~|~x_{1})}= x_{1}e^{-x_{1}x_{2}}\cdot \frac{z_{2}e^{-x_{1}z_{2}}}{x_{1}e^{-x_{1}z_{2}}}\propto \exp(-x_{1}x_{2})$, which is not integrable over the positive quadrant (integrating out $x_{1}$ leaves $1/x_{2}$, which is itself not integrable).

### References

[Sample code in R simulating 2D Gaussians](https://rpubs.com/aliquis/GibbsAlgorithm).
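As a closing illustration of the non-integrable example above (a hypothetical Python sketch of my own, unrelated to the linked R code): Gibbs updates using the exponential conditionals $p(x_{i}~|~x_{-i})=x_{-i}e^{-x_{i}x_{-i}}$ are easy to simulate, but since no normalizable joint exists, the chain never settles into a stationary distribution; in fact $\log x_{k}$ performs a mean-zero random walk.

```python
import numpy as np

rng = np.random.default_rng(1)
x = 1.0
log_path = []
for _ in range(2_000):
    y = rng.exponential(scale=1.0 / x)   # y | x ~ Exp(rate = x): density x * exp(-x*y)
    x = rng.exponential(scale=1.0 / y)   # x | y ~ Exp(rate = y): density y * exp(-y*x)
    log_path.append(np.log(x))

# log(x_k) is a random walk with mean-zero increments, so its spread keeps growing
# with k instead of stabilising: there is no stationary distribution to converge to.
print(np.std(log_path[:500]), np.std(log_path))
```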