Gibbs sampling is a [[Markov Chain Monte Carlo]] (MCMC) method that simulates a multidimensional Markov chain using the target's conditional distributions.
- The method simulates a Markov chain $\text{Markov}(P)$ that has the target $p$ as a stationary distribution.
- It assumes that the conditional distributions are readily available, and are (relatively) easy to sample from.
It only works on multidimensional variables $\mathbf{X}=(X^{1},\dots,X^{d})$, with the density function $p(\mathbf{X})=p(X^{1},\dots,X^{d})$. Gibbs sampling uses the conditional distributions $p(X^{i}\,|\, \mathbf{X}^{-i})$, where $\mathbf{X}^{-i} \in \mathbb{R}^{d-1}$ has all the coordinates of $\mathbf{X}$ except $X^{i}$.
Gibbs sampling simulates the transition $\mathbf{X}_{k} \to \mathbf{X}_{k+1}$ by *updating one coordinate at a time, according to its full conditional distribution*:
> [!definition|*] Gibbs Samplers
> **Systematic scan Gibbs sampler** updates all coordinates in some fixed order (for example $1,2,\dots,d$) to get the new sample: $\begin{align*}
\mathbf{X}_{k}=(X_{k}^{1},\dots,X_{k}^{d}) &\xrightarrow[\text{from }p(X^{1}|\mathbf{X}^{-1})]{\text{new }X^{1}} (\underline{X_{k+1}^{1}},X_{k}^{2},\dots,X_{k}^{d})\\
&\xrightarrow[\text{from }p(X^{2}|\mathbf{X}^{-2})]{\text{new }X^{2}} (X_{k+1}^{1},\underline{X_{k+1}^{2}},\dots,X_{k}^{d})\\
&\to \dots \\[0.4em]
&\xrightarrow[\text{from }p(X^{d}|\mathbf{X}^{-d})]{\text{new }X^{d}} (X_{k+1}^{1},X_{k+1}^{2},\dots,\underline{X_{k+1}^{d}}) \equiv \mathbf{X}_{k+1}.
\end{align*}$
> **Random scan Gibbs sampler** updates only one random coordinate for each new sample: $\begin{align*}
\mathbf{X}_{k}=(X_{k}^{1},\dots,X_{k}^{d}) &\xrightarrow[c \in \{ 1,\dots,d \}]{\text{uniformly sample }} (X_{k}^{1},\dots,\underline{X_{k}^{c}},\dots,X_{k}^{d})\\
&\xrightarrow[\text{from }p(X^{c}|\mathbf{X}^{-c})]{\text{new }X^{c}} (X_{k}^{1},\dots,\underline{X_{k+1}^{c}}\dots,X_{k}^{d}) \equiv \mathbf{X}_{k+1}.
\end{align*}$
^a93b79
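The two scans above can be sketched in code for a standard bivariate Gaussian target with correlation $\rho$ (a hypothetical choice of target; its full conditionals are known in closed form, $X^{1}\,|\,X^{2}=x_{2}\sim \mathcal{N}(\rho x_{2},\,1-\rho^{2})$, and symmetrically for $X^{2}$). A minimal sketch:

```python
import math
import random

random.seed(0)
rho = 0.8  # assumed correlation of the standard bivariate Gaussian target
sd = math.sqrt(1 - rho ** 2)  # standard deviation of each full conditional

def systematic_scan(n, x1=0.0, x2=0.0):
    """Update coordinates in the fixed order 1, 2 at every step."""
    samples = []
    for _ in range(n):
        x1 = random.gauss(rho * x2, sd)  # new X1 ~ p(X1 | X2 = x2)
        x2 = random.gauss(rho * x1, sd)  # new X2 ~ p(X2 | X1 = x1)
        samples.append((x1, x2))
    return samples

def random_scan(n, x1=0.0, x2=0.0):
    """Update one uniformly chosen coordinate per step."""
    samples = []
    for _ in range(n):
        if random.random() < 0.5:
            x1 = random.gauss(rho * x2, sd)
        else:
            x2 = random.gauss(rho * x1, sd)
        samples.append((x1, x2))
    return samples

# The empirical covariance of the chain approaches rho
# (the marginal variances of the target are 1).
xs = systematic_scan(20000)
m1 = sum(x for x, _ in xs) / len(xs)
m2 = sum(y for _, y in xs) / len(xs)
cov = sum((x - m1) * (y - m2) for x, y in xs) / len(xs)
```

Note that successive samples are correlated (each step changes the state only through the full conditionals), so the chain needs many iterations before its empirical statistics settle near the target's.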
Given the coordinate $i$ to be updated, each step of Gibbs can be seen as a [[Metropolis Hastings]] update with proposal $q(\theta' ~|~ \theta):=p(\theta'_{i}~|~\theta_{-i})$: the acceptance probability is always $1$.
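To see this, substitute the proposal into the Metropolis–Hastings acceptance ratio; the proposal keeps $\theta_{-i}$ fixed, so $\theta'_{-i}=\theta_{-i}$ and
$$\frac{p(\theta')\,q(\theta\,|\,\theta')}{p(\theta)\,q(\theta'\,|\,\theta)}=\frac{p(\theta'_{i}\,|\,\theta_{-i})\,p(\theta_{-i})\cdot p(\theta_{i}\,|\,\theta_{-i})}{p(\theta_{i}\,|\,\theta_{-i})\,p(\theta_{-i})\cdot p(\theta'_{i}\,|\,\theta_{-i})}=1,$$
so the proposal is always accepted.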
## Gibbs Sampling as a Markov Chain
For multidimensional $\mathbf{X}$, let its state space be $\mathcal{X} \subseteq \mathbb{R}^{d}$.
- The transition matrix $P$ is defined on $\mathcal{X} \times \mathcal{X}$; let it be indexed by states $\mathbf{i,j}$ as in $P_{\mathbf{ij}}$.
- Let $i_{c}$ denote the $c$th element of $\mathbf{i}$, and $\mathbf{i}^{-c}$ the equivalent of $\mathbf{X}^{-c}$.
Then the transition kernel $K(x,x')$ can be treated as an operator on functions/densities: if $\pi(x)$ is a density, then $\pi K(y):=\int \pi(x)K(x,y) ~ dx=\int (\text{prob. of }x) \cdot (\text{prob of }x\to y)~ dx$ is a density on $y$.
- Of course this assumes that $\int K(x,y) ~ dy=1$, i.e. that the transition always goes somewhere, starting from any $x$.
- Then stationarity means $\pi K(y)=\pi(y)$, i.e. $\pi$ is a left-eigenvector of the operator $K$.
In terms of Gibbs sampling, let $K_{j}, ~j=1,\dots,d$ be the operators of each coordinate update, i.e. $K_{j}(x,y)=\mathbf{1}\{x_{-j}=y_{-j}\}\cdot p(y_{j}~|~x_{-j})=\mathbf{1}\{x_{-j}=y_{-j}\}p(y_{j}~|~y_{-j}).$
The entire update step's kernel $K$ is then a function of those:
- In systematic scan with scan order $\sigma$ (a permutation of $1,\dots,d$), it is $K:= K_{\sigma(1)} \circ \cdots \circ K_{\sigma(d)}$ in the sense that $\pi K=(((\pi K_{\sigma(1)} )K_{\sigma(2)}) \cdots K_{\sigma(d)}),$ i.e. applying $K_{\sigma(1)},\dots, K_{\sigma(d)}$ on the right, in that order.
- In random scan it is the mixture $K=\frac{1}{d}\sum_{j}K_{j}$ in the sense that $\pi K:= \frac{1}{d}\sum_{j}\pi K_{j}.$
We can see that the target $p(x_{1:d})$ is indeed invariant under any of the $K_{j}$: $\begin{align*}
pK_{j}(y)&= \int p(x)K_{j}(x,y) ~ dx\\
&= \int p(y_{:j-1}, x_{j},y_{j+1:})\cdot p(y_{j}~|~y_{-j}) ~ dx_{j} &&[\text{the indicator function}]\\
&= p(y_{j}~|~y_{-j})\cdot p(y_{-j}) &&[\text{integrating out }x_{j}]\\
&= p(y). &&[\text{chain rule}]
\end{align*}$
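This invariance can be checked numerically. Below is a minimal sketch using a hypothetical joint distribution over two binary coordinates (the numbers are made up for illustration):

```python
from itertools import product

# Hypothetical joint distribution over (X1, X2) in {0,1}^2.
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def cond(c, value, other):
    """Full conditional p(X^c = value | X^{-c} = other), by normalising the joint."""
    joint = (lambda v: p[(v, other)]) if c == 0 else (lambda v: p[(other, v)])
    return joint(value) / (joint(0) + joint(1))

def K(c):
    """One-coordinate update kernel K_c(x, y) = 1{x_{-c} = y_{-c}} p(y_c | y_{-c})."""
    return {(x, y): cond(c, y[c], y[1 - c]) if x[1 - c] == y[1 - c] else 0.0
            for x, y in product(p, p)}

def apply_left(dist, kernel):
    """(dist K)(y) = sum_x dist(x) K(x, y)."""
    return {y: sum(dist[x] * kernel[(x, y)] for x in dist) for y in dist}

# p is invariant under each one-coordinate update: pK_c = p
for c in (0, 1):
    pK = apply_left(p, K(c))
    assert all(abs(pK[y] - p[y]) < 1e-12 for y in p)
```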
Those *one-coordinate update matrices can be combined to represent a full update step in Gibbs sampling*:
- They are composed in systematic scan: $P^{\text{sys}}=P^{(1)}\cdots P^{(d)}$ (acting on the right of a row vector), in the order the coordinates are updated, e.g. $1,\dots,d$.
- They are averaged in random scan: $P^{\text{rand}}=\frac{1}{d}\sum_{c}P^{(c)}.$
### Detailed Balance for Gibbs Sampling
*Each one-coordinate update $P^{(c)}$ satisfies the [[Markov Chain Monte Carlo#Detailed Balance|detailed balance equations]] as MCMC requires*: for states $\mathbf{i,j}$ with $\mathbf{i}^{-c}=\mathbf{j}^{-c}$ , $\begin{align*}
p(\mathbf{i})P^{(c)}_{\mathbf{ij}}&=
p(\mathbf{i})\cdot p(X^{c}=j_{c}\,|\,\mathbf{X}^{-c}=\mathbf{i}^{-c})\\
&= \underbrace{p(X^{c}={i}_{c}\,|\, \mathbf{X}^{-c}=\mathbf{i}^{-c})\mathbb{P}[\mathbf{X}^{-c}=\mathbf{i}^{-c}]}_{p(\mathbf{i})}\cdot p(X^{c}=j_{c}\,|\,\mathbf{X}^{-c}=\mathbf{i}^{-c})\\
&= p(X^{c}={i}_{c}\,|\, \mathbf{X}^{-c}=\mathbf{i}^{-c})\cdot\underbrace{\mathbb{P}[\mathbf{X}^{-c}=\mathbf{i}^{-c}] p(X^{c}=j_{c}\,|\,\mathbf{X}^{-c}=\mathbf{i}^{-c})}_{p(\mathbf{j})}\\
&= p(X^{c}=i_{c}\,|\,\mathbf{X}^{-c}=\mathbf{i}^{-c}) \cdot p(\mathbf{j})\\
&= p(\mathbf{j})P^{(c)}_{\mathbf{ji}}
\end{align*}$and if $\mathbf{i}^{-c}\ne \mathbf{j}^{-c}$, detailed balance is also satisfied with both sides being $0$.
The transition matrix of *systematic scan Gibbs has the target $p$ as a stationary distribution*; however, it does not necessarily satisfy detailed balance.
- This is because the one-coordinate updates do not commute, so $P^{(1)}P^{(2)}\ne P^{(2)}P^{(1)}$ in general, for example.
> [!proof] Proof of Invariance
> Since $pP^{(c)}=p$ for all $c=1,\dots,d$, $pP^{\text{sys}}=pP^{(1)}P^{(2)}\cdots P^{(d)}=pP^{(2)}\cdots P^{(d)}=\cdots=pP^{(d)}=p,$so $p$ is stationary under $P^{\text{sys}}$.
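Both facts, stationarity without reversibility, can be observed numerically. A minimal sketch using a hypothetical joint distribution over two binary coordinates (the numbers are made up for illustration):

```python
from itertools import product

# Hypothetical joint distribution over (X1, X2) in {0,1}^2.
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
states = list(p)

def cond(c, value, other):
    """Full conditional p(X^c = value | X^{-c} = other)."""
    joint = (lambda v: p[(v, other)]) if c == 0 else (lambda v: p[(other, v)])
    return joint(value) / (joint(0) + joint(1))

def P(c):
    """One-coordinate update matrix P^(c)."""
    return {(x, y): cond(c, y[c], y[1 - c]) if x[1 - c] == y[1 - c] else 0.0
            for x, y in product(states, states)}

def matmul(A, B):
    return {(x, z): sum(A[(x, y)] * B[(y, z)] for y in states)
            for x, z in product(states, states)}

P_sys = matmul(P(0), P(1))  # coordinate 1 updated first, then coordinate 2

# p is stationary under P_sys ...
for y in states:
    assert abs(sum(p[x] * P_sys[(x, y)] for x in states) - p[y]) < 1e-12

# ... but detailed balance fails for some pairs of states:
violations = [(x, y) for x, y in product(states, states)
              if abs(p[x] * P_sys[(x, y)] - p[y] * P_sys[(y, x)]) > 1e-9]
assert violations  # systematic scan is not reversible here
```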
*The transition matrix $P^{\text{rand}}$ of random scan Gibbs satisfies detailed balance*: $
p(\mathbf{i})P^{\text{rand}}_{\mathbf{ij}}= p(\mathbf{j})P^{\text{rand}}_{\mathbf{ji}},
$so *random scan Gibbs is both reversible and stationary wrt. $p$.*
> [!proof]
> Since $P^{\text{rand}}$ is the average of the $P^{(c)}$, each of which satisfies detailed balance, $p(\mathbf{i})P^{\text{rand}}_{\mathbf{ij}}=\frac{1}{d}\sum_{c}p(\mathbf{i})P^{(c)}_{\mathbf{ij}}=\frac{1}{d}\sum_{c}p(\mathbf{j})P^{(c)}_{\mathbf{ji}}=p(\mathbf{j})P^{\text{rand}}_{\mathbf{ji}}.$
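The reversibility of random scan can also be verified numerically. A minimal sketch using a hypothetical joint distribution over two binary coordinates (the numbers are made up for illustration):

```python
from itertools import product

# Hypothetical joint distribution over (X1, X2) in {0,1}^2.
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
states = list(p)

def cond(c, value, other):
    """Full conditional p(X^c = value | X^{-c} = other)."""
    joint = (lambda v: p[(v, other)]) if c == 0 else (lambda v: p[(other, v)])
    return joint(value) / (joint(0) + joint(1))

def P(c):
    """One-coordinate update matrix P^(c)."""
    return {(x, y): cond(c, y[c], y[1 - c]) if x[1 - c] == y[1 - c] else 0.0
            for x, y in product(states, states)}

# Random scan: uniform mixture of the one-coordinate updates
P_rand = {k: 0.5 * (P(0)[k] + P(1)[k]) for k in product(states, states)}

# Detailed balance holds for every pair of states
db_gap = max(abs(p[x] * P_rand[(x, y)] - p[y] * P_rand[(y, x)])
             for x, y in product(states, states))
assert db_gap < 1e-12
```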