Markov Chain [[Monte Carlo Integration|Monte Carlo]] (MCMC) estimates $\theta = \mathbb{E}[\phi(X)]$ by simulating a [[Discrete Time Markov Chains|DTMC]] $X_{1},X_{2},\dots \sim \text{Markov}(P)$ as samples of $X \sim \lambda$, where $\lambda$ is some target distribution supported on the state space $I$.
- $P$ is the transition matrix, chosen so that the samples converge to $\lambda$.
- This course only covers MCMC with a discrete (finite or countable) $I$, which can be handled with a transition matrix. Continuous state spaces require more general Markov chains.
- As usual, $\lambda=\tilde{\lambda} / c_{\lambda}$, where the unnormalized factor $\tilde{\lambda}$ is known, but the normalizing constant $c_{\lambda}$ is not.
*The MCMC samples $X_{1},X_{2},\dots$ are correlated, and only eventually converge to the target distribution*, in contrast to the previous algorithms, which use i.i.d. samples from either the target or a proposal distribution.
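To make the contrast with i.i.d. sampling concrete, here is a minimal sketch (the two-state chain and its numbers are my own illustrative choice, not from the notes): each sample depends on the previous one, yet the empirical distribution of the correlated samples still approaches the target.

```python
import numpy as np

# Hypothetical two-state example: lam = (4/7, 3/7) is stationary for
# this transition matrix, i.e. lam @ P == lam.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
lam = np.array([4/7, 3/7])

rng = np.random.default_rng(0)
n = 100_000
X = np.empty(n, dtype=int)
X[0] = 0  # arbitrary initial state
for k in range(1, n):
    # Correlated sampling: the next state's distribution is the row
    # of P indexed by the current state.
    X[k] = rng.choice(2, p=P[X[k - 1]])

# Empirical distribution of the correlated samples approaches lam.
emp = np.bincount(X, minlength=2) / n
print(emp, lam)
```

Despite never drawing a single sample directly from $\lambda$, the long-run fraction of time spent in each state matches $\lambda$.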
### Reversibility in MCMC
> [!bigidea] If a transition matrix $P$ is "reversible" wrt. a distribution $\lambda$, then $\lambda$ is a stationary distribution under $P$, and samples from $\mathrm{Markov}(P)$ can be seen as samples from $\lambda$.
By a general result about reversibility,
![[Time Reversal of Markov Chains#^9fc980]]
and detailed balance implies stationarity: ![[Time Reversal of Markov Chains#^e47297]]
Therefore, if we can simulate a chain $\mathrm{Markov}(P)$ that is reversible / in detailed balance wrt. the target $\lambda$, the samples $X_{1},\dots \sim \mathrm{Markov}(P)$ will converge to $\lambda$ in distribution.
- Reversible chains are commonly used in MCMC, though not in all algorithms (e.g. systematic scan Gibbs sampling).
### The MCMC Estimator
> [!idea] Given a suitable Markov chain, MCMC gives a consistent, though biased, estimate.
Given samples $(X_{k})$ from a Markov chain $\text{Markov}(P)$, the MCMC estimator for $\theta$ is the sample average $\hat{\theta}_{n}=\frac{1}{n}\sum^{n}_{k=1}\phi(X_{k}).$ Recall from [[A8 Probability|A8]] that if $\text{Markov}(P)$ is irreducible and aperiodic with stationary distribution $p=(p_{1},\dots)$, then for any initial distribution, $\forall i,j$, as $n \to \infty$, *the chain converges to the stationary distribution*: $\begin{align*}
\mathbb{P}(X_{n}=i)&\to p_{i},\\
p_{ij}^{(n)}&\to p_{j}.
\end{align*}$
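Both convergence statements can be seen directly from powers of the transition matrix (the chain below is my own illustrative example): every row of $P^{n}$, i.e. every vector $(p_{ij}^{(n)})_{j}$, approaches the stationary distribution, and hence so does the marginal of $X_{n}$ under any initial distribution.

```python
import numpy as np

# Hypothetical irreducible, aperiodic chain with stationary p = (4/7, 3/7).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
p = np.array([4/7, 3/7])

# n-step transition matrix: row i holds the probabilities p_ij^(n).
Pn = np.linalg.matrix_power(P, 50)
print(Pn)  # every row is approximately p

# Marginal of X_n for an arbitrary initial distribution.
init = np.array([1.0, 0.0])
print(init @ Pn)  # approximately p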
*Consistency of MCMC*: if the transition matrix $P$ is irreducible and aperiodic, and has the target distribution $p$ as a stationary distribution, then $X_{k} \xrightarrow{d}X \sim p$, and by the ergodic theorem (the strong law of large numbers for Markov chains; the samples are correlated, so the plain SLLN does not apply directly), $\hat{\theta}_{n}$ is consistent: $\hat{\theta}_{n} \to \theta\,\,\mathrm{a.s.}$ However, MCMC is biased for any finite $n$, since the early samples are not distributed according to $p$.
In MCMC, the Markov chain $(X_{1},\dots)$ needs a **burn-in phase** to converge, during which the samples are not representative of the target $p$: $\underbrace{X_{1},\dots,X_{b}}_{\text{Burn-in Phase}},\,\,\underbrace{X_{b+1},\dots,X_{n}}_{\substack{\text{Representative} \\ \text{Samples}}}$ The burn-in phase can be long (e.g. $b > 1000$), so it is common to discard those burn-in samples and estimate from the more representative remainder: $\hat{\theta}^{*}=\frac{1}{n-b}\sum^{n}_{k=b+1}\phi(X_{k})$
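The burn-in estimator $\hat{\theta}^{*}$ can be sketched as follows (the example chain, the choice of $\phi$, and the burn-in length $b$ are all my own illustrative assumptions):

```python
import numpy as np

# Hypothetical chain with stationary distribution p = (4/7, 3/7);
# with phi(x) = x, the target is theta = E[phi(X)] = 3/7.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
phi = lambda x: x

rng = np.random.default_rng(1)
n, b = 50_000, 1_000  # total sample size and burn-in length
X = np.empty(n, dtype=int)
X[0] = 0
for k in range(1, n):
    X[k] = rng.choice(2, p=P[X[k - 1]])

theta_hat = phi(X).mean()        # plain MCMC estimator over all samples
theta_star = phi(X[b:]).mean()   # burn-in discarded: average over X_{b+1..n}
print(theta_hat, theta_star)     # both near 3/7
```

For this fast-mixing toy chain the two estimates barely differ; discarding burn-in matters most when the chain starts far from the target and mixes slowly.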