> [!bigidea]
> Metropolis-Hastings indirectly simulates a suitable Markov chain that converges to the target distribution.

Consider the goal of running a [[Markov Chain Monte Carlo]] $X_{1},\dots \sim \mathrm{Markov}(P)$ with transition matrix $P$ that has the target distribution $p$ as its stationary distribution. **Metropolis-Hastings (MH)** indirectly simulates the transition $X_{k}\to X_{k+1}$ by rejection sampling: a proposal distribution $q$ suggests the next state, which is then accepted or rejected: $[X_{k}=x] \xrightarrow[\text{by }q]{\text{proposal}}[X_{k+1}=x'\,?]\to\begin{cases} \text{accept}\to [X_{k+1}=x'] \\ \text{reject }\,\to [X_{k+1}=X_{k}=x] \end{cases}$where *the current state is carried over if the proposal is rejected*.

The **proposal distribution** in M-H is like a Markov chain, providing the probability of going from state $i$ to state $j$: for a discrete state space $\mathcal{X}$, let $q(x' ~|~ x):=\mathbb{P}[\text{propose }X_{k+1}=x'\,|\,X_{k}=x],$ where *a good proposal should be irreducible* (so every state gets a chance to be sampled); the simplest would be the uniform $q=\mathrm{Unif}(\mathcal{X})$.

## Deriving the Algorithm

### Finding the Acceptance Probability in Discrete Chains

> [!bigidea]
> Given a proposal $q$, we need to choose the acceptance probability that simulates the desired Markov chain, namely the one that has $p$ as stationary distribution.

As in rejection sampling, there is an **acceptance probability**: $A(x'~|~x)=\mathbb{P}[\text{proposal accepted}\,|\,X_{k+1}=x' \text{ proposed},\,X_{k}=x],$ constant over $k \ge 0$, and the key is to *choose the acceptance probability that simulates the suitable Markov chain*, namely the chain that has the target $p$ as a stationary distribution.
- Note that $p$ itself is often not known, and the algorithm can only use $\tilde{p}\propto p$.

Let the discrete state space be indexed with $i,j$:

> [!exposition] Solving Detailed Balance in Metropolis-Hastings
> This algorithm simulates the Markov chain with transition matrix $P$ given by $\begin{align*} P_{ij}&= \mathbb{P}[X_{k+1}=j\,|\,X_{k}=i]\\[0.2em] & \begin{split} {}=\mathbb{P}[X_{k+1}&=j \text{ proposed \& accepted}\,|\,X_{k}=i]\\&+\mathbb{1}_{i=j}\cdot\mathbb{P}[\text{rejection}\,|\, X_{k}=i] \end{split}\\[0.2em]&= q(j~|~i)A(j ~|~ i)+\mathbb{1}_{i=j}\Big(1- \sum_{l \in \mathcal{X}}q(l~|~i)A(l ~|~ i) \Big) \end{align*}$
> To find the suitable $A$, recall that the detailed balance equations are sufficient for stationarity, so $\text{Markov}(P)$ and $p$ should satisfy (assuming $i \ne j$, since otherwise it's trivial): $\begin{align*} p_{i}P_{ij}&= p_{j}P_{ji}\\[0.2em] \tilde{p}_{i}\cdot q(j ~|~ i)\cdot A(j ~|~ i)&= \tilde{p}_{j} \cdot q(i~|~j)\cdot A(i ~|~ j)\\[0.2em] \frac{A(j ~|~ i)}{A(i ~|~ j)}&= \frac{\tilde{p}_{j}}{\tilde{p}_{i}}\cdot \frac{q(i~|~j)}{q(j~|~i)}. \end{align*}$It is easy to verify that the following definition gives a valid ($\le1$) probability that satisfies the equation.

> [!definition|*] Metropolis-Hastings Acceptance Probability
> When the current state is $i$ and the proposal is $j$, Metropolis-Hastings accepts the proposal with probability $A(j ~|~ i)=\min\left(1,\,\frac{\tilde{p}_{j}}{\tilde{p}_{i}}\cdot \frac{q(i~|~j)}{q(j~|~i)}\right).$

- Note that if $A(j ~|~i)=\frac{\tilde{p}_{j}}{\tilde{p}_{i}}\cdot \frac{q(i~|~j)}{q(j~|~i)}<1$, then $A(i ~|~ j)=1$, and vice versa (with the fractions inverted): one of the two directions is always accepted with certainty.
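To make this concrete, here is a minimal Python sketch of discrete M-H on a 5-state space with a uniform proposal (so the $q$'s cancel in $A$); the target `p_tilde` and the other names are illustrative assumptions, and only $\tilde{p}\propto p$ is ever used.

```python
# A minimal sketch of discrete Metropolis-Hastings with a uniform proposal
# q = Unif(X); p_tilde is the target known only up to a normalizing constant.
import numpy as np

rng = np.random.default_rng(0)

p_tilde = np.array([1.0, 2.0, 4.0, 2.0, 1.0])  # unnormalized target
n_states = len(p_tilde)

def mh_discrete(n_steps, x0=0):
    x = x0
    samples = np.empty(n_steps, dtype=int)
    for k in range(n_steps):
        x_prop = rng.integers(n_states)     # uniform proposal: q(j|i) = 1/|X|
        # A(j|i) = min(1, p~_j q(i|j) / (p~_i q(j|i))); the uniform q cancels
        accept_prob = min(1.0, p_tilde[x_prop] / p_tilde[x])
        if rng.uniform() <= accept_prob:
            x = x_prop                      # accept the proposal
        samples[k] = x                      # on rejection, carry over x
    return samples

samples = mh_discrete(100_000)
# empirical frequencies should approach p_tilde / p_tilde.sum()
print(np.bincount(samples, minlength=n_states) / len(samples))
```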
### M-H in Continuous State Space

In a continuous state space, we can roughly copy what we did for discrete spaces by replacing $\mathbb{P}$ with the corresponding densities: retaining the notation $x' ~|~x$ for the (proposed) transition $x \to x'$, define the proposal $q(x' ~|~ x)$ and acceptance probability $A(x' ~|~ x):= \min\left( 1, \frac{p(x')q(x ~|~ x')}{p(x) q(x' ~|~ x)} \right).$ Also define the rejection probability $R(x):= \int q(x' ~|~ x) (1-A(x' ~|~ x)) ~ dx'$.

The transition $x \to x'$ has conditional density $\begin{align*} K(x' ~|~ x)&:= \mathbb{P}[x' \text{ accepted} ~|~ x]\cdot q(x' ~|~ x) + \mathbb{P}[\text{rejection} ~|~ x]\cdot \delta_{x}(x')\\ &= A(x' ~|~ x) \cdot q(x' ~|~ x) + R(x)\cdot \delta_{x}(x'), \end{align*}$i.e. a mixture between the accepted proposal $A\cdot q$ and the Dirac delta at $x$, $\delta_{x}$. Note that the atom has mass $R(x)$, not $1-A(x' ~|~ x)$, since other rejected proposals $x'' \ne x'$ also contribute.

In differential notation, we can define the **kernel** $K(dx' ~|~ x)=A(x' ~|~ x)q(x' ~|~ x)dx'+R(x)\delta_{x}(dx'),$ where by "kernel" we mean that $K$ is the measure satisfying $\mathbb{P}[X_{k+1} \in S ~|~ X_{k}=x]=\int _{S}K(dx'~|~ x)$ for any $S \subset \mathcal{X}$.

Now we need to verify that this tempting definition satisfies the continuous-state detailed balance: in differential notation, $K(dx'~|~x)p(dx) \overset{?}{=}K(dx ~|~ x')p(dx')$in the sense that the two are equal under $\int_{x \in S}\int_{x' \in T}$ for any $S,T \subset \mathcal{X}$. On both sides the differentials are $dx'~dx$, so we simply check whether $K(x' ~|~ x)\cdot p(x)\overset{?}{=}K(x ~|~ x')\cdot p(x').$

> [!proof] Proof of continuous M-H detailed balance
> We verify this by separating $K$ into its two terms:
>
> For the continuous component, we WLOG assume $A(x' ~|~ x) < 1$ (so $A(x ~|~ x')=1$), and the left hand side has $A(x' ~|~ x)\cdot q(x' ~|~ x)\cdot p(x)=\frac{p(x')q(x ~|~ x')}{\cancel{p(x)q(x' ~|~ x)}} \cancel{q(x' ~|~ x) p(x)}=p(x')q(x ~|~ x')\cdot 1,$which is exactly the continuous component of the right hand side, $p(x')\cdot q(x ~|~ x') \cdot A(x ~|~ x')$.
>
> For the discrete component (the one with the delta function), we verify that the two sides are indeed equal under any $\int_{x \in S}\int_{x' \in T}$: note that $p\cdot R$ is constant for one of the integrals, so $ \int_{x \in S}\int_{x' \in T} \mathrm{LHS}= \int _{x\in S} p(x)R(x) \int _{x' \in T}\delta_{x}(x')= \int _{x \in S} p(x)R(x)\mathbf{1}_{x \in T}=\int _{S \cap T} p(x) R(x),$and the same for $\mathrm{RHS}$ (except for the dummy variable being $x'$ instead of $x$).

### The Metropolis-Hastings Algorithm

The following algorithm works for both discrete and continuous state spaces, where $A(j ~|~ i):= A_{ij}$ for discrete ones.

> [!algorithm] Metropolis-Hastings
>
> With $A$ given above, the **Metropolis-Hastings Algorithm** simulates $X \sim p$ with $X_{0},\dots,X_{n} \sim \text{Markov}$ by:
>
> Initialize with some fixed value $X_{0}=x_{0}$ or distribution $X_{0} \sim \lambda$.
>
> For $k=0,\dots, n-1$:
> - Given $X_{k}=x$, sample a proposal $X_{k+1}\overset{?}{=}x'$ from the proposal distribution $q(x'~|~x)$.
> - Sample $U \sim \mathrm{Unif}[0,1]$, and accept the proposal iff $U \le A(x'~|~x)$.
> - If accepted, set $X_{k+1}:=x'$. Otherwise, retain the current state: $X_{k+1}:=X_{k}=x$.
>
> Return the samples $X_{1},\dots,X_{n}$.
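As a concrete instance of the algorithm above, here is a minimal random-walk M-H sketch in Python, assuming a Gaussian proposal and an unnormalized standard-normal target (both illustrative choices, as is the `step` parameter). Comparing on the log scale avoids numerical underflow, and the symmetric proposal makes the $q$ ratio cancel (more on proposals below).

```python
# A minimal sketch of continuous Metropolis-Hastings with a symmetric
# Gaussian random-walk proposal; the target is known only up to a constant.
import numpy as np

rng = np.random.default_rng(0)

def log_p_tilde(x):
    return -0.5 * x**2                      # log of unnormalized N(0,1) density

def mh(n_steps, x0=0.0, step=1.0):
    x = x0
    samples = np.empty(n_steps)
    for k in range(n_steps):
        x_prop = x + step * rng.normal()    # symmetric proposal q(x'|x)
        # log A(x'|x) = min(0, log p~(x') - log p~(x)); q cancels by symmetry
        if np.log(rng.uniform()) <= log_p_tilde(x_prop) - log_p_tilde(x):
            x = x_prop                      # accept
        samples[k] = x                      # otherwise retain current state
    return samples

samples = mh(50_000)
print(samples[1000:].mean(), samples[1000:].std())  # ~0 and ~1 after burn-in
```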
For application in MCMC, let $X_{1},\dots,X_{n}$ be the samples generated by Metropolis-Hastings to simulate $X \sim p$. By construction of $P$ from $(A_{ij})$ and $q(j~|~i)$, $P$ has $p$ as a stationary distribution.
- Then by standard DTMC/MCMC results, the Monte Carlo estimator using $X_{1},\dots,X_{n}$ is strongly consistent: $\hat{\theta}_{n}=\frac{1}{n}\sum^{n}_{k=1}\phi(X_{k})\to \theta=\mathbb{E}[\phi(X)] \,\,\mathrm{a.s.},$ or with a burn-in phase of $b$ samples, $\hat{\theta}_{n}^{*}=\frac{1}{n-b}\sum^{n}_{k=b+1}\phi(X_{k}).$

## Practical Modifications

The choice of proposal greatly affects the efficiency of sampling (e.g. how fast the [[Effective Sample Size]] grows over time). In particular, we wish to balance *exploring the state space* and *not getting rejected too often*.
- A proposal that barely nudges $X_{k}$ will be frequently accepted, but the samples will be highly correlated and fail to explore the state space.
- On the other extreme, proposing wild updates will be rejected left and right, and the chain will just stay in one place.

Research suggests an average acceptance rate (over the proposals) of $0.234$ is optimal for random-walk proposals in high dimensions, while the optimum is higher ($0.44$) for low-dimensional chains.

### Proposals

Basic examples include:
- Symmetric random walk $X_{k+1}:=X_{k}+\epsilon_{k},$ where $\epsilon_{k} \overset{\mathrm{iid.}}{\sim} g(\eta)$ for some symmetric $g$ and hyperparameter $\eta$. The acceptance probability is $\min\{ 1, p(x') / p(x) \}$ due to the symmetry of $g$; we always accept a more likely state.
- Independent proposals $X_{k+1} \sim q(x')$ that do not depend on $X_{k}=x$; for example, a uniform distribution over the state space when it's finite.

Hyperparameters of the proposal, e.g. the scale $\sigma$ of $\epsilon \sim N(0, \sigma^{2})$, can be tuned to get the ideal acceptance rate. Weirder proposals can be more adaptive, for example letting the proposal parameters $\eta=\eta(X_{k})$ depend on the current state.

The Metropolis-Adjusted Langevin Algorithm (**MALA**) uses proposals of the form $X_{k+1}= X_{k}+\frac{\sigma^{2}}{2}\nabla \log p(X_{k})+\sigma Z,~~Z \sim N(0, I),$ which adds a smarter nudge along $\nabla \log p$, increasing the acceptance probability of the proposal. It helps in high-dimensional distributions, where a simple, symmetric random walk is unlikely to pick the right direction to move.
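A minimal MALA sketch, assuming a differentiable log target (the 2-d Gaussian target, `sigma`, and the function names are illustrative assumptions). Note that the Langevin proposal is *not* symmetric, so the ratio $q(x ~|~ x')/q(x' ~|~ x)$ must be kept in the acceptance probability.

```python
# A sketch of MALA: a Langevin-drift proposal corrected by the full M-H
# acceptance step, with the asymmetric proposal densities kept explicitly.
import numpy as np

rng = np.random.default_rng(0)

def log_p(x):                 # unnormalized log target (standard normal here)
    return -0.5 * np.sum(x**2)

def grad_log_p(x):
    return -x

def log_q(x_to, x_from, sigma):
    # log density (up to a symmetric constant) of the Langevin proposal
    # N(x_from + sigma^2/2 * grad log p(x_from), sigma^2 I)
    mean = x_from + 0.5 * sigma**2 * grad_log_p(x_from)
    return -0.5 * np.sum((x_to - mean) ** 2) / sigma**2

def mala(n_steps, x0, sigma=0.5):
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        mean = x + 0.5 * sigma**2 * grad_log_p(x)
        x_prop = mean + sigma * rng.normal(size=x.size)
        log_A = (log_p(x_prop) - log_p(x)
                 + log_q(x, x_prop, sigma) - log_q(x_prop, x, sigma))
        if np.log(rng.uniform()) <= log_A:
            x = x_prop
        samples[k] = x
    return samples

samples = mala(20_000, x0=np.zeros(2))
print(samples[2000:].mean(axis=0), samples[2000:].std(axis=0))  # ~0, ~1
```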
### Proposals with Transformations

Consider a proposal algorithm:

> [!algorithm] M-H Proposal with Transformation
> Draw the **proposal variable** $u\sim q_{U}(u)$ over some set $\mathcal{U}$.
> Compute $x':= \phi_{1}(x,u)$ for some **proposal function** $\phi_{1}:\mathcal{X}\times \mathcal{U} \to \mathcal{X}$.
> Return $x'$ as the proposal.

- For example, we can have $u \sim N(0, \sigma^{2}_{U})$, and add it to the current state with $\phi_{1}(x,u)=x+u=: x'$.
- It's possible to have $q_{U}$ and $\mathcal{U}$ depend on $x$, but it doesn't really affect the derivations below, so we can replace $q_{U}(u)$ with $q_{U}(u~|~x)$ if necessary.
- This induces the implicit proposal $q(x'~|~x)$ that satisfies $q(dx' ~|~ x)=q_{U}(du)$, in the sense that $\int _{x'\in A} q(x'~|~ x)~ d x' =\int _{u: ~\phi_{1}(x,u)\in A} q_{U}(u)~ du,$for any measurable $A\subset \mathcal{X}$.

Now it remains to find the Metropolis-Hastings acceptance probability $\alpha(x' ~|~ x)$ to guarantee detailed balance: $p(x)\cdot q(x' ~|~ x)\cdot \alpha(x' ~|~ x)~dx~dx' \overset{?}{=}p(x')\cdot q(x ~|~ x')\cdot \alpha(x ~|~ x')~dx~dx'.$

> [!exposition] Deriving the Acceptance Probability with Transformations
> Unfortunately the above is useless since we do not know the implied proposal $q(x'~|~x)$, so how do we use the fact that $q(dx'~|~x)=q_{U}(du)$? We can substitute it in $\mathrm{LHS}$, but how do we define some $u'$ to replace $q(dx~|~x')$ on $\mathrm{RHS}$?
>
> We can define $u':=\phi_{2}(x,u)$ to be the solution of $x=\phi_{1}(x',u')$ where $x':= \phi_{1}(x,u)$, giving an **involution** $\phi:(\mathcal{X}\times \mathcal{U}) \to(\mathcal{X}\times \mathcal{U})$ given by
> $\phi(x,u):=(x',u'),$
> with the property that $\phi(x',u')=(x,u)$, or equivalently, that $\phi \circ \phi$ is the identity map.
> This involution property allows us to write $q(dx~|~x')=q_{U}(du')$ too, so substituting it in $\mathrm{RHS}$, the detailed balance becomes $p(x)\cdot q_{U}(du)\cdot \alpha(x',u' ~|~ x,u)~dx\overset{?}{=}p(x')\cdot q_{U}(du')\cdot \alpha(x,u ~|~x', u')~dx',$where we now allow $\alpha$ to depend on $u,u'$ too.
>
> But now the differentials are different ($\mathrm{LHS}$ is $du~dx$, $\mathrm{RHS}$ is $du'~dx'$), so we use the transformation
> $\underbrace{\Big| \det\frac{ \partial (x',u') }{ \partial (x,u) } \Big|}_{=: J(x,u)}dx~du=dx'~du',$
> where $\frac{ \partial (x',u') }{ \partial (x,u) }$ is the Jacobian matrix of $\phi(x,u)$. Using this on $\mathrm{RHS}$, the detailed balance now looks like
> $p(x)\cdot q_{U}(u)\cdot \alpha(x',u' ~|~ x,u)~dx~du\overset{?}{=}p(x')\cdot q_{U}(u')\cdot \alpha(x,u ~|~x', u')\cdot J(x,u) ~dx~du.$
> "Solving" for $\alpha$, or rather finding the constraint the definition below must satisfy, we must have
> $\frac{\alpha(x',u' ~|~ x, u)}{\alpha(x,u ~|~ x', u')}=\frac{p(x')q_{U}(u')}{p(x)q_{U}(u)}\cdot J(x,u).$
> Therefore, we can imitate the vanilla Metropolis-Hastings to define:

> [!definition|*] Acceptance probability in M-H with Transformations
> $\alpha(x',u' ~|~ x,u):= \min \left( 1, \frac{p(x')q_{U}(u')}{p(x)q_{U}(u)}\cdot J(x,u) \right).$

Given this form, verifying detailed balance is much easier: $\begin{align*} p(dx)q(dx' ~|~ x) \alpha(x' ~|~ x)&= p(dx) q_{U}(du) \alpha(x', u' ~|~ x, u)\\ &= \min( p(x) q_{U}(u), p(x')q_{U}(u')\cdot J(x,u)) \cdot dx~du\\ &= \min\left( \frac{p(x)q_{U}(u)}{p(x') q_{U}(u')}\cdot J(x,u)^{-1}, 1 \right)\cdot p(x') q_{U}(u')\cdot \underbrace{J(x,u)~dx~du}_{=dx'~du'}. \end{align*}$ But the first factor is exactly $\alpha(x, u ~|~ x', u')$, because *the Jacobian of the inverse map is the inverse of the Jacobian*: since $\phi^{-1}=\phi$, $J(x', u')= \Big|\det\frac{ \partial (x,u) }{ \partial (x',u') }\Big|=\frac{1}{J(x,u)},$ so the last line equals $\alpha(x,u ~|~ x',u')\cdot p(x')q_{U}(u')~dx'~du'=p(dx')q(dx ~|~ x')\alpha(x ~|~ x')$, as required.

> [!question]- Questions?
> Now how does this relate to vanilla M-H, you may ask: it is just $x':=u$ with proposal $q_{U}(u~|~x) = q(x'~|~x)$. The involution is $\phi(x,x')=(x',x)$ with Jacobian $\begin{pmatrix}0 & 1 \\1 & 0\end{pmatrix}$ (so the $| J |=1$ is not explicitly shown in the acceptance probability).
>
> You might also wonder why the numerator of $\alpha$ is prime-times-prime $p(x')q_{U}(u')$, unlike in vanilla M-H, where the numerator is prime-times-no-prime $p(x')\cdot q(x ~|~ x')$. This is because $u'=\phi_{2}(x,u)$ is just $x$ in disguise.

> [!examples] Example: random walk on a log scale
> Consider using $U$ as a scaling factor: say $U \sim \mathrm{Unif}[2^{-1}, 2]$, and the proposal function is $x':=\phi_{1}(x, u)=ux,$ and we set $\phi_{2}(x,u)=u^{-1}$ to make $\phi$ an involution.
>
> The Jacobian of the map is $J(x,u)= \left|\det \begin{pmatrix} u & x \\ 0 & -u^{-2} \end{pmatrix}\right|=\frac{1}{u}.$
> Therefore, since the uniform densities cancel ($u'=1/u$ stays in $[2^{-1},2]$), the acceptance probability is $\alpha(x', u' ~|~ x, u)=\min\left( 1, \frac{p(x')}{p(x)}\cdot \frac{1}{u} \right).$
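A sketch of this log-scale random walk, assuming an unnormalized Gamma-like target on $x>0$ (an illustrative choice); the acceptance probability is exactly the $\min\left( 1, \frac{p(x')}{p(x)}\cdot \frac{1}{u} \right)$ derived above.

```python
# A sketch of the log-scale example: propose x' = u*x with u ~ Unif[1/2, 2],
# and accept with min(1, p(x')/p(x) * 1/u) (the Jacobian factor J = 1/u).
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):                      # unnormalized target on x > 0:
    return x**2 * np.exp(-x)         # Gamma(3, 1) up to a constant

def scaling_mh(n_steps, x0=1.0):
    x = x0
    samples = np.empty(n_steps)
    for k in range(n_steps):
        u = rng.uniform(0.5, 2.0)    # proposal variable u ~ q_U
        x_prop = u * x               # x' = phi_1(x, u); involution sends u -> 1/u
        # the uniform densities q_U(u'), q_U(u) cancel since 1/u stays in [1/2, 2]
        if rng.uniform() <= min(1.0, p_tilde(x_prop) / p_tilde(x) / u):
            x = x_prop
        samples[k] = x
    return samples

samples = scaling_mh(100_000)
print(samples.mean())                # Gamma(3, 1) has mean 3
```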
### Working with M-H Kernels

The M-H kernel $K(dx'~|~x)$ is just a measure, so we can do all sorts of measure-y things to it:
- Linear mixtures $\sum_{j=1}^{p}w_{j}K_{j}(dx'~|~x)$ with weights $w=(w_{1},\dots,w_{p})$.
- Products $\prod_{j=1}^{p} K_{j}(dx_{j}~|~x_{j-1})$, i.e. using the output of $K_{j}$ as the input of $K_{j+1}$.

The linear mixture corresponds to the following update: $\begin{align*} &\text{Sample kernel index}&j \sim w ~~(\text{i.e. } \mathbb{P}[j]=w_{j})\\ &\text{Apply kernel}&x' \sim q_{j}(x' ~|~ x)\\ &\text{Accept iff } & U[0,1] \le \alpha_{j}(x' ~|~ x) \end{align*}$ where $q_{j},\alpha_{j}$ are the proposal and acceptance probability that define the kernel $K_{j}$.
- Note that this is not equivalent to sampling from the mixture proposal $\sum_{j}w_{j}q_{j}(x'~|~x)$, since here $\alpha_{j}$ depends on the kernel, while sampling from the mixture proposal would use the same $\alpha$ regardless of which proposal is used. In other words, *the kernel of the mixture is not the mixture of the kernels*.

The product kernel corresponds to applying the kernels one by one: $\begin{align*} \text{For }&\text{kernel index }j = 1,\dots,p :\\ &\text{Apply kernel: }\\ &~~~~~~~~\text{propose }z \sim q_{j}(z ~|~ x)\\ &~~~~~~~~\text{accept }x \leftarrow z~\text{ iff }~ U[0,1] \le \alpha_{j}(z ~|~ x) \end{align*}$ note that here, the $x\leftarrow z$ sampled in one iteration will be used as the "from" state when applying the next kernel.

One special case (for multidimensional $x$) is when the $j$th kernel only updates the $j$th coordinate: $K_{j}(x'~|~ x)=q_{j}(x_{j}'~|~x)\cdot \delta_{x_{-j}}(x'_{-j}),$i.e. sampling from some 1-dimensional proposal $x'_{j} \sim q_{j}(\cdot ~|~ x)$ while keeping the other coordinates $x_{-j}=x_{-j}'$ unchanged. Composing/mixing those kernels produces **one-at-a-time M-H**. When we use the conditional distribution $q_{j}(x'_{j}~|~x)=p(x'_{j}~|~x_{-j})$ as the proposal, we recover the [[Gibbs Sampling|Gibbs sampler]], whose proposals are always accepted. Their linear mixture is called random scan Gibbs, and the composition/product is called systematic scan Gibbs.
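A sketch of the product (systematic scan) case: one-at-a-time M-H on a correlated 2-d Gaussian, where each sweep composes the coordinate-wise kernels $K_{1},K_{2}$; the target and step sizes are illustrative assumptions. The random scan version would instead pick $j$ at random at each step.

```python
# A sketch of a product kernel: systematic-scan one-at-a-time M-H, updating
# each coordinate in turn with a 1-d random-walk proposal and M-H ratio.
import numpy as np

rng = np.random.default_rng(0)

cov_inv = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))

def log_p(x):                            # unnormalized correlated Gaussian
    return -0.5 * x @ cov_inv @ x

def one_at_a_time_sweep(x, steps=(0.8, 0.8)):
    # apply K_1, ..., K_p in order (product of kernels); each K_j proposes
    # a new x_j and leaves the other coordinates fixed
    for j in range(x.size):
        x_prop = x.copy()
        x_prop[j] += steps[j] * rng.normal()
        if np.log(rng.uniform()) <= log_p(x_prop) - log_p(x):
            x = x_prop                   # accepted state feeds the next kernel
    return x

x = np.zeros(2)
samples = np.empty((50_000, 2))
for k in range(len(samples)):
    x = one_at_a_time_sweep(x)
    samples[k] = x
print(np.cov(samples[5000:].T))          # should be close to the target cov
```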
### Random Proposals

Consider using a proposal parametrised by $\theta$, which is also drawn randomly at each step: $\theta \sim q_{\theta}(\theta),~~ X ~|~ \theta \sim q_{X}(x ~|~ \theta).$ Here the dependence on the previous state is suppressed. The marginal proposal is then $q(x):= \int q_{X}(x ~|~ \theta)q_{\theta}(\theta)~d\theta$, which we assume to be intractable. Therefore, we cannot evaluate the usual M-H acceptance probability.

Instead, consider replacing $q(x)$ with $q_{X}(x ~|~ \theta)$, giving the transition probability between $(x, \theta) \leftrightarrow (x', \theta')$ in the augmented space: $\alpha(x', \theta' ~|~ x, \theta):= \min \left\{ 1, \frac{p(x')q_{X}(x ~|~ \theta)}{ p(x) q_{X}(x' ~|~ \theta')} \right\}.$ Does this generate the correct chain, i.e. does this chain's stationary distribution have $p(x)$ as the marginal of $X$?

Converting the conditional $q_{X}(x~|~ \theta)$ into the joint $q(x, \theta):= q_{\theta}(\theta)q_{X}(x ~|~ \theta)$, the above becomes $\min \left\{ 1, \frac{p(x')q_{\theta}(\theta') \cdot q(x, \theta)}{p(x) q_{\theta}(\theta) \cdot q(x', \theta')} \right\},$ which is the acceptance probability for sampling from $\tilde{p}(x, \theta):= p(x) q_{\theta}(\theta)$ with the proposal $q(x, \theta)$. Therefore, the chain follows $\tilde{p}$, and the marginal of $X$ is $p(x)$ as desired.

### Random Targets: Pseudomarginals

Consider sampling from the marginal distribution $p(x)=\int p(x,u) ~ du$, where we cannot solve the intractable integral. Instead, we can sample some unbiased approximation $\hat{p}(x)$, i.e. $\mathbb{E}[\hat{p}(x)]=p(x).$ For example, $\hat{p}(x)=\frac{1}{m}\sum_{j=1}^{m} p(x~|~U_{j})$ is an unbiased approximation, where $U_{j} \overset{\mathrm{iid.}}{\sim} p(u)=\int p(x,u) ~ dx$.

It is tempting to replace $p$ with $\hat{p}$ in the M-H acceptance probability, i.e. $\hat{\alpha}(x'~|~x)=\min\left\{ 1, \frac{\hat{p}(x')q(x~|~x')}{\hat{p}(x)q(x'~|~x)} \right\}.$Does this produce the right chain?

Consider the variable $z:= \hat{p}(x) / p(x)$, which is random because $\hat{p}$ is, with some (unknown) distribution $p(z~|~x)$. Then the acceptance probability can be written as $\hat{\alpha}(x',z'~|~x,z)=\min\left\{ 1, \frac{p(x')z'\cdot q(x~|~x')}{p(x)z\cdot q(x'~|~x)} \right\}.$ This looks like an M-H algorithm targeting some joint distribution $(x,z) \sim \pi$ -- indeed, if we pretend the proposal first samples $x'$ from $q(x'~|~x)$, then proposes $z'$ from $p(z'~|~x')$, the acceptance probability is $\hat{\alpha}(x',z'~|~x,z)= \min \left\{ 1, \frac{\pi(x',z')\cdot q(x~|~x')\cdot p(z ~|~ x)}{\pi(x,z)\cdot q(x'~|~x)\cdot p(z'~|~ x')} \right\}.$To equate the two, we simply set $\pi(x,z)=p(x)\cdot z \cdot p(z~|~ x)$.

In conclusion, the acceptance probability $\hat{\alpha}$ simulates an M-H targeting $(x,z)\sim \pi$ using proposals $x'~|~x\sim q,~z'~|~x'\sim p(z'~|~x')$. This means that the $x$'s it samples follow the marginal of $X$ under $\pi$, namely $\pi(x):= \int \pi(x,z) ~ dz=\int p(x)\cdot(z\,p(z~|~x)) ~ dz =p(x),$ as $\mathbb{E}[z~|~x]=\mathbb{E}[\hat{p}(x) / p(x)]=1$ due to the unbiasedness of $\hat{p}(x)$. Therefore, this chain indeed samples from the unknown marginal $p(x)$, without us needing to compute it.

> [!algorithm] Pseudomarginal Metropolis-Hastings
> Initialise with some $X_{0}$ and an estimate $\hat{p}(X_{0})$.
>
> For $[k=1,\dots]$:
> - Sample the proposal $\tilde{X}_{k} \sim q(\cdot ~|~ X_{k-1})$.
> - Sample a fresh pseudomarginal estimate $\hat{p}(\tilde{X}_{k})$.
> - Accept the proposal with probability $\hat{\alpha}(\tilde{X}_{k}~|~X_{k-1})=\min\left\{ 1, \frac{\hat{p}(\tilde{X}_{k})q(X_{k-1}~|~\tilde{X}_{k})}{\hat{p}(X_{k-1})q(\tilde{X}_{k}~|~X_{k-1})} \right\},$ where $\hat{p}(X_{k-1})$ is the stored estimate from the iteration in which $X_{k-1}$ was accepted; the estimate is recycled, not resampled, since $z$ is part of the chain's state.
> - If accepted, set $X_{k} = \tilde{X}_{k}$ and store $\hat{p}(X_{k}):=\hat{p}(\tilde{X}_{k})$.
> - If rejected, set $X_{k}=X_{k-1}$ and keep the stored estimate.

## References

[This excellent video](https://www.youtube.com/watch?v=yCv2N7wGDCw).

[Sample code in R simulating a chi-squared distribution](https://rpubs.com/aliquis/MHalgorithm).