Consider a collection of models $\mathcal{M}$, where each model $m \in \mathcal{M}$ has its own set of parameters $\theta \sim \pi(\theta ~|~ m)$, whose dimension, denoted $p_{m}$, in general differs across models.
- For example, in a regression setting $Y \sim X^{T}\theta$ where $m$ performs variable selection, $p_{m}$ is the number of features selected by $m$.

We wish to use MCMC to sample from the joint distribution $\pi(\theta,m)$, so our samples are $X^{(t)}=(\theta^{(t)}, m^{(t)})$. This makes traditional MCMC difficult, since it assumes the sample space $\Omega$ has a fixed dimension for reversibility, whereas the coefficients $\theta^{(t)}$ have dimension $p_{m^{(t)}}$, which changes over time: $\theta^{(t)}=\begin{bmatrix} \theta^{(t)}_{1} \\ \vdots \\ \theta^{(t)}_{p_{m^{(t)}}} \end{bmatrix} \xrightarrow{???} \begin{bmatrix} \theta^{(t+1)}_{1} \\ \vdots \\ \vdots \\ \theta^{(t+1)}_{p_{m^{(t+1)}}} \end{bmatrix}= \theta^{(t+1)}$

To solve the issue, when jumping from $(\theta ,m)$ to $(\theta',m')$, RJMCMC pads $\theta$ and $\theta'$ with auxiliary variables $U,U'$ so that their dimensions match, then makes the jump with a **proposal function** $\phi_{m,m'}$: $(\theta', u'):=\phi_{m,m'}(\theta,u),$ and we require the family $\{ \phi_{m,m'}\}$ to be **involutions**, in the sense that $\phi_{m',m}(\theta',u')=(\theta,u)$ whenever $(\theta',u')=\phi_{m,m'}(\theta,u)$, i.e. mapping twice recovers the original model's parameters. From now on we drop the subscripts and write $\phi$.

So given a proposed jump $m\to m'$, we have $\theta=\begin{bmatrix} \theta_{1} \\ \vdots \\ \theta_{p_{m}} \end{bmatrix} \xrightarrow{\text{sample }U} \begin{array}{c} \begin{bmatrix} \theta_{1} \\ \vdots \\ \theta_{p_{m}} \end{bmatrix} \\ \begin{bmatrix} U_{1} \\ \vdots \\ U_{\tilde{p}_{m}} \end{bmatrix} \end{array} \xleftrightarrow[\phi_{m',m}]{~~~\phi_{m,m'}~~~} \begin{array}{c} \begin{bmatrix} \theta'_{1} \\ \vdots \\ \theta'_{p_{m'}} \end{bmatrix} \\ \begin{bmatrix} U'_{1} \\ \vdots \\ U'_{\tilde{p}_{m'}} \end{bmatrix} \end{array} \xrightarrow{\text{drop }U'}\begin{bmatrix} \theta'_{1} \\ \vdots \\ \theta'_{p_{m'}} \end{bmatrix}=\theta',$ where $U,U'$ have dimensions $\tilde{p}_{m},\tilde{p}_{m'}$ respectively, chosen so that the padded dimensions are equal on both sides: $p_{m}+\tilde{p}_{m}=p_{m'}+\tilde{p}_{m'}$.

The most common case takes $\tilde{p}_{m'}=0$ when $\theta'$ is the higher-dimensional side, so when jumping to a higher dimension, $\theta=\begin{bmatrix} \theta_{1} \\ \vdots \\ \theta_{p_{m}} \end{bmatrix} \xrightarrow{\text{sample }U} \begin{array}{c} \begin{bmatrix} \theta_{1} \\ \vdots \\ \theta_{p_{m}} \end{bmatrix} \\ \begin{bmatrix} U_{1} \\ \vdots \\ U_{p_{m'}-p_{m}} \end{bmatrix} \end{array} \xleftrightarrow[\phi_{m',m}]{~~~\phi_{m,m'}~~~} \begin{bmatrix} \theta'_{1} \\ \vdots \\ \theta'_{p_{m'}} \end{bmatrix}=\theta',$ and we write $\phi_{m,m'}(\theta,u)=(\theta',\varnothing)$ and $\phi_{m', m}(\theta', \varnothing)=(\theta, u)$ to reflect this fact. Therefore,

> [!definition|*] RJMCMC Proposal
> RJMCMC uses an implicit proposal given by $\begin{align*} m' &\sim q_{M}(m' ~|~ m)\\ U &\hphantom{:}\sim q_{U}(u ~|~ m,m', \theta),\\ (\theta', U') &:= \phi_{m,m'}(\theta, U), \end{align*}$ for some proposal distributions $q_{M},q_{U}$ and an involution $\phi_{m,m'}$ of our choice.
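To make the involution requirement concrete, here is a minimal sketch in Python; the helper names `phi_up`/`phi_down` are our own, and we use the simple choice where the auxiliary $u$ directly becomes the new last coordinate (as in the variable-selection example below):

```python
import numpy as np

def phi_up(theta, u):
    """phi_{m,m'}: (theta, u) -> (theta', u') with u' empty."""
    # theta' = (theta, g(theta, u)) with the simple choice g(theta, u) = u
    return np.append(theta, u), np.empty(0)

def phi_down(theta, _u_empty):
    """phi_{m',m}: (theta', empty) -> (theta, u), the inverse map."""
    return theta[:-1], theta[-1:]

# Mapping twice recovers the original parameters:
# phi_{m',m}(phi_{m,m'}(theta, u)) = (theta, u).
theta, u = np.array([0.3, -1.2]), np.array([0.7])
theta2, u2 = phi_down(*phi_up(theta, u))
assert np.allclose(theta2, theta) and np.allclose(u2, u)
```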
We use the acceptance probability derived for [[Metropolis Hastings#Proposals with Transformations|M-H proposals with transformations]], which (ignoring the jump $m\to m'$, i.e. just for $(\theta,u)\to (\theta',u')$) is $\alpha(\theta',u'~|~\theta,u)=\min \left\{ 1, \frac{\pi(\theta')q_{U}(u')}{\pi(\theta)q_{U}(u)}\cdot \left| \det\frac{ \partial (\theta',u') }{ \partial (\theta,u) } \right| \right\}.$ Incorporating the jump between models, we have

> [!definition|*] RJMCMC Acceptance Probability
> $\alpha(m',\theta',u'~|~m,\theta,u)=\min \left\{ 1, \frac{\pi(m',\theta')q_{M}(m ~|~ m')q_{U}(u')}{\pi(m,\theta)q_{M}(m' ~|~ m) q_{U}(u)}\cdot \left|\det \frac{ \partial (\theta',u') }{ \partial (\theta,u) } \right| \right\}.$

### Example: Variable Selection

Let $m \subset [p]$ be the set of indices of the features included in a regression model, so that $m$ performs feature selection, and let $\theta$ be the coefficients of the selected features.

**Model transition.** We transition between models $m,m'$ by adding or dropping features one at a time, uniformly at random: sample an index $i\sim \mathrm{Unif}[p]$, then add $i$ to $m$ if it is not already included, and drop it otherwise. So $q_{M}\equiv \frac{1}{p}$ for any $m,m'$ that differ by exactly one feature.
- Another possibility is to choose between dropping and adding with probability $\frac{1}{2}$ each, then uniformly sample the feature to drop/add from $m$ or $[p]\setminus m$ respectively. This gives $q_{M}(m' ~|~ m)= \begin{cases} \frac{1}{2| m |} & \text{if dropping a feature,} \\ \frac{1}{2(p-| m |)} & \text{if adding a feature.} \end{cases}$

**Proposal function.** Let $\phi$ be our proposal function mapping $(\theta, u)$ to $(\theta', u')$. To ensure $\phi$ is an involution (in particular invertible), we need $\dim (\theta, u)=\dim (\theta',u')$, so WLOG we take $\dim(u)=1$ when adding a feature (or whatever dimension $\theta'_{j}$ has) and $u' = \varnothing$. When adding a feature $j$, we sample $u\sim q_{U}$, then deterministically compute its coefficient as $\theta'_{j}=g(\theta, u)$ for a map $g$ that is invertible in $u$. Therefore $\phi(\theta, u)=(\theta \cup g(\theta,u), \varnothing)$ when adding a feature, and $\phi(\theta,\varnothing)=(\theta_{-j}, u')$ with $u'$ solving $g(\theta_{-j},u')=\theta_{j}$ when deleting the $j$th feature (in the simple case $g(\theta,u)=u$, this is just $u'=\theta_{j}$). This yields the Jacobians (WLOG letting $\theta'_{j}$ be the last entry of $\theta'$ when adding, and $\theta_{j}$ the last entry of $\theta$ when deleting) $\begin{align*} \text{Adding variable: }~J(\theta, u)&= \left| \det\begin{pmatrix} \dfrac{ \partial \theta' }{ \partial \theta }& \dfrac{ \partial \theta' }{ \partial u } \end{pmatrix}\right |=\left|\det \begin{pmatrix} 1 \\ & \ddots \\ & & 1 \\ \leftarrow & \frac{ \partial \theta'_{j} }{ \partial \theta } & \rightarrow & \frac{ \partial \theta'_{j} }{ \partial u } \end{pmatrix}\right |=\left| \frac{ \partial g(\theta, u) }{ \partial u } \right |,\\[1em] \text{Deleting variable: }~J(\theta, \varnothing)&= \left| \det\begin{pmatrix} \dfrac{ \partial \theta' }{ \partial \theta } \\ \\ \dfrac{ \partial u' }{ \partial \theta } \end{pmatrix} \right|= \left|\det \begin{pmatrix} 1 \\ & \ddots \\ & & 1 \\ \leftarrow & \frac{ \partial u' }{ \partial \theta_{-j} } & \rightarrow & \frac{ \partial u' }{ \partial \theta_{j} } \end{pmatrix}\right |= \left| \frac{ \partial g(\theta_{-j}, u') }{ \partial u' } \right |^{-1}, \end{align*}$ where the last equality follows from the inverse function theorem applied to $g(\theta_{-j}, u')=\theta_{j}$.
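As a numerical sanity check on these formulas, the sketch below computes $\left| \det \frac{ \partial (\theta',u') }{ \partial (\theta,u) } \right|$ for the adding move by finite differences, using a hypothetical map $g(\theta,u)=\bar{\theta}+\sigma u$ of our own choosing (so $| \partial g/\partial u |=\sigma$, even though $g$ also depends on $\theta$); `SIGMA`, `g`, and `phi_add` are illustrative names, not fixed notation:

```python
import numpy as np

SIGMA = 0.5  # hypothetical scale in g; |dg/du| should equal SIGMA

def g(theta, u):
    # hypothetical map, invertible in u: new coefficient centered at the mean
    return theta.mean() + SIGMA * u

def phi_add(x):
    # x = (theta_1, ..., theta_{p_m}, u)  ->  (theta', u' = empty), flattened
    theta, u = x[:-1], x[-1]
    return np.append(theta, g(theta, u))

rng = np.random.default_rng(0)
x = np.append(rng.normal(size=3), rng.normal())    # (theta, u) with p_m = 3
eps = 1e-6
J = np.column_stack([                              # J[i, j] = d phi_add_i / d x_j
    (phi_add(x + eps * e) - phi_add(x - eps * e)) / (2 * eps)
    for e in np.eye(x.size)
])
print(np.abs(np.linalg.det(J)), SIGMA)             # both are approximately 0.5
```

The dependence of $g$ on $\theta$ only fills in the bottom-left block of the Jacobian, which the triangular determinant ignores; this is why only $\partial g/\partial u$ survives.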
Then we obtain the proposal densities $\begin{align*} \text{Adding variable: }~q(m', \theta', \varnothing ~|~ m, \theta, u)&= \frac{q_{U}(u)}{p}, \\ \text{Deleting variable: }~q(m', \theta', u' ~|~ m, \theta, \varnothing)&= \frac{1}{p}, \end{align*}$ where under the other adding/dropping scheme the factor $\frac{1}{p}$ becomes $\frac{1}{2(p-| m |)}$ when adding and $\frac{1}{2| m |}$ when deleting. This gives the acceptance probabilities $\begin{align*} \text{Adding: }~\alpha(m', \theta', \varnothing ~|~ m, \theta, u)&= \min\left( 1, \frac{p(\theta', m' ~|~ y)}{p(\theta, m ~|~ y)}\cdot\frac{1}{q_{U}(u)}\cdot \left| \frac{ \partial g(\theta, u) }{ \partial u } \right| \right)\\[0.8em] \text{Deleting: }~ \alpha(m', \theta', u' ~|~ m, \theta, \varnothing)&= \min\left( 1, \frac{p(\theta', m' ~|~ y)}{p(\theta, m ~|~ y)}\cdot q_{U}(u')\cdot \left| \frac{ \partial g(\theta', u') }{ \partial u' } \right|^{-1} \right), \end{align*}$ where under the other scheme the ratios pick up the extra factors $\frac{p-| m |}{| m' |}$ (adding) and $\frac{| m |}{p-| m' |}$ (deleting) from $q_{M}(m ~|~ m')/q_{M}(m' ~|~ m)$. Lastly, if we use a rather simple proposal when adding a feature, namely sampling its coefficient $u=\theta'_{j}$ directly from the prior and appending it to $\theta$, then $g(\theta, u)=u$, and the Jacobians are $1$.
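Putting the pieces together, below is a minimal end-to-end sketch of this example, under assumptions of our own choosing: synthetic data, known noise variance, independent $N(0,1)$ priors on included coefficients, a uniform prior over models, the uniform index-flip $q_{M}\equiv \frac{1}{p}$, and the simple prior proposal $g(\theta,u)=u$ so that all Jacobians are $1$. All names and the data-generating setup are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (our own assumed setup): only features 0 and 2 are active.
n, p = 100, 5
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0]) + rng.normal(size=n)
SIGMA2 = 1.0  # noise variance, assumed known for simplicity

def log_post(m, theta):
    """log pi(m, theta | y) up to a constant: Gaussian likelihood,
    N(0, 1) prior per included coefficient, uniform prior over models."""
    resid = y - X[:, sorted(m)] @ theta if m else y
    return -0.5 * resid @ resid / SIGMA2 - 0.5 * theta @ theta

def log_qu(u):
    """log q_U: the auxiliary u is drawn from the N(0, 1) prior."""
    return -0.5 * u * u - 0.5 * np.log(2 * np.pi)

def rj_step(m, theta):
    """One reversible jump: flip a uniform index i (q_M = 1/p both ways)."""
    i = int(rng.integers(p))
    if i not in m:                        # add: g(theta, u) = u, Jacobian 1
        u = rng.normal()
        m_new = m | {i}
        theta_new = np.insert(theta, sorted(m_new).index(i), u)
        log_alpha = log_post(m_new, theta_new) - log_post(m, theta) - log_qu(u)
    else:                                 # delete: u' = theta_i, Jacobian 1
        k = sorted(m).index(i)
        m_new, u_prime = m - {i}, theta[k]
        theta_new = np.delete(theta, k)
        log_alpha = log_post(m_new, theta_new) - log_post(m, theta) + log_qu(u_prime)
    return (m_new, theta_new) if np.log(rng.uniform()) < log_alpha else (m, theta)

def within_step(m, theta, scale=0.2):
    """Fixed-dimension random-walk MH on theta | m, so coefficients mix."""
    if not m:
        return theta
    prop = theta + scale * rng.normal(size=theta.size)
    accept = np.log(rng.uniform()) < log_post(m, prop) - log_post(m, theta)
    return prop if accept else theta

m, theta, counts = set(), np.empty(0), np.zeros(p)
for t in range(20_000):
    m, theta = rj_step(m, theta)
    theta = within_step(m, theta)
    if t >= 5_000:                        # discard burn-in
        counts[list(m)] += 1
print("posterior inclusion probabilities:", counts / 15_000)
```

The within-model random-walk step is not part of the jump itself; it is interleaved so that coefficient values keep mixing inside each visited model, which birth/death moves alone would do poorly.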