Consider a collection of models $\mathcal{M}$, where each model $m \in \mathcal{M}$ has its own set of parameters $\theta \sim \pi(\theta ~|~ m)$, but the dimension of $\theta$ under model $m$, denoted $p_{m}$, in general differs across models.
- For example, a regression setting $Y \sim X^{T}\theta$ where each model $m$ selects a subset of features, so $p_{m}$ is the number of features included in $m$.
We wish to use MCMC to sample from the distribution $\pi(\theta,m)$, so our samples are $X^{(t)}=(\theta^{(t)}, m^{(t)})$. This makes traditional MCMC (like vanilla [[Metropolis Hastings]]) difficult, since such methods assume the sample space $\Omega$ has a fixed dimension for reversibility, but the parameter vector $\theta^{(t)}$ has dimension $p_{m^{(t)}}$, which changes over time:
$\theta^{(t)}=\begin{bmatrix}
\theta^{(t)}_{1} \\ \vdots \\ \theta^{(t)}_{p_{m^{(t)}}}
\end{bmatrix} \xrightarrow{???} \begin{bmatrix}
\theta^{(t+1)}_{1} \\ \vdots \\ \vdots \\ \theta^{(t+1)}_{p_{m^{(t+1)}}}
\end{bmatrix}= \theta^{(t+1)} $
To resolve this, when jumping from $(\theta ,m)$ to $(\theta',m')$, RJMCMC pads $\theta$ and $\theta'$ with auxiliary variables $U,U'$ so that their dimensions match, then makes the jump with a **proposal function** $\phi_{m,m'}$:
$\theta', u':=\phi_{m,m'}(\theta,u),$
and we require the family $\{ \phi_{m,m'}\}$ to be **involutions**, in the sense that $\phi_{m',m}(\theta',u')=(\theta,u)$ holds simultaneously, i.e. mapping twice recovers the original model's parameters. From now on we drop the subscript and write $\phi$.
- This involution can be defined by choosing the map's first component $\theta':=\phi_{1}(\theta,u)$, then requiring $u'$ to be the solution to $\theta=\phi_{1}(\theta', u')$. As long as our choice of $\phi_{1}$ is "invertible" in this sense, this defines the second component $\phi_{2}(\theta,u):= u'$, and we can write $\phi:= (\phi_{1},\phi_{2})$.
- For example, if $\phi_{1}(\theta,u)=\theta+u$, then $u'=\phi_{2}(\theta,u)=-u$.
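As a quick sanity check, here is a minimal sketch (plain NumPy; the names are ours, not from any library) verifying that the example map above is self-inverse:

```python
import numpy as np

# The example involution phi(theta, u) = (theta + u, -u).
# Applying it twice recovers (theta, u), which is exactly the involution requirement.
def phi(theta, u):
    return theta + u, -u

theta, u = np.array([1.0, 2.0]), np.array([0.5, -0.3])
theta_p, u_p = phi(theta, u)      # forward jump to the padded (theta', u')
theta_b, u_b = phi(theta_p, u_p)  # applying phi again
assert np.allclose(theta_b, theta) and np.allclose(u_b, u)
```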
So given a proposed jump $m\to m'$, we have
$\theta=\begin{bmatrix}
\theta_{1} \\ \vdots \\ \theta_{p_{m}}
\end{bmatrix} \xrightarrow{\text{sample }U} \begin{array}{c}
\begin{bmatrix} \theta_{1} \\ \vdots \\ \theta_{p_{m}} \end{bmatrix} \\ \begin{bmatrix}
U_{1} \\ \vdots \\ \vdots \\ U_{\tilde{p}_{m}}
\end{bmatrix}
\end{array} \xleftrightarrow[\phi_{m',m}]{~~~\phi_{m,m'}~~}\begin{array}{c}
\begin{bmatrix} \theta'_{1} \\ \vdots \\ \vdots \\ \theta'_{p_{m'}} \end{bmatrix} \\ \begin{bmatrix}
U'_{1} \\ \vdots \\ U'_{\tilde{p}_{m'}}
\end{bmatrix}
\end{array} \xrightarrow{\text{drop }U'}\begin{bmatrix} \theta'_{1} \\ \vdots \\ \vdots \\ \theta'_{p_{m'}} \end{bmatrix}=\theta',$
where $U,U'$ have dimensions $\tilde{p}_{m},\tilde{p}_{m'}$ respectively, so that the padded dimensions are equal on both sides, with $p_{m}+\tilde{p}_{m}=p_{m'}+\tilde{p}_{m'}$.
Therefore,
> [!definition|*] RJMCMC Proposal
> The RJMCMC uses an implicit proposal given by $\begin{align*}
> m' &\sim q_{M}(m' ~|~ m)\\
> U &\hphantom{:}\sim q_{U}(u ~|~ m,m', \theta),\\
> \theta', U' &:= \phi_{m,m'}(\theta, U),
> \end{align*}$ for some proposal distributions $q_{M},q_{U}$ and involution $\phi_{m,m'}$ of our choice.
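In code, one proposal draw looks as follows. This is only a sketch: the interfaces `q_M_sample`, `q_U_sample`, and `phi` are hypothetical names standing in for whatever samplers and involution the user supplies.

```python
# Sketch of one RJMCMC proposal draw (all interfaces are assumed, user-supplied):
#   q_M_sample(m)                 draws m' ~ q_M(. | m)
#   q_U_sample(m, m_new, theta)   draws u  ~ q_U(. | m, m', theta)
#   phi(m, m_new, theta, u)       the involution phi_{m,m'}, returning (theta', u')
def propose(m, theta, q_M_sample, q_U_sample, phi):
    m_new = q_M_sample(m)                       # propose a new model
    u = q_U_sample(m, m_new, theta)             # pad theta with the auxiliary u
    theta_new, u_new = phi(m, m_new, theta, u)  # deterministic jump via the involution
    return m_new, theta_new, u, u_new
```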
We use the acceptance probability derived for [[Metropolis Hastings#Proposals with Transformations|M-H proposals with transformations]], which (without the jump $m\to m'$, just $(\theta,u)\to (\theta',u')$) is
$\alpha(\theta',u'~|~\theta,u)=\min \left\{ 1, \frac{\pi(\theta')q_{U}(u')}{\pi(\theta)q_{U}(u)}\cdot \left| \det\frac{ \partial (\theta',u') }{ \partial (\theta,u) } \right| \right\}.$
Now incorporating the jump between models, we have
> [!definition|*] RJMCMC Acceptance Probability
> $\alpha(m',\theta',u'~|~m,\theta,u)=\min \left\{ 1, \frac{\pi(m',\theta')q_{M}(m ~|~ m')q_{U}(u')}{\pi(m,\theta)q_{M}(m' ~|~ m) q_{U}(u)}\cdot \left|\det \frac{ \partial (\theta',u') }{ \partial (\theta,u) } \right| \right\}.$
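Continuing the sketch above, the acceptance probability can be computed as below. The density functions `pi_pdf`, `q_M_pdf`, `q_U_pdf` (argument order `pdf(target, given...)`) and the Jacobian determinant `jac_det` of $\phi_{m,m'}$ are again assumed to be user-supplied; none of these names come from an existing library.

```python
# Sketch of the RJMCMC acceptance probability for a proposed jump
# (m, theta, u) -> (m_new, theta_new, u_new).
def accept_prob(m, theta, u, m_new, theta_new, u_new,
                pi_pdf, q_M_pdf, q_U_pdf, jac_det):
    # Numerator: target density and reverse-move densities at the proposed state.
    num = pi_pdf(m_new, theta_new) * q_M_pdf(m, m_new) * q_U_pdf(u_new, m_new, m, theta_new)
    # Denominator: target density and forward-move densities at the current state.
    den = pi_pdf(m, theta) * q_M_pdf(m_new, m) * q_U_pdf(u, m, m_new, theta)
    # jac_det(...) is |det d(theta', u') / d(theta, u)| for the involution phi_{m,m'}.
    return min(1.0, (num / den) * abs(jac_det(m, m_new, theta, u)))
```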
### Example: Variable Selection
Let $m$ be a set of feature indices, and let $\theta$ contain the coefficients of the features included in $m$. For simplicity, let our proposal be:
- Transition between models $m,m'$ by uniformly randomly adding or dropping a single feature at a time, so out of $p$ features,
$q_{M}(m' ~|~ m)= \begin{cases} \frac{1}{2| m |} & \text{if dropping a feature} \\
\frac{1}{2(p-| m |)} & \text{if adding a feature}
\end{cases}$
- When adding a feature $j$, its coefficient $\theta_{j}'$ is proposed by $\theta'_{j}=g(\theta, u)$, so that the proposal function $\phi$ has first component $\phi_{1}(\theta, u)= (\theta, g(\theta,u))$, i.e. $\theta$ with the new coefficient appended.
- To ensure $\phi$ is an involution (in particular invertible), we need $\dim (\theta, u)=\dim (\theta',u')$, so WLOG we take $\dim(u)=1$ (or whatever the dimension of $\theta'_{j}$ is) and $u' = \varnothing$.
- Therefore, we have $\phi(\theta, u)=(\theta', \varnothing)$, yielding the Jacobians (WLOG letting $\theta'_{j}$ be the last entry of $\theta'$ when adding, and $\theta_{j}$ the last entry of $\theta$ when deleting) $\begin{align*}
\text{Adding variable: }~J(\theta, u)&= \left| \det\begin{pmatrix}
\dfrac{ \partial \theta' }{ \partial \theta }& \dfrac{ \partial \theta' }{ \partial u }
\end{pmatrix}\right |=\left|\det \begin{pmatrix}
1 \\ & \ddots \\ & & 1 \\ \leftarrow & \frac{ \partial \theta'_{j} }{ \partial \theta } & \rightarrow & \frac{ \partial \theta'_{j} }{ \partial u }
\end{pmatrix}\right |=\left| \frac{ \partial g(\theta, u) }{ \partial u } \right |.\\
\text{Deleting variable: }~J(\theta, \varnothing)&= \left|\det \begin{pmatrix}
\dfrac{ \partial \theta' }{ \partial \theta } \\ \\ \dfrac{ \partial u' }{ \partial \theta }
\end{pmatrix} \right|= \left|\det \begin{pmatrix}
1 & & & 0 \\ & \ddots & & \vdots \\ & & 1 & 0 \\ \leftarrow & \frac{ \partial u' }{ \partial \theta } & \rightarrow & \frac{ \partial u' }{ \partial \theta_{j} }
\end{pmatrix}\right |= \left| \frac{ \partial g(\theta', u') }{ \partial u' } \right |^{-1},
\end{align*}$ where the last equality holds because $u'$ solves $\theta_{j}=g(\theta',u')$, so $\partial u'/\partial \theta_{j}=(\partial g/\partial u')^{-1}$ by the inverse function theorem (see the numerical check after this list).
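Below is a small numerical check of these two Jacobians, with $g(\theta,u)=\bar{\theta}+u$ chosen purely for illustration (so $|\partial g/\partial u|=1$ and the delete-move Jacobian is also $1$); it also verifies the involution property. None of these choices are prescribed by the text above.

```python
import numpy as np

# Illustrative (assumed) choice: g(theta, u) = mean(theta) + u, so |dg/du| = 1.
def g(theta, u):
    return theta.mean() + u

def add_move(theta, u):          # phi_{m,m'}: append the new coefficient theta'_j
    return np.append(theta, g(theta, u))

def delete_move(theta):          # phi_{m',m}: drop the last coefficient and
    theta_rest = theta[:-1]      # recover u' by solving theta_j = g(theta', u')
    u_rev = theta[-1] - theta_rest.mean()
    return theta_rest, u_rev

theta, u = np.array([0.3, -1.2]), 0.7
theta_new = add_move(theta, u)               # add a feature ...
theta_back, u_back = delete_move(theta_new)  # ... then delete it again
assert np.allclose(theta_back, theta) and np.isclose(u_back, u)  # involution holds
```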
Then we obtain the proposal kernel $\begin{align*}
\text{Adding variable: }~q(m', \theta', \varnothing ~|~ m, \theta, u)&= \frac{1}{2(p-| m |)}\,q_{U}(u), \\
\text{Deleting variable: }~q(m', \theta', u' ~|~ m, \theta, \varnothing)&= \frac{1}{2| m |},
\end{align*}$
and the acceptance probabilities
$\begin{align*}
\text{Adding: }~\alpha(m', \theta', \varnothing ~|~ m, \theta, u)&= \min\left( 1, \frac{p(\theta', m' ~|~ y)}{p(\theta, m ~|~ y)}\cdot \frac{p-| m |}{| m'|}\cdot\frac{1}{q_{U}(u)}\cdot \left| \frac{ \partial g(\theta, u) }{ \partial u } \right| \right)\\[0.8em]
\text{Deleting: }~ \alpha(m', \theta', u' ~|~ m, \theta, \varnothing)&= \min\left( 1, \frac{p(\theta', m' ~|~ y)}{p(\theta, m ~|~ y)}\cdot \frac{| m |}{p-| m' |}\cdot {q_{U}(u')}\cdot \left| \frac{ \partial g(\theta', u') }{ \partial u' } \right|^{-1} \right)
\end{align*}$
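Putting the example together, here is a sketch of a single RJMCMC iteration for variable selection. It assumes, purely for illustration, $g(\theta,u)=u$ with $u\sim N(0,1)$, so the Jacobian factor is $1$; `log_post` is a user-supplied $\log p(\theta,m \mid y)$, `m` is a list of selected feature indices with `theta[i]` the coefficient of feature `m[i]`, `p` is the total number of candidate features, and boundary cases $|m|\in\{0,p\}$ are glossed over.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def rjmcmc_step(m, theta, log_post, p):
    if (rng.random() < 0.5 or len(m) == 0) and len(m) < p:      # propose adding a feature
        j = int(rng.choice([k for k in range(p) if k not in m]))
        u = rng.standard_normal()                 # theta'_j = g(theta, u) = u
        m_new, theta_new = m + [j], np.append(theta, u)
        log_alpha = (log_post(m_new, theta_new) - log_post(m, theta)
                     + np.log(p - len(m)) - np.log(len(m_new))  # q_M ratio (p-|m|)/|m'|
                     - stats.norm.logpdf(u))                    # 1 / q_U(u); |dg/du| = 1
    else:                                                       # propose deleting a feature
        i = int(rng.integers(len(m)))
        u_rev = theta[i]                          # u' recovered by the involution
        m_new, theta_new = m[:i] + m[i+1:], np.delete(theta, i)
        log_alpha = (log_post(m_new, theta_new) - log_post(m, theta)
                     + np.log(len(m)) - np.log(p - len(m_new))  # q_M ratio |m|/(p-|m'|)
                     + stats.norm.logpdf(u_rev))                # q_U(u'); |dg/du|^{-1} = 1
    if np.log(rng.random()) < log_alpha:
        return m_new, theta_new                   # accept the trans-dimensional jump
    return m, theta                               # reject: stay at (m, theta)
```

With $g(\theta,u)=u$ the new coefficient is drawn directly from $q_{U}$; choosing a $g$ (and $q_{U}$) that places the new coefficient near plausible values typically improves the acceptance rate of the add moves.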