Suppose we want to sample $X \sim p$, but the pdf $p$ is difficult to simulate directly; however, we have another variable $Y \sim q$ that is simpler to sample.
Instead of manipulating $Y$ to behave like $X$ (like in transformation sampling), rejection sampling *"rejects" values of $Y$ to make the remainder representative of a sample from the target $X \sim p$.*
- $Y \sim q$ is then called the **proposal (distribution)**, while $X \sim p$ is the **target (distribution)**.
- To make the proposal look like the target, we need to rescale its distribution: at $Y=y$, it should be rescaled by $p(y) / q(y)$.
This is done by *randomly accepting the proposal with some probability* $L \propto p(y) / q(y)$. However, for $L$ to be a valid probability, it needs to be in $[0,1]$. Hence find some $M$ with $p / q \le M$ everywhere, then define the threshold $L(y):=\frac{p(y)}{q(y)} \frac{1}{M} \in [0,1]$.
- $M$ is a constant bound, so rejection sampling needs a proposal $q$ such that $p / q$ is bounded.
- To accept the proposal with this probability, we then simulate $U \sim U[0,1]$ and accept if and only if $U \le L$.
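As a concrete instance (assumed here, not from the source), take the target $p = \mathrm{Beta}(2,2)$, i.e. $p(y)=6y(1-y)$ on $[0,1]$, with uniform proposal $q = U[0,1]$: then $p(y)/q(y)=6y(1-y)\le \frac{3}{2}=:M$ (attained at $y=\frac{1}{2}$), so the acceptance probability is $L(y)=4y(1-y)$.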
> [!algorithm] Rejection Sampling
> The **rejection sampling** algorithm simulates $X \sim p$ by:
>
> $[1]$ Sample $Y \sim q$, giving $Y=y$.
>
> $[2]$ Sample $U \sim U [0,1]$.
>
> $[3]$ If $U \le \frac{p(y)}{Mq(y)}$, accept and return $X=y$. Otherwise reject and return to $[1]$.
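A minimal Python sketch of steps $[1]$–$[3]$, using the assumed Beta(2,2) example above (function names are illustrative, not from the source):

```python
import random

def rejection_sample(p, q_sample, q, M):
    """Draw one sample from the target density p by rejection sampling.

    p        -- target density (callable)
    q_sample -- function drawing one sample from the proposal
    q        -- proposal density (callable)
    M        -- constant with p(y) / q(y) <= M everywhere
    """
    while True:
        y = q_sample()              # [1] propose Y = y ~ q
        u = random.random()         # [2] U ~ U[0, 1]
        if u <= p(y) / (M * q(y)):  # [3] accept with probability p(y) / (M q(y))
            return y

# Assumed example (not from the source): Beta(2,2) target, uniform proposal.
p = lambda y: 6 * y * (1 - y)  # Beta(2,2) density on [0, 1]
q = lambda y: 1.0              # U[0, 1] density
x = rejection_sample(p, random.random, q, M=1.5)
```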
In each iteration, the proposal is accepted with probability $\frac{1}{M}$.
- Hence the number of iterations to simulate one instance follows $\mathrm{Geom}\left( \frac{1}{M} \right)$, with an expectation of $M$ iterations per output.
- If $M$ is too large (i.e. $p / q$ is too large at some points), rejection sampling is very inefficient.
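Continuing the assumed Beta(2,2) example ($M = 3/2$), a quick sketch to check the expected iteration count empirically:

```python
import random

# Count the proposals needed per accepted sample; by the geometric
# argument above, the empirical mean should approach M = 1.5.
p = lambda y: 6 * y * (1 - y)   # Beta(2,2) density (assumed example)
M = 1.5
n_samples, total_iters = 100_000, 0
for _ in range(n_samples):
    while True:
        total_iters += 1
        y = random.random()              # proposal Y ~ U[0,1], q(y) = 1
        if random.random() <= p(y) / M:  # accept w.p. p(y) / (M q(y))
            break
print(total_iters / n_samples)           # ~ 1.5 iterations per output
```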
> [!proof]- Proof of Acceptance Probability
> Assuming $q \ne 0$ (at least where $p \ne 0$), $\mathbb{P}[\text{accept}]=\int q(y)\mathbb{P}[\text{accept}\mid Y=y] \, dy=\int q(y)\frac{p(y)}{Mq(y)} \, dy=\frac{1}{M}$, since $p$ integrates to $1$.
The rejection algorithm returns $X \sim p$.
> [!proof]
> Let $N$ be the random variable indicating the iteration at which the proposal is accepted. Let $A\subset \mathcal{X}$ be any (measurable) set.
> $\begin{align*}
> \mathbb{P}[X\in A, N=n ]&= \mathbb{P}[n-1\text{ rejections}]\cdot \mathbb{P}[X\in A \text{ proposed and accepted}]\\
> &= \left( 1-\frac{1}{M} \right)^{n-1}\int _{A}\mathbb{P}[\text{accept} \mid X=x \text{ proposed}] \, dq(x) \\
> &= \left( 1-\frac{1}{M} \right)^{n-1} \frac{1}{M}\cdot\begin{cases}
> \int _{A} \frac{p(x)}{q(x)}\, q(x) \, dx & \text{if continuous} \\
> \sum_{x\in A} \frac{p(x)}{q(x)}\, q(x) & \text{if discrete}
> \end{cases}\\[0.8em]
> &= p(A) \cdot \left( 1 - \frac{1}{M} \right)^{n-1} \frac{1}{M}.
> \end{align*}$
> Therefore, $X$ is independent of $N$, and has distribution $p$. Alternatively, marginalising over $N\sim \mathrm{Geom}(M^{-1})$ gives $\mathbb{P}[X\in A]=\sum_{n\ge 1} p(A)\left( 1-\frac{1}{M} \right)^{n-1}\frac{1}{M}=p(A)$, i.e. $X \sim p$.
### Rejection Sampling Without Normalisation
*In many cases, we only know that $p,q$ are proportional to functions $\tilde{p}=p \cdot c_{p}$ and $\tilde{q}=q \cdot c_{q}$ but not the normalising constants $c_{p}$ and $c_{q}$*.
To replace $M$, we look for $\tilde{M}$ with $\frac{\tilde{p}}{\tilde{q}}\le \tilde{M}$ everywhere, and accept with probability $\frac{\tilde{p}}{\tilde{M} \tilde{q}}$. This is equivalent to (normalised) rejection sampling with acceptance probability $\frac{p}{q}\cdot \frac{c_{p}}{\tilde{M}c_{q}}$, i.e. $M:= \tilde{M} c_{q} / c_{p}$.
- As an extremely basic example, let $\tilde{p}(x):= \mathbf{1}\{x\in A\}$ and $\tilde{q}(x):= \mathbf{1}\{x\in B\}$ for some $A \subset B$ (so that $\tilde{p} / \tilde{q}$ is bounded). Then we can choose $\tilde{M}=1$.
- In this case the average success probability for each iteration is not $1 / \tilde{M}$, but $\frac{1}{M}=\frac{c_{p}}{\tilde{M}c_{q}}$ (in the indicator example above, $|A| / |B|$).
- In choosing proposals that maximise the success probability, note that both $\tilde{M}$ and $c_{q}$ depend on the proposal, so it is not enough to minimise $\tilde{M}$ alone; one should minimise the product $\tilde{M} c_{q}$. A sketch of the indicator example appears below.
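A minimal sketch of the indicator example above, assuming $A$ is the unit disk and $B=[-1,1]^2$ (geometry not from the source): with $\tilde{M}=1$, the test $U \le \tilde{p}(y)/(\tilde{M}\tilde{q}(y))$ reduces to checking $y \in A$, and the per-iteration success probability is $c_{p}/(\tilde{M}c_{q}) = \pi / 4$.

```python
import random

def sample_unit_disk():
    """Uniform sample on the unit disk A, via unnormalised rejection
    sampling from the square B = [-1, 1]^2 (p~ = 1_A, q~ = 1_B, M~ = 1)."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)  # proposal ~ U(B)
        # Acceptance probability p~(x,y) / (M~ q~(x,y)) is 1 inside A, 0 outside,
        # so "accept with probability p~/(M~ q~)" is just a membership test.
        if x * x + y * y <= 1:
            return (x, y)

# Per-iteration success probability: c_p / (M~ c_q) = area(A)/area(B) = pi/4.
```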