Suppose we want to sample $X \sim p$, but the distribution $p$ is difficult to sample from directly, while another variable $Y \sim q$ is simple to sample.
Instead of transforming $Y$ to behave like $X$ (as in transformation sampling), rejection sampling *"rejects" values of $Y$ so that the remainder is representative of a sample from the target $X \sim p$.*
- $Y \sim q$ is then called the **proposal (distribution)**, while $X \sim p$ is the **target (distribution)**.
- To make the proposal look like the target, its distribution must be reweighted: at $Y=y$, by a factor of $p(y) / q(y)$.
This is done by *randomly accepting the proposal with some probability* $L \propto p(y) / q(y)$. However, for $L$ to be a valid probability, it must lie in $[0,1]$. Hence find a constant $M$ with $p / q \le M$ everywhere, then define the acceptance threshold $L(y):=\frac{p(y)}{q(y)} \frac{1}{M} \in [0,1]$.
- $M$ is a constant bound, so rejection sampling needs a proposal $q$ such that $p / q$ is bounded; a worked example follows this list.
- To accept the proposal with this probability, we then simulate $U \sim U[0,1]$ and accept if and only if $U \le L$.
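For example, to sample the standard normal target $p(x)=\frac{1}{\sqrt{ 2\pi }}e^{-x^{2}/2}$, one can take a Laplace proposal $q(x)=\frac{1}{2}e^{-|x|}$: since $|x|-x^{2}/2$ is maximized at $|x|=1$, $\frac{p(x)}{q(x)}=\sqrt{ \frac{2}{\pi} }\,e^{|x|-x^{2}/2}\le \sqrt{ \frac{2e}{\pi} }\approx 1.32=:M.$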
> [!algorithm] Rejection Sampling
> The **rejection sampling** algorithm simulates $X \sim p$ by:
>
> $[1]$ Sample $Y \sim q$, giving $Y=y$.
>
> $[2]$ Sample $U \sim U [0,1]$.
>
> $[3]$ If $U \le \frac{p(y)}{Mq(y)}$, accept and return $X=y$. Otherwise reject and return to $[1]$.
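A minimal sketch of this algorithm in Python (the interface — densities `p`, `q`, a proposal sampler `sample_q`, and the bound `M` as arguments — is an illustrative choice, not part of the notes):

```python
import numpy as np

def rejection_sample(p, q, sample_q, M, size=1):
    """Simulate `size` draws from the target density p by rejection sampling."""
    out = []
    while len(out) < size:
        y = sample_q()                # [1] propose Y = y with Y ~ q
        u = np.random.uniform()       # [2] U ~ U[0, 1]
        if u <= p(y) / (M * q(y)):    # [3] accept with probability p(y) / (M q(y))
            out.append(y)             #     and return X = y; otherwise retry
    return np.array(out)
```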
In each iteration, the proposal is accepted with probability $\frac{1}{M}$.
- Hence the number of iterations needed to produce one sample follows $\mathrm{Geom}\left( \frac{1}{M} \right)$, with an expected $M$ iterations per output.
- If $M$ is too large (i.e. $p / q$ is too large at some points), rejection sampling is very inefficient.
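To see the $\mathrm{Geom}\left( \frac{1}{M} \right)$ cost concretely, a usage sketch continuing the normal-from-Laplace example above:

```python
p = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal target
q = lambda x: 0.5 * np.exp(-np.abs(x))                 # Laplace proposal
M = np.sqrt(2 * np.e / np.pi)                          # bound from the worked example

xs = rejection_sample(p, q, lambda: np.random.laplace(), M, size=10_000)
# Each output costs Geom(1/M) iterations: about M ≈ 1.32 proposals on average,
# i.e. an acceptance rate of roughly 1/M ≈ 76%.
```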
> [!proof]- Proof of Acceptance Probability
> Assuming $q \ne 0$ wherever $p \ne 0$, $\mathbb{P}[\text{accept}]=\int q(y)\,\mathbb{P}[\text{accept}\,|\,Y=y] \, dy=\int q(y)\frac{p(y)}{Mq(y)} \, dy=\frac{1}{M},$ since $p$ integrates to $1$.
The rejection algorithm returns $X \sim p$.
> [!proof]- Proof for Discrete Variables
> At each iteration, the probability that $Y=x$ is proposed and accepted is $\begin{align*}
> \mathbb{P}[Y=x,\,\text{accepted}]&= \mathbb{P}[Y=x]\cdot\mathbb{P}[\text{accepted}\,|\,Y=x]\\
> &= q(x)\cdot \mathbb{P}\left[ U \le \frac{p(x)}{Mq(x)} \right]\\
> &= q(x) \cdot \frac{p(x)}{Mq(x)}\\
> &= p(x) / M.
> \end{align*}$ Summing over the iteration $i$ at which the first acceptance occurs, $\begin{align*}
> \mathbb{P}[X=x]&= \sum_{i}\mathbb{P}[Y=x,\, \text{accepted}]\cdot \mathbb{P}[\text{rejected $i-1$ times}]\\
> &= \sum_{i} \frac{p(x)}{M} \cdot \left( 1-\frac{1}{M} \right)^{i-1}\\
> &= p(x).
> \end{align*}$ Hence $X \sim p$.
> [!proof]- Proof for Continuous Variables
> Let $P,Q$ be the respective cdfs of the target and proposal.
> At each iteration, $\begin{align*}\mathbb{P}[Y \le x,\,\text{accepted}]
> &= \mathbb{P}[Y \le x]\cdot\mathbb{P}[\text{accepted}\,|\,Y \le x]\\
> &= Q(x)\cdot \mathbb{P}\left[ U \le \frac{p(Y)}{Mq(Y)}\,\Big|\,Y \le x \right]\\
> &= Q(x) \cdot \int_{-\infty}^{x} \frac{p(y)}{Mq(y)} \underbrace{\frac{q(y)}{Q(x)}}_{q_{Y|Y \le x}} \, dy \\
> &= P(x) / M,
> \end{align*}$ where $q(y) / Q(x)$ is the density of $Y$ conditioned on $Y \le x$. Again, summing over the iteration at which the first acceptance occurs, $F_{X}(x)=\sum_{i}\mathbb{P}[Y \le x,\,\text{accepted}]\cdot \mathbb{P}[\text{rejected $i-1$ times}]=P(x).$ Hence $X$ has cdf $P$, i.e. $X \sim p$.
### Rejection Sampling Without Normalization
*In many cases, we can only evaluate $p$ and $q$ up to proportionality: we know $\tilde{p}=p \cdot c_{p}$ and $\tilde{q}=q \cdot c_{q}$, but not the normalizing constants $c_{p}$ and $c_{q}$.*
To replace $M$, find $\tilde{M}$ with $\frac{\tilde{p}}{\tilde{q}}\le \tilde{M}$ everywhere, and accept with probability $\frac{\tilde{p}}{\tilde{M} \tilde{q}}$ (a code sketch follows this list).
- In this case, the success probability per iteration is not $\frac{1}{\tilde{M}}$ but $\frac{c_{p}}{\tilde{M}c_{q}}=\frac{1}{M}$, since the implied bound on $p/q$ is $M=\tilde{M}c_{q}/c_{p}$.
- In choosing proposals that maximize the success probability, note that both $\tilde{M}$ and $c_{q}$ depend on the proposal, so it is not enough to minimize $\tilde{M}$ alone.
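Since the unknown constants cancel inside the acceptance test, the earlier `rejection_sample` sketch works unchanged when given unnormalized densities; continuing the running example (here $c_{p}=\sqrt{ 2\pi }$ and $c_{q}=2$ are the constants the sampler never sees):

```python
p_tilde = lambda x: np.exp(-x**2 / 2)    # ∝ standard normal; c_p = sqrt(2π)
q_tilde = lambda x: np.exp(-np.abs(x))   # ∝ Laplace; c_q = 2
M_tilde = np.exp(0.5)                    # sup p~/q~ = e^{1/2}, attained at |x| = 1

xs = rejection_sample(p_tilde, q_tilde, lambda: np.random.laplace(),
                      M_tilde, size=10_000)
# Success probability per iteration: c_p / (M~ c_q) = sqrt(2π) / (2√e) ≈ 0.76,
# exactly the 1/M of the normalized version.
```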