> [!tldr] ABC Algorithm
> The ABC (approximate Bayesian computation) algorithm draws approximate samples $\theta \sim \pi(\theta ~|~ y)$ without requiring knowledge of the likelihood $p(y~|~\theta)$.
>
> It only requires the ability to sample $\theta \sim \pi(\theta)$ and $Y \sim p(y ~|~ \theta)$ (not to compute the densities themselves).
Suppose our data $y$ comes from a very cursed process, so we cannot compute $p(y ~|~ \theta)$ explicitly and can only simulate it.
Or we cannot normalize the likelihood (e.g. imagine a lottery where we know the number of tickets each $y$ holds given $\theta$, but not the total number of tickets), so we only know $p(y~|~\theta)=f(y,\theta) / c(\theta)$ for some unknown $c(\theta)$.
This requires a so-called **likelihood-free** approach, and ABC is a specific case of this class of methods.
### Motivation
Consider a [[Rejection Sampling]] algorithm that uses $\pi(\theta)$ as the proposal, for discrete $y \in \mathcal{Y}$:
> [!algorithm] Not-so-approximate BC
> Input $y_{\mathrm{obs}}$ for which we want to sample $\theta \sim \pi(\theta ~|~ y_{\mathrm{obs}})$.
> Repeat:
> - Propose $\theta \sim \pi(\theta)$ and for that value of $\theta$, draw $y \sim p(y ~|~ \theta)$.
> - If the drawn $y = y_{\mathrm{obs}}$, then accept the proposal $\theta$ and return.
> - Else, continue the loop.
- Note that a given proposal $\theta$ is accepted with probability $\mathbb{P}(\text{draw }y_{\mathrm{obs}} ~|~ \theta)= \frac{\mathbb{P}[\text{draw }y_{\mathrm{obs}} \text{ and }\theta]}{\mathbb{P}[\text{draw }\theta]} =\left( \underset{\text{target}}{\pi(\theta ~|~ y_{\mathrm{obs}})}\Big/ \underset{\text{proposal}}{\pi(\theta)} \right) \cdot \underset{\substack{\text{overall} \\ \text{acceptance prob.}}}{\mathbb{P}(\text{draw }y_{\mathrm{obs}})},$so this is indeed a rejection sampling algorithm, and it returns $\theta \sim \pi(\theta ~|~ y_{\mathrm{obs}})$ as desired.
- The only issue is that $\mathcal{Y}$ must be discrete (up to $p$-null sets), else the algorithm will run forever almost surely.
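The exact-match loop above can be sketched in a few lines. The toy model here (Uniform prior, Binomial data) is a hypothetical choice for illustration, not one from the note; the key point is that only *simulation* of $y$ is used, never the likelihood density:

```python
import random

def exact_bc_sample(y_obs, n, rng=random.Random(0)):
    """Exact-match rejection sampler: one draw of theta ~ pi(theta | y_obs).

    Hypothetical toy model (not from the note): theta ~ Uniform(0, 1) prior,
    y | theta ~ Binomial(n, theta).  We only simulate y; the binomial
    likelihood is never evaluated.
    """
    while True:
        theta = rng.random()                             # propose theta ~ pi(theta)
        y = sum(rng.random() < theta for _ in range(n))  # draw y ~ p(y | theta)
        if y == y_obs:                                   # accept iff simulated y matches
            return theta

# With a Uniform prior and Binomial likelihood the true posterior is
# Beta(y_obs + 1, n - y_obs + 1), so draws should concentrate near
# its mean (y_obs + 1) / (n + 2), roughly 2/3 here.
draws = [exact_bc_sample(y_obs=7, n=10) for _ in range(500)]
print(sum(draws) / len(draws))
```

Note the discreteness of $\mathcal{Y}$ is what makes `y == y_obs` happen in finite time; the acceptance rate here is the marginal $p(y_{\mathrm{obs}})$.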
> [!idea] To handle continuous data, we can simply discretize it by settling with acceptance when the sampled $y$ is "close enough" to $y_{\mathrm{obs}}$.
But how do we define "close"?
- Since $\mathcal{Y}$ can be very high dimensional (e.g. $\mathbb{R}^{n}$ for a large $n$), Euclidean distance $\| y-y_{\mathrm{obs}} \|$ is bad because of [[Curse of Dimensionality]], i.e. it's almost impossible to sample another $y$ that is in $B(y_{\mathrm{obs}},r;d_{2}):= \{ y \in \mathcal{Y} ~|~ d_{2}(y,y_{\mathrm{obs}}) < r\}$.
- However, we don't really need entry-wise closeness: if $\theta = \mathbb{E}[Y]$, we should treat $y=(1,2)$ and $y=(2,1)$ the same, even though their $l_{2}$ distance is large.
Therefore, we can use summary statistics to filter for information that actually matters for inferring $\theta$.
- This reduces the dimension of the dataset from some large $n$ to that of the summary statistic (e.g. $2$, if the statistic is the sample mean and standard deviation $(\bar{y},s_{y})$).
- When a summary statistic $S$ is [[Minimality and Sufficiency#Sufficiency|sufficient]], then $\theta ~|~\{S(Y)=S(y_{\mathrm{obs}}) \}$ is identically distributed as $\theta ~|~ \{ Y=y_{\mathrm{obs}} \}$, so no information is lost in the process.
In conclusion, we make two improvements to handle continuous data:
- First define "closeness" with a summary statistic $S$.
- Second, instead of requiring exact equality, we settle for $S(Y) \approx S(y_{\mathrm{obs}})$.
> [!exposition] Formally defining "closeness"
>
> Formally, take a summary statistic $S: \mathcal{Y} \to \mathbb{R}^{s}$ (for a small $s$) and metric $d$ on $\mathbb{R}^{s}$, we can define the "distance" between $y,y'$ as
> $d_{S}(y,y'):= d(S(y), S(y')),$
>
> i.e. the difference between their summary statistics.
> - For example, with $S: y \mapsto \bar{y}$ and $d=d_{2}$, $d_{S}$ simply measures the difference between the sample means.
>
> With this, we can define "close enough" (with radius $\delta$) as $B(y_{\mathrm{obs}}, \delta;d_{S}):= \{ y \in \mathcal{Y} ~|~ d_{S}(y,y_{\mathrm{obs}}) < \delta \},$i.e. the set of $y$ whose summary statistic is similar to that of $y_{\mathrm{obs}}$. When $y_{\mathrm{obs}}$ and $S$ are fixed, we simply denote this as $B(\delta)$.
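As a concrete sketch of $d_S$, the snippet below (names `d_S` and `mean_sd` are illustrative, not from any library) uses the note's example statistic $(\bar{y}, s_y)$ with the Euclidean metric:

```python
import math

def d_S(y, y_prime, S, d=math.dist):
    """Distance between datasets via their summary statistics: d(S(y), S(y'))."""
    return d(S(y), S(y_prime))

def mean_sd(y):
    """Example summary statistic from the note: (sample mean, sample sd)."""
    n = len(y)
    m = sum(y) / n
    s = math.sqrt(sum((v - m) ** 2 for v in y) / (n - 1))
    return (m, s)

# Entry-wise very different datasets can still be "close":
print(d_S([1.0, 2.0], [2.0, 1.0], S=mean_sd))  # → 0.0
```

This makes the earlier point concrete: $(1,2)$ and $(2,1)$ are far apart in $l_2$ distance on $\mathcal{Y}$, yet indistinguishable through $S$.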
> [!algorithm] ABC
> Input $y_{\mathrm{obs}}$ for which we want to sample $\theta \sim \pi(\theta ~|~ y_{\mathrm{obs}})$. Fix the radius $\delta$ and summary statistic $S$ for defining "closeness", and define the ball $B:= B(y_{\mathrm{obs}},\delta; d_{S})$.
>
> Repeat:
> - Propose $\theta \sim \pi(\theta)$ and for that value of $\theta$, draw $y \sim p(y ~|~ \theta)$.
> - If the drawn $y \in B$, accept the proposal $\theta$ and return.
> - Else, continue the loop.
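The ABC loop can be sketched as follows. The model choices (Gaussian prior and likelihood, sample-mean statistic) are hypothetical, chosen because the sample mean is sufficient there, so for small $\delta$ the draws should approximate the exact conjugate posterior:

```python
import random
import statistics

def abc_sample(y_obs, delta, n, rng=random.Random(0)):
    """One draw from the ABC posterior pi(theta | S(Y) in B(delta)).

    Hypothetical toy model (not from the note): theta ~ N(0, 1) prior,
    y_1..y_n | theta iid N(theta, 1), summary statistic S = sample mean.
    """
    s_obs = statistics.fmean(y_obs)
    while True:
        theta = rng.gauss(0.0, 1.0)                    # theta ~ pi(theta)
        y = [rng.gauss(theta, 1.0) for _ in range(n)]  # y ~ p(y | theta)
        if abs(statistics.fmean(y) - s_obs) < delta:   # d_S(y, y_obs) < delta
            return theta

# S is sufficient here, so for small delta the ABC posterior approximates
# the exact posterior N(n * ybar / (n + 1), 1 / (n + 1)).
rng = random.Random(1)
y_obs = [rng.gauss(1.0, 1.0) for _ in range(20)]
draws = [abc_sample(y_obs, delta=0.1, n=20) for _ in range(300)]
print(statistics.fmean(draws))
```

Shrinking `delta` sharpens the approximation but lowers the acceptance rate; this trade-off is exactly the "blurring" discussed below.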
This is equivalent to rejection sampling with the target $\pi(\theta ~|~ y \in B)$ (rather than $\pi(\theta ~|~ y=y_{\mathrm{obs}})$): $\mathbb{P}[\theta \text{ is accepted} ~|~ \theta \text{ is proposed}]= p(y\in B ~|~ \theta)=\frac{\pi(\theta ~|~ y\in B)}{\pi(\theta)}\cdot p(y \in B).$
Therefore, we define this target as
> [!theorem|*] ABC Posterior
> The ABC algorithm samples from the **ABC posterior** $\pi_{\mathrm{ABC}}(\theta)=\pi(\theta ~|~ Y \in B(\delta)).$
Alternatively, we can interpret its target as a blurred version of the posterior:
$\begin{align*}
\pi(\theta ~|~ y \in B)&= \frac{p(B,\theta)}{p(B)}\\
&= \int _{B} \frac{p(y,\theta)}{p(B)} ~ dy \\
&= \int _{B} \frac{p(y)}{p(B)}\cdot \textcolor{cyan}{\pi(\theta ~|~ y)} ~ dy \\
&= \int _{B} p(y ~|~ y \in B) \cdot\textcolor{cyan}{\pi(\theta ~|~ y)} ~ dy\\
&= \mathbb{E}_{y}[\textcolor{cyan}{\pi(\theta ~|~ y)} ~|~ y \in B],
\end{align*} $
i.e. the posterior $\textcolor{cyan}{\pi(\theta ~|~ y)}$ blurred over $B$, weighted by $p(y ~|~ y\in B)$.
> [!idea] The ABC posterior is a blurred version of the exact posterior.
### Regression Adjustment
There is a simple technique for improving the approximation $\pi(\theta ~|~ B) \approx \pi(\theta ~|~ y_{\mathrm{obs}})$. Let $\theta \sim \pi(\theta ~|~y)$ (for some $y \in B$). We hope to slightly transform $\theta$ so that it matches the distribution of $\theta' \sim \pi(\theta ~|~ y_{\mathrm{obs}})$.
> [!theorem|*] Regression Adjustment
> Suppose $\mathbb{E}[\theta ~|~ y]=\mu(S(y))$ is a function of $S$ (e.g. when $S$ is sufficient), and is smooth enough that for small $\delta$, it is approximately linear: $\mu(s) \approx \mu(s_{\mathrm{obs}})+(s-s_{\mathrm{obs}})^{T}\beta$ for some coefficients $\beta$, where $s:= S(y)$ and $s_{\mathrm{obs}}:= S(y_{\mathrm{obs}})$.
>
> Furthermore, suppose the posteriors have $\mu(s)$ as a location parameter, i.e. $\theta-\mu(s) \overset{\mathrm{d}}{=}\theta'-\mu(s_{\mathrm{obs}}).$
>
> Then, solving for $\theta'$, $\theta-(s-s_{\mathrm{obs}})^{T}\beta \approx \theta-\mu(s)+\mu(s_{\mathrm{obs}}) \overset{\mathrm{d}}{=} \theta' \sim \pi(\cdot ~|~ y_{\mathrm{obs}}).$
Therefore, we can obtain a (better) approximate sample from $\pi(\theta ~|~ y_{\mathrm{obs}})$ by using $\theta-(s-s_{\mathrm{obs}})^{T}{\beta}$ instead of $\theta$. The $(s-s_{\mathrm{obs}})^{T}\beta$ term is the **regression adjustment**, and since $\beta$ is the sensitivity of $\mathbb{E}[\theta ~|~ S]$ w.r.t. $S$, it can be estimated by a linear regression of $(\theta_{1},\theta_{2},\dots)$ on $(S(y_{1}),S(y_{2}),\dots)$, i.e. on the accepted pairs $\theta_{i} \sim \pi_{\mathrm{ABC}}$, $y_{i}\sim p(y~|~\theta_{i})$ produced by the ABC algorithm. With the fitted coefficient $\hat{\beta}$, we then return ${\theta}'_{i}:= \theta_{i}-(S(y_{i})-S(y_{\mathrm{obs}}))^{T}\hat{\beta}$ as the regression-adjusted samples.