> [!tldr] Laplace Approximation
> Given a density $f(\theta)$, its Laplace approximation $\hat{f}(\theta)$ is a Gaussian density centered at its mode $\hat{\theta}:= \underset{\theta}{\arg\max}~f(\theta)$: more precisely, $\log(\hat{f}(\theta)) := \mathrm{const.}+ \frac{1}{2}(\theta-\hat{\theta})^{T}\left( \nabla_{\!\theta}^{2}\log f|_{\theta=\hat{\theta}} \right)(\theta-\hat{\theta}),$ i.e. the density of $N\left( \hat{\theta}, -\left( \nabla_{\!{\theta}}^{2}\log f|_{\theta=\hat{\theta}} \right)^{-1} \right)$.
>
- The precision matrix $-\nabla_{\!\theta}^{2}\log f|_{\theta=\hat{\theta}}$ equals the [[Information and Bounding Errors#^7df282|observed information]] at the mode $\hat{\theta}$. A numerical sketch of the construction follows below.
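As a minimal sketch, here is the construction computed numerically, assuming `scipy`/`numpy`; the target density (a Gamma) is an illustrative choice, not from the note:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma, norm

# Target: log-density of a Gamma(a=3, scale=1), whose mode is at theta = 2.
log_f = lambda theta: gamma.logpdf(theta, a=3.0)

# 1. Find the mode theta_hat by minimizing -log f.
res = minimize(lambda t: -log_f(t[0]), x0=np.array([1.0]))
theta_hat = res.x[0]

# 2. Observed information: -(d^2/dtheta^2) log f at the mode,
#    here via a central finite difference.
h = 1e-4
hess = (log_f(theta_hat + h) - 2 * log_f(theta_hat) + log_f(theta_hat - h)) / h**2
precision = -hess  # the precision of the Gaussian approximation

# 3. Laplace approximation: N(theta_hat, precision^{-1}).
f_hat = norm(loc=theta_hat, scale=np.sqrt(1.0 / precision))
print(theta_hat, f_hat.std())  # approx. 2.0 and sqrt(2)
```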
#### Usage
It is often used when a Gaussian density can greatly simplify computations, e.g. when computing the predictive density $p(y_{\text{new}} ~|~ \mathbf{y})=\int p(y_{\text{new}} ~|~ \theta) \cdot p(\theta ~|~ \mathbf{y}) ~ d\theta$
in Bayesian inference, where integrating over a non-Gaussian posterior $p(\theta ~|~ \mathbf{y})$ can be intractable. We can instead replace $p(\theta ~|~ \mathbf{y})$ with its Laplace approximation $\hat{p}(\theta ~|~ \mathbf{y})$.
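A hypothetical sketch of this use: draw $\theta$ samples from the Gaussian $\hat{p}(\theta ~|~ \mathbf{y})$ and average the likelihood to estimate the integral. The values of `theta_hat` and `var`, and the $N(\theta, 1)$ observation model, are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta_hat, var = 2.0, 2.0  # assumed mode and variance of p_hat(theta | y)

# Sample from the Laplace approximation of the posterior.
theta_samples = rng.normal(theta_hat, np.sqrt(var), size=10_000)

# Monte Carlo estimate of p(y_new | y) = E[p(y_new | theta)] under p_hat,
# with an assumed N(theta, 1) likelihood for y_new.
y_new = 1.5
p_pred = norm.pdf(y_new, loc=theta_samples, scale=1.0).mean()
print(p_pred)
```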
#### Derivation
It is derived with a second-order Taylor approximation of $\log f(\theta)$ around the mode: $\log f(\theta) \approx \log f(\hat{\theta})+(\theta-\hat{\theta})^{T}\cancel{\nabla_{\!\theta}\log f|_{\theta=\hat{\theta}}}+\frac{1}{2}(\theta-\hat{\theta})^{T}\nabla_{\!\theta}^{2}\log f|_{\theta=\hat{\theta}}(\theta-\hat{\theta}),$ where the gradient term vanishes because $\hat{\theta}$, as the mode of $f$ (and hence of $\log f$), is a stationary point.
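A small numerical check of this expansion, continuing the illustrative Gamma(3) example from above (its mode is $2$ and its observed information at the mode is $0.5$):

```python
from scipy.stats import gamma

log_f = lambda t: gamma.logpdf(t, a=3.0)
theta_hat, precision = 2.0, 0.5  # mode and observed information of Gamma(3)

# Compare log f(theta) against its second-order Taylor expansion at the mode;
# agreement is close near theta_hat and degrades further away.
for theta in [1.8, 2.0, 2.2, 3.0]:
    taylor = log_f(theta_hat) - 0.5 * precision * (theta - theta_hat) ** 2
    print(theta, log_f(theta), taylor)
```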