As seen in the [[Bayesian Inference|main note about Bayesian inference]], we find posteriors by updating the prior:
![[Bayesian Inference#^8a333e]]
There are pairs of prior families and likelihoods for which this update is especially convenient: the posterior stays in the same family as the prior.
> [!definition|*] Conjugate pairs, Hyperparameters
> **Conjugate pairs of likelihoods and priors** generate posteriors that are in the same family as the priors.
>
> More precisely, given a likelihood $f(\mathbf{x};\theta)$, a family of priors $\Pi:=\{ \pi(\theta;\gamma)\,|\, \gamma \in \Gamma \}$ is **conjugate** (to the likelihood) if for any prior $\pi \in \Pi$, the posterior is still in $\Pi$.
>
> In this case $\gamma$ is called the **hyperparameter**.
Common examples of conjugates include:
$\begin{array}{c|c}
\text{Prior} & \text{Likelihood} \\
\hline
\text{Normal} & \text{Normal} \\
\text{Beta} & \text{Binomial, Geometric} \\
\text{Gamma} & \text{Poisson, Exponential}
\end{array}$
This is convenient because the parameters of the posterior can then be identified from $f(\mathbf{x}|\theta)\times\pi(\theta)$.
- For example, when studying the number of heads in $n$ coin flips $\sim\mathrm{Binom}(n,p)$ with $p=\mathbb{P}[\text{head}]$ unknown, choosing a $\text{Beta}(\alpha,\beta)$ prior makes the algebra easier (see the sketch below).
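A minimal sketch of this "reading off" (made-up numbers, assuming `sympy` is available): multiply the prior and likelihood kernels and identify the posterior parameters from the exponents.
```python
import sympy as sp

# Made-up numbers for illustration.
p = sp.symbols("p", positive=True)
alpha, beta, n, x = 2, 2, 10, 7

prior_kernel = p**(alpha - 1) * (1 - p)**(beta - 1)   # Beta(alpha, beta), constants dropped
likelihood_kernel = p**x * (1 - p)**(n - x)           # Binom(n, p) at X = x, constants dropped

print(sp.powsimp(prior_kernel * likelihood_kernel))
# p**8*(1 - p)**4 : the kernel of Beta(alpha + x, beta + n - x) = Beta(9, 5)
```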
### Examples: Gamma and Beta Distributions
> [!exposition] Conjugating Gamma Prior with Poisson Likelihood
> The [[Gamma Distribution|gamma]] prior $\lambda \sim \Gamma(\alpha, \beta)$ conjugates with a Poisson likelihood $X_{1},\dots,X_{n} \overset{\mathrm{iid.}}{\sim}\mathrm{Po}(\lambda)$: $\begin{align*}
\underbrace{\vphantom{\Bigg \uparrow}\frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta \lambda}}_{{\text{gamma prior}}}
\,\cdot \,
&\underbrace{\vphantom{\Bigg \uparrow}\prod^{n}_{k=1}\left(\frac{\lambda^{x_{k}}}{x_{k}!}e^{-\lambda}\right)}_{{\text{Poisson likelihood}}}\\ \\
\propto&\,\,
{\lambda^{\alpha+\sum_{k}x_{k}-1}e^{-(\beta+n)\lambda}}\\ \\
\Longrightarrow& \,\,\underbrace{\Gamma\left( \alpha+\sum_{k}x_{k},\beta+n \right)}_{\text{gamma posterior}}
\end{align*}$
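A quick numerical sanity check of the computation above (a sketch with made-up data, assuming `numpy` and `scipy` are installed): the unnormalized product $\text{prior}\times\text{likelihood}$ should differ from the $\Gamma\left( \alpha+\sum_{k}x_{k},\beta+n \right)$ density only by a constant factor, namely the marginal likelihood of the data.
```python
import numpy as np
from scipy import stats

# Made-up prior and data for illustration.
alpha, beta = 3.0, 2.0
x = np.array([1, 4, 2, 0, 3])                     # pretend Poisson observations
n, s = len(x), x.sum()

lam = np.linspace(0.1, 8.0, 200)                  # grid of lambda values
prior = stats.gamma.pdf(lam, a=alpha, scale=1 / beta)
likelihood = np.prod(stats.poisson.pmf(x[:, None], lam), axis=0)
posterior = stats.gamma.pdf(lam, a=alpha + s, scale=1 / (beta + n))

# prior * likelihood / posterior should be constant across lambda.
ratio = prior * likelihood / posterior
print(np.allclose(ratio, ratio[0]))               # True
```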
> [!exposition] Conjugating Beta Prior with Binomial Likelihood
> The [[Beta Distribution|beta]] prior $p \sim B(\alpha,\beta)$ conjugates with the binomial likelihood $X\sim \mathrm{Binom}(n,p)$: $\begin{align*}
\underbrace{\vphantom{\Bigg \uparrow}\frac{1}{B(\alpha,\beta)}p^{\alpha-1}(1-p)^{\beta-1}}_{\text{beta prior}}
&\cdot
\underbrace{\vphantom{\Bigg \uparrow}{n\choose x}p^{x}(1-p)^{n-x} }_{{\text{binomial likelihood}}}\\ \\
\propto
&\,\,p^{(\alpha+x)-1}(1-p)^{(\beta+n-x)-1}\\ \\
\Longrightarrow&\,\,
\underbrace{B(\alpha+x, \beta+n-x)}_{\text{beta posterior}}
\end{align*}$
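The same update as a short sketch (made-up numbers, assuming `scipy` is installed); the posterior mean $\frac{\alpha+x}{\alpha+\beta+n}$ lands between the prior mean and the observed proportion $x/n$, a pattern the next section makes precise.
```python
from scipy import stats

# Made-up prior and data for illustration.
alpha, beta = 2.0, 8.0              # prior Beta(2, 8), prior mean 0.2
n, x = 20, 12                       # observe x = 12 successes in n = 20 trials

prior = stats.beta(alpha, beta)
posterior = stats.beta(alpha + x, beta + n - x)    # Beta(14, 16)

print(prior.mean(), x / n, posterior.mean())
# 0.2  0.6  0.4666... : the posterior mean (alpha + x) / (alpha + beta + n)
# sits between the prior mean and the observed proportion.
```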
### Conjugation in Exponential Families
> [!bigidea]
> Exponential-family conjugate priors can be interpreted as records of previously conducted experiments, to be combined with new evidence.
[[Exponential Families|Exponential families]] give a convenient framework for constructing conjugate pairs of priors and likelihoods:
> [!theorem|*] Conjugate pairs in exponential families
> If the sample's likelihood $f(\cdot\,;\theta)$ is from an exponential family with densities $\Big\{ f(x;\theta)=\exp[\mathbf{T}(x)\cdot \pmb{\eta}(\theta)-B(\theta)]h(x) \,|\, \theta \in \Theta \Big\},$then distributions of the form $\Big\{ \pi(\theta;\gamma,n_{0})\propto\exp[\pmb{\gamma} \cdot \pmb{\eta}(\theta)-n_{0}B(\theta)] \Big\}$are conjugates of those likelihoods. Here $n_{0},\pmb{\gamma}$ are both **hyperparameters**.
- Of course, restrictions on $n_{0}, \pmb{\gamma}$ might be necessary for $\pi$ to be normalizable (improper priors are still usable, though).
- All of the distributions listed in the example table above are exponential families.
After observing $n$ iid. observations $\mathbf{x}$ with mean canonical statistic $\bar{\mathbf{T}}:= \frac{1}{n}\sum_{k}\mathbf{T}(x_{k})$, the posterior is $\pi(\theta \,|\,\mathbf{x}, \pmb{\gamma}, n_{0})\propto\exp[(n \bar{\mathbf{T}}+\pmb{\gamma})\cdot \pmb{\eta}(\theta)-(n+n_{0})B(\theta)],$so defining $\bar{\mathbf{T}}_{0}:= \pmb{\gamma} / n_{0}$, *the posterior is a weighted average of the prior and the data*:
$\pi(\theta \,|\,\mathbf{x}, \pmb{\gamma}, n_{0})\propto\exp[(n \bar{\mathbf{T}}+n_{0} \bar{\mathbf{T}}_{0})\cdot \pmb{\eta}(\theta)-(n+n_{0})B(\theta)],$
with canonical statistic $\bar{\mathbf{T}}_{\mathrm{post}}= \frac{1}{n + n_{0}}(n \bar{\mathbf{T}} + n_{0} \bar{\mathbf{T}}_{0})$. In particular, we can *treat $\bar{\mathbf{T}}_{0}$ and $n_{0}$ as the result of $n_{0}$ previous draws from the same distribution*.
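As a quick sanity check, the gamma-Poisson pair above fits this template: the Poisson density is $f(x;\lambda)=\exp[x\log \lambda-\lambda]\cdot \frac{1}{x!}$, so $\mathbf{T}(x)=x$, $\eta(\lambda)=\log \lambda$ and $B(\lambda)=\lambda$. The conjugate family is then $\pi(\lambda;\gamma,n_{0})\propto\exp[\gamma \log \lambda-n_{0}\lambda]=\lambda^{\gamma}e^{-n_{0}\lambda}$, that is, $\Gamma(\gamma+1,n_{0})$ with $\gamma=\alpha-1$ and $n_{0}=\beta$; the update $(\gamma,n_{0})\mapsto\left( \gamma+\textstyle\sum_{k}x_{k},\, n_{0}+n \right)$ reproduces the $\Gamma\left( \alpha+\sum_{k}x_{k},\beta+n \right)$ posterior found earlier, and $\bar{\mathbf{T}}_{0}=\gamma / n_{0}=(\alpha-1)/\beta$ plays the role of the "average count" of the pseudo-data.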
> [!examples]
> A gamma prior $\lambda \sim\Gamma(\alpha,\beta)$ conjugating with $X_{1},\dots,X_{n} \overset{\mathrm{iid.}}{\sim} \mathrm{Po}(\lambda)$ can be interpreted as a Poisson process counting the number of distinct events: $
\begin{align*}
&\text{Prior } \lambda \sim\Gamma(\alpha,\beta): && \alpha \text{ occurrences counted over } \beta \text{ earlier observations}\\
&\text{Likelihood of } X_{1,\dots,n}: &&\sum_{k}x_{k} \text{ occurrences counted over }n \text{ new observations}
\end{align*}$and the posterior $\Gamma\left( \alpha+\sum_{k}x_{k}, \beta+n\right)$ is just the "sum" of the two.
>
> Similarly, a beta prior $\lambda \sim B(\alpha,\beta)$ conjugating with $X \sim \mathrm{Binomial}(n,\lambda)$ can be interpreted as previous Binomial/Bernoulli trials: $\begin{align*}
&\text{Prior } \lambda \sim B(\alpha,\beta): && \alpha-1 \text{ successes and }\beta-1 \text{ failures seen}\\
&\text{Likelihood of } X: &&X=x \text{ successes in }n \text{ more trials}
\end{align*}$so the posterior $B(\alpha+x, \beta+n-x)$ records $+x$ successes and $+(n-x)$ failures.
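A tiny code sketch of this reading (made-up numbers): because the prior behaves like previously recorded trials, updating a beta prior batch by batch, with the first posterior serving as the second prior, gives the same hyperparameters as pooling all trials into one binomial update.
```python
# Made-up numbers: the prior acts like earlier data, so sequential updates
# match a single pooled update.
alpha0, beta0 = 1.0, 1.0                       # flat Beta(1, 1) prior
n1, x1 = 10, 3                                 # 3 successes in 10 trials
n2, x2 = 15, 9                                 # 9 successes in 15 more trials

a1, b1 = alpha0 + x1, beta0 + n1 - x1          # posterior after batch 1
a2, b2 = a1 + x2, b1 + n2 - x2                 # ...used as the prior for batch 2

a_pool = alpha0 + x1 + x2                      # single update on the pooled data
b_pool = beta0 + (n1 - x1) + (n2 - x2)

print((a2, b2) == (a_pool, b_pool))            # True
```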