As seen in the [[Bayesian Inference|main note about Bayesian inference]], we find posteriors by updating the prior:
![[Bayesian Inference#^8a333e]]
There are pairs of prior families and likelihoods for which this update is especially convenient: the posterior stays in the same family as the prior.
> [!definition|*] Conjugate pairs, Hyperparameters
> **Conjugate pairs of likelihoods and priors** generate posteriors that are in the same family as the priors.
>
> More precisely, given a likelihood $f(\mathbf{x};\theta)$, a family of priors $\Pi:=\{ \pi(\theta;\gamma)\,|\, \gamma \in \Gamma \}$ is **conjugate** (to the likelihood) if for any prior $\pi \in \Pi$, the posterior is still in $\Pi$.
>
> In this case $\gamma$ is called the **hyperparameter**.
Common examples of conjugates include:
$\begin{array}{c|c}
\text{Prior} & \text{Likelihood} \\
\hline
\text{Normal} & \text{Normal} \\
\text{Beta} & \text{Binomial, Geometric} \\
\text{Gamma} & \text{Poisson, Exponential}
\end{array}$
This is convenient because the parameters of the posterior can then be identified from $f(\mathbf{x}|\theta)\times\pi(\theta)$.
- For example, when studying the number of heads in $n$ coin flips $\sim\mathrm{Binom}(n,p)$ with $p=\mathbb{P}[\text{head}]$ unknown, choosing a $\text{Beta}(\alpha,\beta)$ prior makes the algebra easier (see the sketch below).
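A minimal sketch of this "reading off" (made-up numbers, assuming `sympy` is available): multiply the prior and likelihood kernels and identify the posterior parameters from the exponents.
```python
import sympy as sp

# Made-up numbers for illustration.
p = sp.symbols("p", positive=True)
alpha, beta, n, x = 2, 2, 10, 7

prior_kernel = p**(alpha - 1) * (1 - p)**(beta - 1)   # Beta(alpha, beta), constants dropped
likelihood_kernel = p**x * (1 - p)**(n - x)           # Binom(n, p) at X = x, constants dropped

print(sp.powsimp(prior_kernel * likelihood_kernel))
# p**8*(1 - p)**4 : the kernel of Beta(alpha + x, beta + n - x) = Beta(9, 5)
```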
### Examples: Gamma and Beta Distributions
> [!exposition] Conjugating Gamma Prior with Poisson Likelihood
> The [[Gamma Distribution|gamma]] prior $\lambda \sim \Gamma(\alpha, \beta)$ conjugates with a Poisson likelihood $X_{1},\dots,X_{n} \overset{\mathrm{iid.}}{\sim}\mathrm{Po}(\lambda)$: $\begin{align*}
\underbrace{\vphantom{\Bigg \uparrow}\frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta \lambda}}_{{\text{gamma prior}}}
\,\cdot \,
&\underbrace{\vphantom{\Bigg \uparrow}\prod^{n}_{k=1}\left(\frac{\lambda^{x_{k}}}{x_{k}!}e^{-\lambda}\right)}_{{\text{Poisson likelihood}}}\\ \\
\propto&\,\,
{\lambda^{\alpha+\sum_{k}x_{k}-1}e^{-(\beta+n)\lambda}}\\ \\
\Longrightarrow& \,\,\underbrace{\Gamma\left( \alpha+\sum_{k}x_{k},\beta+n \right)}_{\text{gamma posterior}}
\end{align*}$
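A quick numerical sanity check of the computation above (a sketch with made-up data, assuming `numpy` and `scipy` are installed): the unnormalized product $\text{prior}\times\text{likelihood}$ should differ from the $\Gamma\left( \alpha+\sum_{k}x_{k},\beta+n \right)$ density only by a constant factor, namely the marginal likelihood of the data.
```python
import numpy as np
from scipy import stats

# Made-up prior and data for illustration.
alpha, beta = 3.0, 2.0
x = np.array([1, 4, 2, 0, 3])                     # pretend Poisson observations
n, s = len(x), x.sum()

lam = np.linspace(0.1, 8.0, 200)                  # grid of lambda values
prior = stats.gamma.pdf(lam, a=alpha, scale=1 / beta)
likelihood = np.prod(stats.poisson.pmf(x[:, None], lam), axis=0)
posterior = stats.gamma.pdf(lam, a=alpha + s, scale=1 / (beta + n))

# prior * likelihood / posterior should be constant across lambda.
ratio = prior * likelihood / posterior
print(np.allclose(ratio, ratio[0]))               # True
```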
> [!exposition] Conjugating Beta Prior with Binomial Likelihood
> The [[Beta Distribution|beta]] prior $p \sim B(\alpha,\beta)$ conjugates with the binomial likelihood $X\sim \mathrm{Binom}(n,p)$: $\begin{align*}
\underbrace{\vphantom{\Bigg \uparrow}\frac{1}{B(\alpha,\beta)}p^{\alpha-1}(1-p)^{\beta-1}}_{\text{beta prior}}
&\cdot
\underbrace{\vphantom{\Bigg \uparrow}{n\choose x}p^{x}(1-p)^{n-x} }_{{\text{binomial likelihood}}}\\ \\
\propto
&\,\,p^{(\alpha+x)-1}(1-p)^{(\beta+n-x)-1}\\ \\
\Longrightarrow&\,\,
\underbrace{B(\alpha+x, \beta+n-x)}_{\text{beta posterior}}
\end{align*}$
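The same update as a short sketch (made-up numbers, assuming `scipy` is installed); the posterior mean $\frac{\alpha+x}{\alpha+\beta+n}$ lands between the prior mean and the observed proportion $x/n$, a pattern the next section makes precise.
```python
from scipy import stats

# Made-up prior and data for illustration.
alpha, beta = 2.0, 8.0              # prior Beta(2, 8), prior mean 0.2
n, x = 20, 12                       # observe x = 12 successes in n = 20 trials

prior = stats.beta(alpha, beta)
posterior = stats.beta(alpha + x, beta + n - x)    # Beta(14, 16)

print(prior.mean(), x / n, posterior.mean())
# 0.2  0.6  0.4666... : the posterior mean (alpha + x) / (alpha + beta + n)
# sits between the prior mean and the observed proportion.
```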
### Conjugation in Exponential Families
> [!bigidea]
> Exponential-family conjugate priors can be interpreted as records of previously conducted experiments, to be combined with new evidence.
[[Exponential Families|Exponential families]] give a convenient framework for constructing conjugate pairs of priors and likelihoods:
> [!theorem|*] Conjugate pairs in exponential families
> If the sample's likelihood $f(\cdot\,;\theta)$ is from an exponential family with densities $\Big\{ f(x;\theta)=\exp[\mathbf{T}(x)\cdot \pmb{\eta}(\theta)-B(\theta)]h(x) \,|\, \theta \in \Theta \Big\},$then distributions of the form $\Big\{ \pi(\theta;\gamma,n_{0})\propto\exp[\pmb{\gamma} \cdot \pmb{\eta}(\theta)-n_{0}B(\theta)] \Big\}$are conjugates of those likelihoods. Here $n_{0},\pmb{\gamma}$ are both **hyperparameters**.
- Of course, restrictions on $n_{0}, \pmb{\gamma}$ might be necessary for $\pi$ to be normalizable (improper priors are still usable, though).
- All of the distributions listed in the example table above are exponential families.
After observing $n$ iid. observations $\mathbf{x}$ with mean canonical statistic $\bar{\mathbf{T}}:= \frac{1}{n}\sum_{k}\mathbf{T}(x_{k})$, the posterior is $\pi(\theta \,|\,\mathbf{x}, \pmb{\gamma}, n_{0})\propto\exp[(n \bar{\mathbf{T}}+\pmb{\gamma})\cdot \pmb{\eta}(\theta)-(n+n_{0})B(\theta)],$so defining $\bar{\mathbf{T}}_{0}:= \pmb{\gamma} / n_{0}$, *the posterior is a weighted average of the prior and the data*:
$\pi(\theta \,|\,\mathbf{x}, \pmb{\gamma}, n_{0})\propto\exp[(n \bar{\mathbf{T}}+n_{0} \bar{\mathbf{T}}_{0})\cdot \pmb{\eta}(\theta)-(n+n_{0})B(\theta)],$
with canonical statistic $\bar{\mathbf{T}}_{\mathrm{post}}= \frac{1}{n + n_{0}}(n \bar{\mathbf{T}} + n_{0} \bar{\mathbf{T}}_{0})$. In particular, we can *treat $\bar{\mathbf{T}}_{0}$ and $n_{0}$ as the result of $n_{0}$ previous draws from the same distribution*.
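As a quick sanity check, the gamma-Poisson pair above fits this template: the Poisson density is $f(x;\lambda)=\exp[x\log \lambda-\lambda]\cdot \frac{1}{x!}$, so $\mathbf{T}(x)=x$, $\eta(\lambda)=\log \lambda$ and $B(\lambda)=\lambda$. The conjugate family is then $\pi(\lambda;\gamma,n_{0})\propto\exp[\gamma \log \lambda-n_{0}\lambda]=\lambda^{\gamma}e^{-n_{0}\lambda}$, that is, $\Gamma(\gamma+1,n_{0})$ with $\gamma=\alpha-1$ and $n_{0}=\beta$; the update $(\gamma,n_{0})\mapsto\left( \gamma+\textstyle\sum_{k}x_{k},\, n_{0}+n \right)$ reproduces the $\Gamma\left( \alpha+\sum_{k}x_{k},\beta+n \right)$ posterior found earlier, and $\bar{\mathbf{T}}_{0}=\gamma / n_{0}=(\alpha-1)/\beta$ plays the role of the "average count" of the pseudo-data.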
> [!examples]
> A gamma prior $\lambda \sim\Gamma(\alpha,\beta)$ conjugating with $X_{1},\dots,X_{n} \overset{\mathrm{iid.}}{\sim} \mathrm{Po}(\lambda)$ can be interpreted as a Poisson process counting the number of distinct events: $
\begin{align*}
&\text{Prior } \lambda \sim\Gamma(\alpha,\beta): && \alpha \text{ occurrences counted over } \beta \text{ earlier observations}\\
&\text{Likelihood of } X_{1,\dots,n}: &&\sum_{k}x_{k} \text{ occurrences counted over }n \text{ new observations}
\end{align*}$and the posterior $\Gamma\left( \alpha+\sum_{k}x_{k}, \beta+n\right)$ is just the "sum" of the two.
>
> Similarly, a beta prior $\lambda \sim B(\alpha,\beta)$ conjugating with $X \sim \mathrm{Binomial}(n,\lambda)$ can be interpreted as previous Binomial/Bernoulli trials: $\begin{align*}
&\text{Prior } \lambda \sim B(\alpha,\beta): && \alpha-1 \text{ successes and }\beta-1 \text{ failures seen}\\
&\text{Likelihood of } X: &&X=x \text{ successes in }n \text{ more trials}
\end{align*}$so the posterior $B(\alpha+x, \beta+n-x)$ records $+x$ successes and $+(n-x)$ failures.
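A tiny code sketch of this reading (made-up numbers): because the prior behaves like previously recorded trials, updating a beta prior batch by batch, with the first posterior serving as the second prior, gives the same hyperparameters as pooling all trials into one binomial update.
```python
# Made-up numbers: the prior acts like earlier data, so sequential updates
# match a single pooled update.
alpha0, beta0 = 1.0, 1.0                       # flat Beta(1, 1) prior
n1, x1 = 10, 3                                 # 3 successes in 10 trials
n2, x2 = 15, 9                                 # 9 successes in 15 more trials

a1, b1 = alpha0 + x1, beta0 + n1 - x1          # posterior after batch 1
a2, b2 = a1 + x2, b1 + n2 - x2                 # ...used as the prior for batch 2

a_pool = alpha0 + x1 + x2                      # single update on the pooled data
b_pool = beta0 + (n1 - x1) + (n2 - x2)

print((a2, b2) == (a_pool, b_pool))            # True
```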