> [!tldr]
> **Uninformative priors** are prior distributions in [[Bayesian Inference]] that represent a lack of information.

In cases where we have no prior information, **uninformative/objective priors** can represent ignorance.

> [!examples]
> For example, we may choose $p \sim U[0,1]$ for the probability in $\text{Bernoulli}(p)$.
>
> Following the interpretations in the previous section, "no previous experiments" corresponds to $B(1,1)=U[0,1]$.

However, for parameters without bounds, e.g. the mean $\mu$ in $N(\mu,\sigma^{2})$, an ignorant prior would be $U(\mathbb{R})$, which cannot be normalized to have mass $1$. Such priors are called **improper priors**.

- Improper priors can still produce proper posteriors, so they are not an issue per se.

Another issue is that *in general, reparametrization does not preserve uniformity*: given a prior $\pi(\theta)$ and an invertible transformation $\phi=\phi(\theta)$, the reparametrized distribution is
$$p(\phi)=\pi(\theta(\phi)) \left| \frac{d\theta}{d\phi}\right|,$$
which is not necessarily uniform.

* For example, suppose $\phi \sim U[0,1]$ (left); then under the log-odds map $\phi \mapsto \psi=\log(\phi / (1-\phi))$,
  ![[Logodds.png#invert]]
  the log-odds (right) is bell-shaped, concentrated within $[-3,3]$. It is no longer "ignorant".

### Jeffreys Prior

> [!definition|*] Jeffreys Prior
>
> Given a likelihood $L(X;\theta)$, its **Jeffreys prior** is a prior that is invariant under reparametrization, defined by
> $$\pi(\theta)\propto \sqrt{ I_{X}(\theta) },$$
> where $I_{X}(\theta)$ is the [[Information and Bounding Errors#Fisher and Observed Information|expected/Fisher information]] of $\theta$ from that likelihood.
>
> > [!info]- In higher dimensions
> > The Jeffreys prior in higher dimensions is $\pi(\theta)\propto | I_{X}(\theta) |^{1/2}$, i.e. the square root of the determinant of the Fisher information matrix.

Jeffreys priors are invariant in the sense that *they are always proportional to the root-information*: if $\pi(\theta)$ is a Jeffreys prior, then reparametrizing with $\phi:=\phi(\theta)$ gives the new prior
$$p(\phi)=\pi(\theta(\phi))\left|\frac{ d \theta }{ d \phi } \right| \propto \sqrt{ I_{X}(\theta)\left( \frac{ d \theta }{ d \phi } \right)^{2} } =\sqrt{ I^{\ast}_{X}(\phi) }\,,$$
which is the Jeffreys prior of $\phi$. Note that $I^{\ast}_{X}(\phi)$ is the Fisher information of $\phi$, not $\theta$.

Common Jeffreys priors include:

- $\pi(\theta)\propto 1$, where $\theta$ is a **location parameter**, e.g. the mean of $N(\theta,\sigma^{2})$, or the lower bound of $U[\theta,\theta+1]$.
- $\pi(\sigma)\propto \frac{1}{\sigma}$, where $\sigma$ is a **scale parameter**, e.g. the standard deviation $\sigma$ of $N(\theta,\sigma^{2})$.

### Maximum Entropy Prior

The **maximum entropy prior** has the largest **entropy**
$$\mathrm{Ent}[\pi]:=-\int_{\Theta} \pi \log \pi \, d\theta$$
among all candidate priors (e.g. those satisfying a given constraint).

- The larger the entropy, the less information a distribution contains.
- If the entropy is unbounded over the class of candidate priors, a maximum entropy prior need not exist.

> [!examples] Maximum entropy priors
> If a prior $\pi(\theta)$ is constrained to have $\mathbb{E}[\theta]=\mu_{\theta}$ and $\mathrm{Var}(\theta)=\sigma^{2}_{\theta}$, then the maximum entropy prior is $\theta \sim N(\mu_{\theta},\sigma^{2}_{\theta})$.
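As a quick numerical sanity check of this example (a minimal sketch using `scipy.stats`; the candidate distributions and parameter values are illustrative, not from the note): among several candidate priors matched to the same mean and variance, the Gaussian attains the largest differential entropy.

```python
import numpy as np
from scipy import stats

mu, sigma = 0.0, 1.0  # target mean and standard deviation (hypothetical values)

# Candidate priors matched to mean `mu` and variance `sigma**2`.
candidates = {
    "normal":  stats.norm(loc=mu, scale=sigma),
    # U[mu - a, mu + a] has variance a^2/3, so a = sigma * sqrt(3).
    "uniform": stats.uniform(loc=mu - sigma * np.sqrt(3), scale=2 * sigma * np.sqrt(3)),
    # Laplace with scale b has variance 2*b^2, so b = sigma / sqrt(2).
    "laplace": stats.laplace(loc=mu, scale=sigma / np.sqrt(2)),
}

for name, dist in candidates.items():
    print(f"{name:8s} mean={float(dist.mean()):+.2f} "
          f"var={float(dist.var()):.2f} entropy={float(dist.entropy()):.4f}")

# The normal distribution has the largest differential entropy,
# 0.5 * log(2*pi*e*sigma^2) ≈ 1.4189 for sigma = 1.
```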
> [!theorem|*] Exponential Family Priors have Maximum Entropy
> If there are functions $T_{1},\dots,T_{k}:\Theta \to \mathbb{R}$ that define the constraints
> $$\forall i,~\mathbb{E}_{\pi}[T_{i}(\theta)]=t_{i}$$
> for constants $t_{1},\dots,t_{k}$, then the entropy is uniquely maximized by a member of the exponential family of priors
> $$\{ \pi(\theta):= \exp[\mathbf{T}(\theta)\cdot \lambda-B(\lambda)] \},$$
> where $\lambda$ is a (vector-valued) hyperparameter determined by the constraints.
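The theorem can be checked numerically on a discretized parameter space: fix moment constraints $\mathbb{E}_{\pi}[T_{i}(\theta)]=t_{i}$, maximize the entropy directly, and compare the result with the corresponding exponential-family prior. The sketch below is a minimal illustration, assuming a small grid and `scipy.optimize.minimize` with SLSQP; the grid, $\mathbf{T}$, and $\lambda$ are hypothetical choices.

```python
import numpy as np
from scipy.optimize import minimize

# Discretize the parameter space Theta on a small grid (illustrative choice).
theta = np.linspace(-4.0, 4.0, 81)
T = np.stack([theta, theta**2])          # T_1(theta) = theta, T_2(theta) = theta^2

# Build an exponential-family prior pi(theta) proportional to exp(T(theta) . lambda).
lam = np.array([0.5, -1.0])              # lambda_2 < 0 so the prior is normalizable
pi_exp = np.exp(T.T @ lam)
pi_exp /= pi_exp.sum()

# Moment constraints t_i = E_pi[T_i(theta)] induced by this prior.
t = T @ pi_exp

def neg_entropy(p):
    # Negative discrete entropy, clipped away from p = 0 for numerical stability.
    p = np.clip(p, 1e-12, None)
    return float(np.sum(p * np.log(p)))

constraints = (
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},   # normalization
    {"type": "eq", "fun": lambda p: T @ p - t},       # moment constraints E[T_i] = t_i
)
p0 = np.full_like(theta, 1.0 / theta.size)            # start from the uniform prior
res = minimize(neg_entropy, p0, method="SLSQP",
               bounds=[(0.0, 1.0)] * theta.size, constraints=constraints)

# Up to optimizer tolerance, the entropy maximizer should match the
# exponential-family prior with the same moments.
print("max |p_maxent - p_expfam| =", np.abs(res.x - pi_exp).max())
```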