> [!notation]
> For simplicity, the open interval centered at $c$ with radius $r$ is denoted $(c \pm r) \equiv (c-r, c+r)$, and similarly for closed and half-open intervals.

A $100(1-\alpha)\%$ **confidence interval** $(a(\mathbf{X}),b(\mathbf{X}))$ of some parameter $\theta$ is an interval such that $\mathbb{P}\big[\theta \in \big(a(\mathbf{X}), b(\mathbf{X})\big)\big]=1-\alpha,$ that is, *the interval has a $(1-\alpha)$ chance of containing $\theta$.*
- Note that *the interval is a random variable; $\theta$ is not.*
- Confidence intervals are usually expected to:
	- Not depend on $\theta$, otherwise we are begging the question for the value of $\theta$.
	- Become tighter with more data, i.e. their expected length shrinks as the sample size grows.

> [!examples] Confidence interval of Gaussian sample mean
> If $X_{1},\dots,X_{n} \overset{\mathrm{iid.}}{\sim} N(\mu,\sigma_{0}^{2})$ where $\sigma_{0}$ is known, then the mean $\mu$ has a $(1-\alpha)$ confidence interval $\left( \bar{X} \pm z_{\alpha / 2}\frac{\sigma_{0}}{\sqrt{ n }} \right),$ where $z_{p}$ is defined by $1-\Phi(z_{p})=p$. It does not depend on $\mu$, and its length shrinks as the sample size $n$ grows.

## Neyman Confidence Intervals

> [!definition|*] Neyman Confidence Intervals
> Given a point estimate $\hat{\theta}(\mathbf{x})$, the **Neyman construction** of confidence intervals finds the bounds $\hat{\theta}_{\mathrm{lo}}(\mathbf{x}),\hat{\theta}_{\mathrm{up}}(\mathbf{x})$.
>
> Suppose that, given the "true" parameter $\theta$, the estimate $\hat{\theta}(\mathbf{X})$ has distribution $g_{\theta}(t)$, where $t$ is a dummy variable. Then define $\begin{align*} \hat{\theta}_{\mathrm{lo}}(\mathbf{x})&:= \inf\left\{ \theta :\int _{\hat{\theta}(\mathbf{x})}^{\infty} g_{\theta}(t) \, dt \ge \frac{\alpha}{2} \right\},\\ \hat{\theta}_{\mathrm{up}}(\mathbf{x})&:= \sup\left\{ \theta :\int ^{\hat{\theta}(\mathbf{x})}_{-\infty} g_{\theta}(t) \, dt \ge \frac{\alpha}{2} \right\}. \end{align*}$
> That is, move $\theta$ left/right from $\hat{\theta}(\mathbf{x})$ until the corresponding tail integral drops below $\alpha / 2$.

- Note that $\theta \in (\hat{\theta}_{\mathrm{lo}}(\mathbf{X}), \hat{\theta}_{\mathrm{up}}(\mathbf{X}))$ if and only if both defining inequalities above hold under $\theta$, i.e. $\hat{\theta}(\mathbf{X})$ falls in the central $1-\alpha$ region of $g_{\theta}$. This happens with probability $1-\alpha$, regardless of $\theta$. A more formal justification is in [[Computer Age Statistical Inference|CASI p204 note 1]].

In a hypothesis-testing (with p-values) setup, let $H_{0}(t)$ denote the null hypothesis $H_{0}:\theta=t$. Then the Neyman construction can be interpreted as

> [!lemma|*] Neyman Confidence Intervals as Unrejected Nulls
> $\begin{align*} &\hat{\theta}_{\mathrm{lo}}(\mathbf{x})= \inf\left\{ t \text{ where }H_{0}(t) \text{ has upper tail} \ge \frac{\alpha}{2} \right\},\\ &\hat{\theta}_{\mathrm{up}}(\mathbf{x})=\sup\left\{ t \text{ where }H_{0}(t) \text{ has lower tail} \ge \frac{\alpha}{2} \right\},\\[0.4em] &\theta_{0} \in (\hat{\theta}_{\mathrm{lo}}(\mathbf{x}), \hat{\theta}_{\mathrm{up}}(\mathbf{x})) \iff H_{0}(\theta_{0}) \text{ has both tails}\ge \alpha / 2. \end{align*}$
> Therefore, they define the collection of null hypotheses that the data will not reject based on a two-tailed p-value test: $(\hat{\theta}_{\mathrm{lo}}(\mathbf{x}),\hat{\theta}_{\mathrm{up}}(\mathbf{x}))=\{t~|~H_{0}(t) \text{ not rejected, given } \mathbf{x} \}.$
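The following is a minimal numerical sketch of the construction above, in the Gaussian sample-mean setting of the first example; the helper names (`neyman_ci`, `sampling_cdf`) and the root-finding bracket are illustrative assumptions, not part of the source. Each bound is the value of $\theta$ where the corresponding tail probability of $g_{\theta}$ at the observed $\hat{\theta}(\mathbf{x})$ equals $\alpha/2$.

```python
# Sketch of the Neyman construction for a Gaussian sample mean with known
# sigma_0; helper names and the search bracket are illustrative assumptions.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def neyman_ci(theta_hat_obs, sampling_cdf, alpha=0.05, bracket=(-100.0, 100.0)):
    """Return (theta_lo, theta_up), where sampling_cdf(t, theta) = P_theta[theta_hat <= t]."""
    # theta_lo: smallest theta whose upper tail at theta_hat_obs is still >= alpha/2
    upper_tail = lambda th: (1.0 - sampling_cdf(theta_hat_obs, th)) - alpha / 2
    # theta_up: largest theta whose lower tail at theta_hat_obs is still >= alpha/2
    lower_tail = lambda th: sampling_cdf(theta_hat_obs, th) - alpha / 2
    theta_lo = brentq(upper_tail, *bracket)  # tail probability crosses alpha/2 here
    theta_up = brentq(lower_tail, *bracket)
    return theta_lo, theta_up

# g_theta is N(theta, sigma_0^2 / n) when theta_hat is the sample mean.
n, sigma0 = 25, 2.0
cdf = lambda t, theta: norm.cdf(t, loc=theta, scale=sigma0 / np.sqrt(n))
lo, up = neyman_ci(theta_hat_obs=1.3, sampling_cdf=cdf)
# Agrees with the closed form 1.3 ± z_{alpha/2} * sigma_0 / sqrt(n) up to solver tolerance.
```

Because both tail probabilities are monotone in $\theta$ for this family, the inf/sup in the definition reduce to the two roots found above; for other $g_{\theta}$ the same search applies as long as a valid bracket is supplied.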
This equivalence lets us convert a statistical test into a confidence interval:

> [!idea] Converting between tests and CI
> Given a test at level $\alpha$, we get a CI with confidence level $1-\alpha$, namely the Neyman CI defined by $\{ t: H_{0}(t) \text{ is not rejected by the test} \}.$
>
> > [!proof]-
> > Write $H_{0}(\theta_{0})$ for the hypothesis $\theta=\theta_{0}$, and by "$H_{0}$ is rejected by $\mathbf{x}$" we mean that the test rejects $H_{0}$ based on the observed values $\mathbf{x}$.
> >
> > Let $\mathrm{CI}(\mathbf{x})$ be the CI defined above, using the observed values $\mathbf{x}$.
> >
> > Now $\begin{align*} \mathbb{P}[\theta \in \mathrm{CI}(\mathbf{X})]&= \mathbb{P}[H_{0}(\theta) \text{ is not rejected by }\mathbf{X}]\\ &= 1-\text{type-1 error rate of the test under }\theta\\ &= 1-\alpha. \end{align*}$ ^551219

- See [[Confidence Intervals, Tests, P-Values|the note about CI, testing, and p-values]] for more.

Note that this coincides with $\hat{\theta}_{\mathrm{lo}}^{*}=\inf\left\{ \theta \,|\, \int ^{\theta}_{-\infty} g_{\hat{\theta}}(t) \, dt \ge \frac{\alpha}{2} \right\}$ (and analogously for $\hat{\theta}_{\mathrm{up}}^{*}$) when $g_{\theta}(t)$ is symmetric, like Student's t.
- That is, the CI obtained by centering $g$ at $\hat{\theta}$ and slicing off tails of size $\alpha / 2$, usually of the form $\hat{\theta} \pm \widehat{\mathrm{sd}}(\hat{\theta})\cdot q(\alpha / 2),$ where $q(\alpha / 2)$ is the upper $\alpha / 2$ quantile of the distribution $g$.
- In this case, replacing $\hat{\theta}$ with some constant $\theta_{0}$ agrees with the (equal-tailed) critical region of testing the null hypothesis $H_{0}:\theta=\theta_{0}$.

*Neyman's construction is transformation invariant.* That is, if $m$ is a continuous, monotonically increasing function, then $m(\theta)$ has confidence interval $[m(\hat{\theta}_{\mathrm{lo}}), m(\hat{\theta}_{\mathrm{up}})]$.
- This is not true in general for the "usual" definition.

## Bootstrap Confidence Intervals

Given a bootstrap sample $\{ \theta_{1}^{\ast},\dots,\theta_{B}^{\ast} \}$, the **bootstrap cdf** estimates the true distribution $g_{\theta}$ of $\hat{\theta}$ as $\hat{G}(t):= \frac{1}{B}\sum_{b=1}^{B}\mathbf{1}\{\theta_{b}^{\ast} \le t\}.$ Using it, we can find approximate $\alpha$-percentiles and a $(1-\alpha)$ confidence interval $\begin{align*} \widehat{\theta^{(\alpha)}}&:= \hat{G}^{-1}(\alpha),\\[0.4em] \mathcal{C}_{\alpha}&= [\widehat{\theta^{(\alpha / 2)}}, \widehat{\theta^{(1-\alpha / 2)}}], \end{align*}$ as sketched in the code after this list.
- Note that the (approximate) quantiles are invariant under monotone maps, so if there is a monotone map $m:\theta \mapsto m(\theta)=: \phi \sim N(0,1)$, then the transformed bootstrap samples $\{ m(\theta ^{\ast}_{1}),\dots,m(\theta ^{\ast}_{B}) \} \overset{\mathrm{iid.}}{\sim} N(0,1)$, and their approximate quantiles provide valid confidence intervals. Transforming back, by invariance the CI obtained from $\hat{G}$ is also valid.
- If we use the asymptotically normal $\hat{\theta}$, its own convergence can be slower than that of some transformation $m(\hat{\theta})$, so by extension *the bootstrap CI can converge faster than the normal approximation* $\left( \hat{\theta} \pm \frac{z_{\alpha / 2}}{\sqrt{ I(\hat{\theta}) }} \right)$.
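Below is a minimal sketch of the percentile interval $\mathcal{C}_{\alpha}$; the toy data, the choice of $B$, and the helper name `bootstrap_percentile_ci` are illustrative assumptions. It builds $\hat{G}$ from resampled statistics and reads off its $\alpha/2$ and $1-\alpha/2$ quantiles.

```python
# Percentile bootstrap CI: build G_hat from resampled statistics and take its
# alpha/2 and 1 - alpha/2 quantiles; data and B are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_percentile_ci(x, statistic, B=2000, alpha=0.05):
    """Return [G_hat^{-1}(alpha/2), G_hat^{-1}(1 - alpha/2)] for the given statistic."""
    n = len(x)
    theta_star = np.array(
        [statistic(rng.choice(x, size=n, replace=True)) for _ in range(B)]
    )
    return np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])

x = rng.normal(loc=1.0, scale=2.0, size=50)   # toy sample
lo, up = bootstrap_percentile_ci(x, np.mean)  # C_alpha for the sample mean
```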
## Asymptotic Normal CI of MLEs

Because of [[Maximum Likelihood Estimator#^f3ac9d|asymptotic normality of the MLE]], $\sqrt{ I_{n}(\theta) }(\hat{\theta}-\theta) \overset{D}{\approx} N(0,1)$ when $n$ is large, so $\mathbb{P}\left[ \sqrt{ I_{n}(\theta) }(\hat{\theta}-\theta) \in (0\pm z_{\alpha / 2}) \right] \approx 1-\alpha.$ Solving for $\theta$ gives its $(1-\alpha)$ CI: $\mathbb{P}\left[\theta \in \left( \hat{\theta} \pm\frac{z_{\alpha / 2}}{\sqrt{ I_{n}(\theta) }} \right) \right] \approx 1-\alpha,$ and to make the interval independent of the true $\theta$, estimate $I_{n}(\theta)$ with the expected information $I_{n}(\hat{\theta})$ or the observed information $J_{n}(\hat{\theta})$ to get the $(1-\alpha)$ CI $\left( \hat{\theta} \pm\frac{z_{\alpha / 2}}{\sqrt{ I_{n}(\hat{\theta}) }} \right) \text{ or } \left( \hat{\theta} \pm\frac{z_{\alpha / 2}}{\sqrt{ J_{n}(\hat{\theta}) }} \right).$

The parameter $\theta$ of some distributions is bounded, so the normal approximation $N(\hat{\theta},I_{n}(\hat{\theta})^{-1})$, which is supported on all of $(-\infty, \infty)$, might produce a CI that includes impossible values.
- E.g. we might obtain a CI of $(0.1 \pm 0.15)$ for $\mathrm{Ber}(\theta)$, but $\theta \in [0,1]$ by definition.

> [!exposition] Solution
> Pick a bijective function $f:(0,1) \to \mathbb{R}$ and estimate $\phi=f(\theta)$ instead, so there are no impossible values for $\hat{\phi}$.
> First, estimate the distribution of $\hat{\phi}$: the delta method gives $\hat{\phi} \overset{D}{\approx}N\left(\phi, \frac{f'(\theta)^2}{I_{n}(\theta)} \right).$
> Second, compute the CI for $\phi$ with standard methods, giving say $(\phi_{1}, \phi_{2})$.
> Lastly, transform back to $\theta$: since MLEs are invariant, $\hat{\theta}=f^{-1}(\hat{\phi})$, and the CI for $\theta$ is $(f^{-1}(\phi_{1}),f^{-1}(\phi_{2}))$.
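A minimal sketch of this transform-and-back recipe for $\mathrm{Ber}(\theta)$, using the logit map $f(\theta)=\log\frac{\theta}{1-\theta}$; the counts and the helper name `bernoulli_ci_logit` are illustrative assumptions. Since $I_{n}(\theta)=\frac{n}{\theta(1-\theta)}$ and $f'(\theta)=\frac{1}{\theta(1-\theta)}$, the delta method gives $\mathrm{Var}(\hat{\phi}) \approx \frac{1}{n\theta(1-\theta)}$.

```python
# Wald CI for Ber(theta) computed on the logit scale and mapped back, so the
# endpoints stay inside (0, 1); the counts below are illustrative assumptions.
import numpy as np
from scipy.stats import norm
from scipy.special import logit, expit  # f and f^{-1} for phi = log(theta / (1 - theta))

def bernoulli_ci_logit(k, n, alpha=0.05):
    theta_hat = k / n                    # MLE of theta
    z = norm.ppf(1 - alpha / 2)          # z_{alpha/2} in this note's notation
    # Delta method: Var(phi_hat) ≈ f'(theta)^2 / I_n(theta) = 1 / (n * theta * (1 - theta))
    se_phi = 1.0 / np.sqrt(n * theta_hat * (1 - theta_hat))
    phi1, phi2 = logit(theta_hat) - z * se_phi, logit(theta_hat) + z * se_phi
    return expit(phi1), expit(phi2)      # transform back to the theta scale

# The plain Wald CI 0.1 ± 1.96 * sqrt(0.1 * 0.9 / 30) ≈ (-0.01, 0.21) leaves [0, 1];
# the logit-scale CI below does not.
lo, up = bernoulli_ci_logit(k=3, n=30)
```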