Suppose we want to study a random variable $\mathbf{X}$ to determine its distribution, using the observed values $\mathbf{x}$. For now, the distribution is assumed to belong to a family $f(\mathbf{X};\theta)$, where $\theta$ is the parameter yet to be determined/tested. Later, we will test for other things, e.g. whether the family is appropriate for modeling the data.

> [!definition|*] Hypotheses
> In a hypothesis test, the **null hypothesis** $H_{0}$ is the statement assumed to be true, and we check if the evidence disagrees with it.
> - For parameter testing, the most common null hypothesis is the **simple null hypothesis**: that $\theta=\theta_{0}$.
> ---
> The **alternative hypothesis** $H_{1}$ (or $H_{a}$) is the statement we will embrace if the evidence disagrees with $H_{0}$.
>
> For a parameter $\theta$ and a null hypothesis $H_{0}:\theta=\theta_{0}$, common alternative hypotheses include:
> - A **simple alternative hypothesis**: that $\theta=\theta_{1} \ne \theta_{0}$.
> - A **one-sided alternative hypothesis**: that $\theta > \theta_{0}$ (or $\theta < \theta_{0}$).
> - A **two-sided alternative hypothesis**: that $\theta \ne \theta_{0}$.

## Rejecting the Null Hypothesis

In order to reject the null hypothesis, some criterion is needed: the p-value gives the most common and basic criterion. *When the p-value is smaller than some threshold*, it suggests that the data disagrees with the null hypothesis.

> [!definition|*] Test Statistics
> Given data $\mathbf{X}=\mathbf{x}$, the **test statistic** is a function $t(\mathbf{X})$, ideally unlikely to take extreme values if $H_{0}$ is true. Its **observed value** is denoted $t_{\mathrm{obs}}=t(\mathbf{x})$.
> - Hence an extreme value of $t_{\mathrm{obs}}$ suggests that $H_{0}$ is incorrect.
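As a concrete sketch (all numbers below are made up for illustration): for $X_{i} \sim N(\theta, \sigma^{2})$ with known $\sigma$ and $H_{0}:\theta=\theta_{0}$, a natural test statistic is the standardized sample mean, and the corresponding one- and two-sided p-values follow from the standard normal CDF.

```python
import math

def phi(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Hypothetical setup: X_i ~ N(theta, sigma^2) with known sigma,
# testing H0: theta = theta0. The standardized sample mean is a
# natural test statistic, for which large theta gives large t.
theta0, sigma = 5.0, 2.0
x = [5.9, 6.4, 4.8, 7.1, 5.5, 6.0, 6.8, 5.2]  # made-up sample
n = len(x)
t_obs = (sum(x) / n - theta0) / (sigma / math.sqrt(n))

# One-sided p-value for H1: theta > theta0, i.e. P(t >= t_obs | H0)
p_one_sided = 1.0 - phi(t_obs)
# Two-sided p-value for H1: theta != theta0, i.e. P(|t| >= |t_obs| | H0)
p_two_sided = 2.0 * (1.0 - phi(abs(t_obs)))

print(f"t_obs = {t_obs:.3f}")
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

Note that for a positive $t_{\mathrm{obs}}$ and a symmetric null distribution, the two-sided p-value is exactly twice the one-sided one.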
> [!definition|*] p-values
> The **p-value** (of the statistic $t$) is the probability of getting a $t(\mathbf{X})$ at least as extreme as $t_{\mathrm{obs}}$, assuming $H_{0}$ to be true: $p\equiv \mathbb{P}(\text{getting }t(\mathbf{X}) \text{ as extreme as }t_{\mathrm{obs}}\,|\, H_{0}).$

What counts as "extreme" depends on the alternative hypothesis and the sampling distribution.
- One-sided hypotheses $H_{1}: (\theta > \theta_{0})$ only count values of $t$ corresponding to large $\theta$ as extreme (similarly for $H_{1}:(\theta<\theta_{0})$).
- Two-sided hypotheses $H_{1}:(\theta \ne \theta_{0})$ count $t$ corresponding to $\theta$ far from $\theta_{0}$ (in either direction) as extreme.
- For choices of $t$ where large $\theta$ corresponds to large $t$, the p-values are: $\begin{align*} \text{One-sided}: p &=\mathbb{P}\big(t(\mathbf{X}) \ge t_{\mathrm{obs}} \,|\, H_{0}\big)\\ \text{Two-sided}: p &= \mathbb{P}\big(|t(\mathbf{X})| \ge |t_{\mathrm{obs}}| \,|\, H_{0}\big) \end{align*}$

> [!definition|*] Critical Regions
>
> More generally, given some criterion to reject $H_{0}$, the **critical region** $C \subseteq \mathbb{R}^{n}$ is the region containing the samples that would lead to the rejection of $H_{0}$: $C \equiv \{ \mathbf{x} \in \mathbb{R}^{n} \,|\, H_{0} \text{ rejected if } \mathbf{X}=\mathbf{x} \}.$ In the case of p-values, $C=\{ \mathbf{x} \in \mathbb{R}^{n} \,|\, p(\mathbf{x}) < \text{threshold}\}$.

## Errors in Hypothesis Testing

> [!definition|*] Errors
> A **type I error** is a false positive: rejecting $H_{0}$ when it is true.
> A **type II error** is a false negative: failing to reject $H_{0}$ when it is false.

> [!definition|*] Power, Size
>
> Given simple hypotheses $H_{0}:\theta=\theta_{0}$ and $H_{1}:\theta=\theta_{1}$,
> - The **size** of the test is the probability of type I error: $\alpha \equiv\mathbb{P}(\mathbf{X} \in C\,|\, H_{0})$.
> - The probability of type II error is denoted $\beta=\mathbb{P}(\mathbf{X} \notin C \,|\, H_{1})$;
> - the **power** of the test is $1-\beta$, i.e. the probability of rejecting $H_{0}$ when $H_{1}$ is true.
>
> Given composite hypotheses $H_{0}:\theta \in \Theta_{0}$ and $H_{1}:\theta \in \Theta_{1}$,
> - The **size** is $\alpha \equiv \sup_{\theta \in \Theta_{0}}\mathbb{P}(\mathbf{X} \in C \,|\, \theta)$,
> - The **power** is now a function $w(\theta)=\mathbb{P}(\mathbf{X} \in C \,|\, \theta)$.

## Connection to Confidence Intervals

See [[Confidence Intervals, Tests, P-Values]].
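The size and power definitions above can be computed in closed form for a simple-vs-simple normal test (a sketch with made-up parameters; the cutoff $c=1.645$ is the standard normal quantile giving size roughly $5\%$):

```python
import math

def phi(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Hypothetical simple-vs-simple setup: X_i ~ N(theta, sigma^2), known sigma,
# H0: theta = theta0 vs H1: theta = theta1 > theta0. Reject H0 when the
# standardized mean t = (xbar - theta0)/(sigma/sqrt(n)) exceeds a cutoff c,
# i.e. the critical region is C = {x : t(x) >= c}.
theta0, theta1, sigma, n = 5.0, 6.0, 2.0, 16
c = 1.645  # cutoff chosen so the size is roughly 5%

# Size: P(X in C | H0) = P(t >= c), where t ~ N(0, 1) under H0.
alpha = 1.0 - phi(c)

# Under H1, t ~ N(shift, 1) with shift = (theta1 - theta0)/(sigma/sqrt(n)),
# so the power is P(t >= c | H1) = 1 - Phi(c - shift).
shift = (theta1 - theta0) / (sigma / math.sqrt(n))
power = 1.0 - phi(c - shift)
beta = 1.0 - power  # probability of a type II error

print(f"size = {alpha:.4f}, power = {power:.4f}, beta = {beta:.4f}")
```

Increasing $n$ (or the gap $\theta_{1}-\theta_{0}$) increases the shift, raising the power while the size stays fixed at $\alpha$ — the usual trade-off picture.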