Suppose we want to study a random variable $\mathbf{X}$ to determine its distribution, using the observed values $\mathbf{x}$.
For now, the distribution is assumed to belong to a family $f(\mathbf{X};\theta)$, where $\theta$ is a parameter to be determined or tested. Later, we will test other questions, e.g. whether the family itself is appropriate for modeling the data.
> [!definition|*] Hypotheses
> In a hypothesis test, the **null hypothesis** $H_{0}$ is the statement assumed to be true, and we check if the evidence disagrees with it.
> - For parameter testing, the most common null hypothesis is the **simple null hypothesis**: that $\theta=\theta_{0}$.
> ---
> The **alternative hypothesis** $H_{1}$ (or $H_{a}$) is the statement we adopt if the evidence disagrees with $H_{0}$.
>
> For a parameter $\theta$ and a null hypothesis $H_{0}:\theta=\theta_{0}$, common alternative hypotheses include:
> - A **simple alternative hypothesis**: that $\theta=\theta_{1} \ne \theta_{0}$.
> - A **one-sided alternative hypothesis**: that $\theta > \theta_{0}$ (or $\theta < \theta_{0}$).
> - A **two-sided alternative hypothesis**: that $\theta \ne \theta_{0}$.
## Rejecting the Null Hypothesis
In order to reject the null hypothesis, some criterion is needed; the p-value provides the most common and basic one. *When the p-value is smaller than some threshold*, the data is taken to disagree with the null hypothesis.
> [!definition|*] Test Statistics
> Given data $\mathbf{X}=\mathbf{x}$, the **test statistic** is a function $t(\mathbf{X})$, ideally unlikely to take extreme values if $H_{0}$ is true. Its **observed value** is denoted $t_{\mathrm{obs}}=t(\mathbf{x})$.
> - Hence an extreme value of $t_{\mathrm{obs}}$ suggests that $H_{0}$ is incorrect.

> [!definition|*] p-values
> The **p-value** (of the statistic $t$) is the probability of getting a $t(\mathbf{X})$ at least as extreme as $t_{\mathrm{obs}}$, assuming $H_{0}$ to be true: $p\equiv \mathbb{P}(\text{getting }t(\mathbf{X}) \text{ as extreme as }t_{\mathrm{obs}}\,|\, H_{0}).$
What counts as "extreme" depends on the alternative hypothesis and the sampling distribution.
- One-sided hypotheses $H_{1}: (\theta > \theta_{0})$ only count values of $t$ corresponding to large $\theta$ as extreme (similarly for $H_{1}:(\theta<\theta_{0})$).
- Two-sided hypotheses $H_{1}:(\theta \ne \theta_{0})$ count values of $t$ corresponding to $\theta$ far from $\theta_{0}$ (in either direction) as extreme.
- For choices of $t$ where larger $\theta$ corresponds to larger $t$, the p-values are:
$\begin{align*}
\text{One-sided}: p &=\mathbb{P}\big(t(\mathbf{X}) \ge t_{\mathrm{obs}} \,|\, H_{0}\big)\\
\text{Two-sided}: p &= \mathbb{P}\big(|t(\mathbf{X})| \ge |t_{\mathrm{obs}}| \,|\, H_{0}\big)
\end{align*}$
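As a numerical sketch of these two formulas, consider a z-test for the mean of normal data with known variance (the data, $\mu_0$, and $\sigma$ below are illustrative assumptions, not from the text):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=50)  # simulated data (true mean 0.3)

mu0, sigma = 0.0, 1.0  # H0: mu = mu0, with sigma assumed known
# z-statistic: large theta (mu) corresponds to large t, as assumed above
t_obs = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))

p_one_sided = norm.sf(t_obs)           # P(t(X) >= t_obs | H0), for H1: mu > mu0
p_two_sided = 2 * norm.sf(abs(t_obs))  # P(|t(X)| >= |t_obs| | H0), for H1: mu != mu0
print(t_obs, p_one_sided, p_two_sided)
```

Here `norm.sf` is the survival function $1-\Phi$, so the two-sided p-value is exactly twice the one-sided tail probability at $|t_{\mathrm{obs}}|$.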
> [!definition|*] Critical Regions
>
> More generally, given some criterion for rejecting $H_{0}$, the **critical region** $C \subseteq \mathbb{R}^{n}$ is the set of samples that would lead to the rejection of $H_{0}$: $C \equiv \{ \mathbf{x} \in \mathbb{R}^{n} \,|\, H_{0} \text{ rejected if } \mathbf{X}=\mathbf{x} \}$. In the case of p-values, $C=\{ \mathbf{x} \in \mathbb{R}^{n} \,|\, p(\mathbf{x}) < \text{threshold}\}$.
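For a two-sided z-test, the p-value criterion above is equivalent to a critical region of the form $|t_{\mathrm{obs}}| > z_{\alpha/2}$. A minimal sketch of this equivalence (the threshold $\alpha = 0.05$ and the helper name are illustrative):

```python
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)  # two-sided critical value, about 1.96

def in_critical_region(t_obs: float) -> bool:
    # Reject H0 iff |t_obs| exceeds the critical value; this is the same
    # rule as rejecting when the two-sided p-value is below alpha.
    return abs(t_obs) > z_crit

print(in_critical_region(2.5))  # True: sample falls in C, H0 rejected
print(in_critical_region(1.0))  # False: sample outside C, H0 not rejected
```

Describing the region via the statistic $t$ rather than the raw sample $\mathbf{x}$ is the usual shortcut: $C$ is the preimage $\{\mathbf{x} : |t(\mathbf{x})| > z_{\alpha/2}\}$.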
## Errors in Hypothesis Testing
> [!definition|*] Errors
> A **type I error** is a false positive: rejecting $H_{0}$ when it is true.
> A **type II error** is a false negative: failing to reject $H_{0}$ when it is false.

> [!definition|*] Power, Size
>
> Given simple hypotheses $H_{0}:\theta=\theta_{0}$ and $H_{1}:\theta=\theta_{1}$,
> - The **size** of the test is the probability of type I error: $\alpha \equiv\mathbb{P}(\mathbf{X} \in C\,|\, H_{0})$.
> - The probability of type II error is denoted $\beta=\mathbb{P}( \mathbf{X} \notin C \,|\, H_{1})$.
> - The **power** of the test is $1-\beta$, i.e. the probability of rejecting $H_{0}$ when $H_{1}$ is true.
>
> Given composite hypotheses $H_{0}:\theta \in \Theta_{0}$ and $H_{1}:\theta \in \Theta_{1}$,
> - The **size** is $\alpha \equiv \sup_{\theta \in \Theta_{0}}\mathbb{P}(\mathbf{X} \in C \,|\, \theta)$,
> - The **power** is now a function $w(\theta)=\mathbb{P}(\mathbf{X} \in C \,|\, \theta)$.
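The power function $w(\theta)$ can be computed in closed form for a one-sided z-test: rejecting when the z-statistic exceeds $z_{\alpha}$, the statistic under $\theta = \mu_1$ is normal with a mean shifted by $(\mu_1-\mu_0)\sqrt{n}/\sigma$. A sketch (the sample size, $\sigma$, and $\alpha$ below are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def power(mu1, n=50, sigma=1.0, alpha=0.05, mu0=0.0):
    # One-sided z-test of H0: mu = mu0 vs H1: mu > mu0, with size alpha.
    # Reject when the z-statistic exceeds z_alpha = Phi^{-1}(1 - alpha);
    # under mu = mu1 the statistic is N(shift, 1), so the rejection
    # probability is P(Z >= z_alpha - shift).
    z_alpha = norm.ppf(1 - alpha)
    shift = (mu1 - mu0) * np.sqrt(n) / sigma
    return norm.sf(z_alpha - shift)

print(power(0.0))  # at theta_0 the power equals the size, alpha = 0.05
print(power(0.5))  # power grows as mu1 moves away from mu0
```

Note that $w(\theta_0) = \alpha$: on the boundary of $\Theta_0$, the probability of rejection is exactly the size, which is why the composite-size definition takes a supremum over $\Theta_0$.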
## Connection to Confidence Intervals
See [[Confidence Intervals, Tests, P-Values]].