> [!definition|*] Order Statistics
> Given data $X_{i}=x_{i}$ for $i=1, \dots, n$, their **$r$th order statistic** is $X_{(r)}=x_{(r)}$, the $r$th smallest observed value.
- In particular, $X_{(1)}=\min \{ X_{i} \}$, and $X_{(n)}=\max \{ X_{i} \}$.
* The **median** is $M=\begin{cases}
X_{(m+1)} & \text{if $n=2m+1$ odd} \\
\frac{1}{2}(X_{(m)}+X_{(m+1)}) & \text{if $n=2m$ even}
\end{cases}$
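
The definitions above translate directly into code. The following is a minimal sketch using only the Python standard library; the helper names `order_statistic` and `median_from_order_stats` and the sample values are made up for illustration.

```python
from statistics import median

def order_statistic(xs, r):
    """Return the r-th smallest value of xs (r is 1-indexed, as in X_(r))."""
    if not 1 <= r <= len(xs):
        raise ValueError("r must satisfy 1 <= r <= n")
    return sorted(xs)[r - 1]

def median_from_order_stats(xs):
    """Median computed from the odd/even case split in the definition."""
    n = len(xs)
    m = n // 2
    if n % 2 == 1:  # n = 2m + 1
        return order_statistic(xs, m + 1)
    # n = 2m
    return 0.5 * (order_statistic(xs, m) + order_statistic(xs, m + 1))

data = [3.1, 0.4, 2.2, 5.0, 1.7]  # illustrative values, not real data
smallest = order_statistic(data, 1)          # X_(1), the minimum
largest = order_statistic(data, len(data))   # X_(n), the maximum
med_odd = median_from_order_stats(data)      # n = 5 is odd
med_even = median_from_order_stats(data[:4]) # n = 4 is even
```

The result should agree with `statistics.median` from the standard library, which applies the same odd/even rule.
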
> [!theorem|*] Distribution of the Order Statistics
> If the iid. data have cdf and pdf $X_{i} \sim F,f$, then their order statistics have distributions: $X_{(r)} \sim f_{(r)}(x)=\frac{n!}{(r-1)!(n-r)!}F(x)^{r-1} [1-F(x)]^{n-r}f(x)$
> > [!proof]-
> > Note that $F_{(r)}(x)= \Bbb{P}(X_{(r)} \le x)=\Bbb{P}(\text{at least } r \text{ of }X_{i}\le x)$.
> > Hence $\begin{align*}F_{(r)}-F_{(r+1)}&= \Bbb{P}(\text{exactly } r \text{ of }X_{i}\le x,\, (n-r)\text{ of }X_{i} > x)\\ &=\begin{pmatrix}n \\ r\end{pmatrix}F(x)^{r}[1-F(x)]^{n-r}
> \end{align*}$ Differentiating in $x$ and using downward induction on $r$, starting from $F_{(n)}(x)=F(x)^{n}$, proves the result.
>
Order statistics of the uniform distribution $U[0,1]$: for an iid. sample $U_{1}, \dots, U_{n} \sim U[0,1]$, the $r$th order statistic $U_{(r)}$ follows $\text{Beta}(r,n-r+1)$, which has $\begin{align*}
\mathbb{E}[U_{(r)}]&=\frac{r}{r+s}=\frac{r}{n+1}\\[0.4em]
\mathrm{Var}(U_{(r)})&= \frac{rs}{(r+s)^{2}(r+s+1)}
\end{align*}$ where $s=n-r+1$ is the second parameter.
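
The Beta moments above can be sanity-checked by simulation. The sketch below is illustrative only: the choices $n=10$, $r=3$, the number of trials, and the seed are arbitrary assumptions, and the function names are hypothetical.

```python
import random
from statistics import fmean, pvariance

def uniform_order_stat_moments(n, r, trials=20000, seed=0):
    """Monte Carlo mean and variance of U_(r) from n iid U[0,1] draws."""
    rng = random.Random(seed)
    draws = [sorted(rng.random() for _ in range(n))[r - 1] for _ in range(trials)]
    return fmean(draws), pvariance(draws)

def beta_moments(a, b):
    """Exact mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

n, r = 10, 3  # arbitrary illustrative choice
sim_mean, sim_var = uniform_order_stat_moments(n, r)
exact_mean, exact_var = beta_moments(r, n - r + 1)  # s = n - r + 1
print(sim_mean, exact_mean)
print(sim_var, exact_var)
```

With enough trials, the simulated mean should sit close to $\frac{r}{n+1}$ and the simulated variance close to the Beta variance formula.
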
## Q-Q Plots
> [!bigidea]
> Q-Q plots compare the order statistics of the observed sample to the order statistics of a known distribution; if the sample indeed follows that distribution, the plot should look like a line.
> [!theorem|*] Distribution of $F(X)$
> If $X \sim F$, where $F(x)$ is a continuous, strictly increasing cdf., then $Y=F(X) \sim U[0,1]$.
>
> > [!proof]-
> > $\begin{align*}
> \Bbb{P}(Y \le y)&= \Bbb{P}(F(X) \le y)\\
> &= \Bbb{P}(X \le F^{-1}(y))\\
> &= F(F^{-1}(y))=y
> \end{align*}$
> where the inverse exists since $F$ is strictly increasing. Hence $Y \sim U[0,1]$.
Therefore $X_{(r)}\overset{d}{=}F^{-1}(U_{(r)})$, and the delta method gives $\mathbb{E}[F^{-1}(U_{(r)})]\approx F^{-1}(\mathbb{E}[U_{(r)}])=F^{-1}\left( \frac{r}{n+1} \right)$ with variance $\mathrm{Var}(F^{-1}(U_{(r)}))\approx\underbrace{\mathrm{Var}(U_{(r)})}_{\to 0}\cdot {\left[\frac{dF^{-1}(u)}{du}\Big|_{u=\frac{r}{n+1}}\right]^{2}} \to 0.$ Hence we expect $X_{(r)}$ to be close to $F^{-1}\left( \frac{r}{n+1} \right)$ when $n$ is large, so the two *form a near-perfect line when plotted against each other*.
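
The uniformity of $F(X)$ used above can be checked empirically. The sketch below assumes an exponential distribution with rate $\lambda=2$ purely as an example of a continuous, strictly increasing cdf; the function names and the seed are hypothetical.

```python
import math
import random

def exp_cdf(x, lam):
    """cdf of the exponential distribution with rate lam."""
    return 1.0 - math.exp(-lam * x)

def pit_sample(n, lam=2.0, seed=0):
    """Draw X_i ~ Exp(lam) and return the transformed values Y_i = F(X_i)."""
    rng = random.Random(seed)
    return [exp_cdf(rng.expovariate(lam), lam) for _ in range(n)]

ys = pit_sample(50_000)
mean_y = sum(ys) / len(ys)                                 # U[0,1] has mean 1/2
frac_below_quarter = sum(y <= 0.25 for y in ys) / len(ys)  # P(Y <= 0.25) = 0.25
print(mean_y, frac_below_quarter)
```

If $Y$ is uniform on $[0,1]$, the sample mean should be near $0.5$ and roughly a quarter of the values should fall below $0.25$.
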
> [!definition|*] Q-Q Plot
> The **Q-Q plot** plots the observed order statistics $x_{(1)}, \dots, x_{(n)}$ against $F^{-1}\left( \frac{1}{n+1} \right), \dots, F^{-1}\left( \frac{n}{n+1} \right)$, the approximate expected order statistics under a distribution $F$.
>
> If the points lie roughly on a straight line, the data are consistent with the distribution; clear curvature or systematic departures suggest that the distribution is a poor model.
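
As a rough illustration of the definition, the points of a Q-Q plot can be computed without any plotting library. The sketch below assumes a fully specified model $F=N(0,1)$ and simulated data that really follow it; `qq_points` is a hypothetical helper name, and the sample size and seed are arbitrary.

```python
import random
from statistics import NormalDist

def qq_points(sample, inv_cdf):
    """Pair each x_(r) with the plotting position F^{-1}(r / (n + 1))."""
    xs = sorted(sample)
    n = len(xs)
    return [(inv_cdf(r / (n + 1)), xs[r - 1]) for r in range(1, n + 1)]

rng = random.Random(0)
model = NormalDist(0.0, 1.0)                         # the hypothesised F
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # data simulated from F
points = qq_points(sample, model.inv_cdf)

# Away from the extreme tails, the points should sit near the line y = x.
central = points[200:1800]
max_gap = max(abs(y - x) for x, y in central)
print(max_gap)
```

Because $F$ is fully known here, the reference line is $y=x$. The extreme order statistics are noisier, which is why the check above only looks at the central part of the sample.
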
### Q-Q Plots as Distribution Comparisons
Suppose we have a sample $X=(X_{1},\dots,X_{n}) \overset{\mathrm{iid.}}{\sim} F_{X}$, and we want to check if some theoretical distribution $F_{0}$ is a good model.
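The order statistics $\{ X_{(r)} \}$ approximate the quantiles of $F_{X}$:
$F_{X}^{-1}\left( \frac{r}{n} \right) \approx X_{(r)}.$
Plotting the observed order statistics $\{ X_{(r)} \}$ against the model quantiles $\{ F_{0}^{-1}(z_{r}) \}$, e.g. $\mathbf{z}=(1 / n, 2 / n, \dots, 1)$, therefore approximates the curve
$(F_{0}^{-1}(p), F^{-1}_{X}(p))$
parametrized by $p \in (0,1)$. (In practice $z_{r}=\frac{r}{n+1}$ is the safer choice, since $F_{0}^{-1}(1)$ is infinite when $F_{0}$ has unbounded support.)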
The slope of the approximated plot is
$\begin{align*}
\frac{ d F_{X}^{-1}(p) }{ d F_{0}^{-1}(p) }&= \frac{ d F_{X}^{-1} }{ d p } \Big/ \frac{ d F_{0}^{-1} }{ d p } \\
&= \frac{f_{0}(F_{0}^{-1}(p))}{f_{X}(F_{X}^{-1}(p))}.
\end{align*}$
In the case of Gaussians $F_{0} = N(0,1)$ and $F_{X} = N(0, \sigma^{2})$, we have $F_{X}^{-1}(p)=\sigma F_{0}^{-1}(p)$, so the slope simplifies to the constant $\sigma$ and the curve is a straight line through the origin.
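
The Gaussian case can be verified numerically from the slope formula. The sketch below uses `statistics.NormalDist` from the Python standard library; the value $\sigma=2.5$ and the evaluation points are arbitrary illustrative choices, and `qq_slope` is a hypothetical helper name.

```python
from statistics import NormalDist

def qq_slope(p, dist_x, dist_0):
    """Slope f_0(F_0^{-1}(p)) / f_X(F_X^{-1}(p)) of the theoretical Q-Q curve."""
    return dist_0.pdf(dist_0.inv_cdf(p)) / dist_x.pdf(dist_x.inv_cdf(p))

sigma = 2.5                   # arbitrary illustrative value
f0 = NormalDist(0.0, 1.0)     # model distribution F_0
fx = NormalDist(0.0, sigma)   # data distribution F_X
slopes = [qq_slope(p, fx, f0) for p in (0.05, 0.25, 0.5, 0.75, 0.95)]
print(slopes)
```

Every entry should equal $\sigma$ up to floating-point error, regardless of $p$.
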
### Q-Q Plots in Practice
If the cdf. $F$ is known, we can directly compute $d_{(r)}=F^{-1}\left( \frac{r}{n+1} \right)$. However, *in practice, we want to test a family of distributions $\{ F(\theta) \}$ indexed by an unknown parameter $\theta$, so we do not have* $F$.
* Instead, we need *a linear relation that holds true for any parameter* $\theta$ in the family of distributions $F(\theta)$.
We start with the fact that $F(x_{(r)}, \theta) \approx\frac{r}{n+1}$, and rearrange to find the linear relation
$x_{(r)}=\alpha(\theta)\cdot g\left(\frac{r}{n+1}\right)+\beta(\theta)$
for some $g$ independent of $\theta$; different $\theta$ should only affect the slope $\alpha(\theta)$ and intercept $\beta(\theta)$ of the line.
> [!examples] Normal Q-Q Plot
> The **normal Q-Q plot**: if $X_{1,\dots,n} \overset{iid.}{\sim} N(\mu, \sigma^2)$, then the standardized values $\frac{X_{i}-\mu}{\sigma}$ are iid. $N(0,1)$, so their order statistics satisfy $\Phi\left(\frac{X_{(r)}-\mu}{\sigma}\right) \approx \frac{r}{n+1}$. This gives the linear relationship $x_{(r)} \approx \sigma\Phi^{-1}\left( \frac{r}{n+1} \right)+\mu$, so plotting $x_{(r)}$ against $\Phi^{-1}\left( \frac{r}{n+1} \right)$ should give a line with slope $\sigma$ and intercept $\mu$.
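
One way to use the normal Q-Q relationship is to fit a least-squares line through the plotted points and read off rough estimates of $\sigma$ and $\mu$. This is only a sketch under assumptions: the least-squares fit is one possible choice, not something the notes prescribe, and the true values $\mu=10$, $\sigma=3$, the sample size, and the seed are made up for the simulation.

```python
import random
from statistics import NormalDist, fmean

def normal_qq_fit(sample):
    """Least-squares line through (Phi^{-1}(r/(n+1)), x_(r)).

    Returns (slope, intercept), which roughly estimate (sigma, mu).
    """
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()
    qs = [std_normal.inv_cdf(r / (n + 1)) for r in range(1, n + 1)]
    q_bar, x_bar = fmean(qs), fmean(xs)
    sxy = sum((q - q_bar) * (x - x_bar) for q, x in zip(qs, xs))
    sqq = sum((q - q_bar) ** 2 for q in qs)
    slope = sxy / sqq
    return slope, x_bar - slope * q_bar

rng = random.Random(1)
mu, sigma = 10.0, 3.0  # assumed true parameters for the simulation
sample = [rng.gauss(mu, sigma) for _ in range(5000)]
slope, intercept = normal_qq_fit(sample)
print(slope, intercept)
```

For normal data, the fitted slope should land near $\sigma$ and the intercept near $\mu$.
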
> [!examples] Exponential Q-Q Plot
> For exponential distributions $X_{1,\dots,n} \overset{iid.}{\sim} \mathrm{Exp}(\lambda)$, the equation $F(x_{(r)},\lambda)\approx \frac{r}{n+1}$ becomes $1-e^{-\lambda x_{(r)}} \approx \frac{r}{n+1}$. Solving for $x_{(r)}$ gives $x_{(r)} \approx -\frac{1}{\lambda}\log\left( 1-\frac{r}{n+1} \right)$, so plotting $x_{(r)}$ against $\log\left( 1-\frac{r}{n+1} \right)$ should give a line through the origin with slope $-\frac{1}{\lambda}$.
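
The exponential relationship can be sketched the same way: fit a line through the origin to the points $\left(\log\left(1-\frac{r}{n+1}\right), x_{(r)}\right)$ and convert the slope $-\frac{1}{\lambda}$ into a rough estimate of $\lambda$. The through-origin least-squares fit, the true rate $\lambda=0.5$, the sample size, and the seed are all assumptions made for illustration; `exp_qq_rate` is a hypothetical helper name.

```python
import math
import random

def exp_qq_rate(sample):
    """Estimate lambda from the exponential Q-Q relation x_(r) ~ -(1/lambda) * g_r,

    where g_r = log(1 - r / (n + 1)), using a least-squares line through the origin.
    """
    xs = sorted(sample)
    n = len(xs)
    gs = [math.log(1 - r / (n + 1)) for r in range(1, n + 1)]
    slope = sum(g * x for g, x in zip(gs, xs)) / sum(g * g for g in gs)
    return -1.0 / slope  # slope is approximately -1/lambda

rng = random.Random(2)
lam = 0.5  # assumed true rate for the simulation
sample = [rng.expovariate(lam) for _ in range(5000)]
lam_hat = exp_qq_rate(sample)
print(lam_hat)
```

For exponential data, `lam_hat` should be close to the true rate; for data from another family, the points would bend away from a line and the fitted slope would be harder to interpret.
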