Permutation-Based Tests - Random Notes Go Brrrrrrr

Suppose we have samples $\mathbf{X}$ and want to test some hypothesis $H_{0}$. Often $\mathbf{X}$ is, in some sense, exchangeable given $H_{0}$.

### Significance Analysis of Microarrays (SAM)

The SAM procedure is a test conducted on an $(N\times n)$ array $\mathbf{X}$, with
$$x_{ij}:= \text{expression level of the $i$th gene in the $j$th person}$$
in the original microarray interpretation, where a subset of $\mathbf{X}$'s columns (WLOG the last $k$ of them) corresponds to the subjects of interest (e.g. those with a certain disease).
- That said, it is applicable to general multiple testing as well.

The hypotheses are
$$H_{0i}: \text{the expectation $\mu_{i}$ of the $i$th gene's expression is }0.$$
Then under the global null defined as
![[Multiple Testing#^e1f677]]
there is no difference between healthy people and patients, so we can permute the columns to get a [[Bootstraps|bootstrapped]] dataset
$$\mathbf{X}^{(b)}= (x^{(b)}_{i,j}):=(x_{i,\sigma^{(b)}(j)}),~~b=1,\dots,B$$
for some $\sigma^{(b)}$ selected uniformly from all permutations of $\{ 1,\dots,n \}$ to generate each bootstrapped dataset.

*Under $H_{0}$, this should be close to sampling new data (under $H_{0}$),* with the key assumption that
$$(\mathbf{X}^{(b)} \overset{D}{\approx}\mathbf{X})~|~H_{0},$$
so we can use the bootstrapped datasets to estimate distributions of test statistics under $H_{0}$.

For example, say we wish to study some z-scores $(Z_{1},\dots,Z_{N})$ computed for each gene, with order statistics $(Z_{(1)},\dots,Z_{(N)})$. SAM approximates their distribution under $H_{0}$:
- For each permuted dataset $\mathbf{X}^{(b)}$, compute the z-scores $\mathbf{z}^{(b)}=(z_{1}^{(b)},\dots,z_{N}^{(b)})$ and their order statistics $$z_{(r)}^{(b)}:=r\text{th largest $z_{i}^{(b)}$ in $\mathbf{z}^{(b)}$}.$$ Collect them into $\mathbf{z}_{(r)}=(z^{(1)}_{(r)},\dots,z^{(B)}_{(r)})$.
- Use the empirical distribution of $\mathbf{z}_{(r)}$ to approximate that of $Z_{(r)}~|~H_{0}$: $$\begin{align*} \hat{\mathbb{P}}[Z_{(r)}\le z ~|~ H_{0}]&= \frac{1}{B} \sum_{b}\mathbf{1}\{ z_{(r)}^{(b)} \le z\},\\ \overline{Z_{(r)}}:=\hat{\mathbb{E}}[Z_{(r)} ~|~ H_{0}] &= \frac{1}{B}\sum_{b}z_{(r)}^{(b)}. \end{align*}$$

Now to check for unusual $Z_{i}$ under $H_{0}$, we can compare the observed $Z_{(r)}=z_{(r)}$ against their (approximated) means $\hat{\mathbb{E}}[Z_{(r)}~|~H_{0}]$ by plotting them in a [[Order Statistics|QQ plot]].

We can then choose some margin $\Delta$ to find rejections: for example, a lower-tail critical region can be found by
$$\begin{align*} z^{\ast}(\Delta)&:= \max\{ Z_{(r)} ~|~ Z_{(r)} < \overline{Z_{(r)}} - \Delta \},\\ \mathcal{R}(\Delta)&:= \{ z ~|~ z \le z^{\ast}(\Delta) \}. \end{align*}$$
Here we can control the [[False Discovery Rate Control|FDR]]
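
A minimal NumPy sketch of the permutation steps in this section, under stated assumptions. The notes do not fix the per-gene statistic, so `z_scores` (a Welch-type two-sample score) is a hypothetical choice, as are the names `sam_lower_tail`, `B`, `delta` and `seed`. The cutoff is taken to be the largest observed order statistic lying more than $\Delta$ below its permutation mean, so the region $\{z \le z^{\ast}\}$ contains every such statistic. Restricting to ranks whose permutation mean is negative is an extra assumption added here to keep the cutoff in the lower tail.

```python
import numpy as np


def z_scores(X, k):
    """Hypothetical per-gene statistic (an assumption, not fixed by the notes):
    Welch-type two-sample score, last k columns versus the first n - k."""
    rest, case = X[:, :-k], X[:, -k:]
    se = np.sqrt(rest.var(axis=1, ddof=1) / rest.shape[1]
                 + case.var(axis=1, ddof=1) / case.shape[1])
    return (case.mean(axis=1) - rest.mean(axis=1)) / se


def sam_lower_tail(X, k, B=200, delta=1.0, seed=0):
    """Permutation estimate of E[Z_(r) | H0] and a lower-tail cutoff."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    z = z_scores(X, k)
    # Observed order statistics; index r - 1 holds the r-th largest z-score.
    z_obs = np.sort(z)[::-1]

    z_perm = np.empty((B, N))
    for b in range(B):
        sigma = rng.permutation(n)  # uniform permutation of the columns
        z_perm[b] = np.sort(z_scores(X[:, sigma], k))[::-1]

    # Rank-wise average over the B permuted datasets.
    z_bar = z_perm.mean(axis=0)

    # Ranks where the observed value falls more than delta below its mean,
    # restricted (by assumption) to the lower tail of the null means.
    flagged = (z_obs < z_bar - delta) & (z_bar < 0)
    z_star = z_obs[flagged].max() if flagged.any() else -np.inf

    rejected = z <= z_star  # boolean mask over genes, in the original row order
    return z_obs, z_bar, z_star, rejected


# Synthetic illustration (made-up data): 500 genes, 20 subjects, and the
# first 10 genes shifted down in the last 8 subjects.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(500, 20))
X[:10, -8:] -= 3.0
z_obs, z_bar, z_star, rejected = sam_lower_tail(X, k=8, B=100, delta=1.0)
print("cutoff:", z_star, "rejections:", int(rejected.sum()))
```

Plotting `z_obs` against `z_bar` gives the QQ-style comparison of observed and null order statistics. Larger `delta` moves the cutoff further into the tail, so fewer genes are rejected.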