> [!tldr]
> Rank-based tests are **distribution-free** hypothesis tests based on rank statistics -- they do not assume any underlying distribution to work.

Suppose we have a sample $\mathbf{Z}=(Z_{1},\dots,Z_{N})$ and wish to test for a location parameter $\Delta$, but we are unsure of the population distribution.
- Tests that assume a certain distribution (e.g. **z-tests** and **t-tests** that assume normality) will give incorrect p-values if the assumption is badly off.
- Hence a statistic that is **distribution-free** is needed.

## Ranks and Exchangeability

> [!tldr]
> If the sample $\mathbf{Z}$ is **exchangeable**, then its **ranks** (and statistics that depend solely on the ranks) are distribution-free, uniform, and immune to monotone transformations.

> [!warning] For simplicity, we assume there are no ties in the data.

> [!definition|*] Ranks
> The **ranks** $R(\mathbf{Z})=(R_{1},\dots,R_{N})$ form a statistic of a sample $\mathbf{Z}$. The rank of the observation $Z_{i}$ is the number of observations no larger than it: $R_{i}:= \sum_{j=1}^{N}\mathbf{1}_{Z_{j} \le Z_{i}}.$ Equivalently, $R_{i}$ is the ($1$-based) index $j$ such that the order statistic $Z_{(j)}=Z_{i}$.

One useful way of treating the ranks is as a function $R$ mapping the index $i$ to the rank of $Z_{i}$ (where the observation goes after sorting); if there are no ties, it has an inverse map $D$, which maps the rank $R_{i}$ back to the index $i$ of the observation holding that rank (where the observation was before sorting).
- For example, if $\mathbf{Z}=(1,10,4,2)$, then $R=(1,4,3,2)$, and $D=(1,4,3,2)$.
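Both maps can be read off `numpy.argsort`; here is a quick sanity check of the example above (the variable names are just for illustration):

```python fold title:"Ranks and their inverse in numpy"
import numpy as np

z = np.array([1, 10, 4, 2])

# D: rank -> original index (argsort returns the indices that sort z)
d = np.argsort(z)

# R: original index -> rank (invert the permutation d)
r = np.empty_like(d)
r[d] = np.arange(len(z))

print(r + 1)  # [1 4 3 2] -- the ranks R (1-based)
print(d + 1)  # [1 4 3 2] -- the inverse map D (1-based)
```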
We wish to say that ranks are distribution-free and *all orders are equally likely*. That is, $R(\mathbf{Z}) \sim U[\mathcal{P}_{N}]$, where $\mathcal{P}_{N}$ is the set of all possible permutations of $\{ 1,\dots,N \}$.
- That is obviously not true in general, e.g. when the samples are sorted before the analysis.
- IID samples have distribution-free ranks, but we can in fact relax the requirement:

> [!definition|*] Exchangeability
> A sample $\mathbf{Z}=(Z_{1},\dots,Z_{N})$ with joint density $f_{1:N}$ is **exchangeable** if permuting the inputs does not change the distribution: $(Z_{1},\dots,Z_{N}) \sim (Z_{\sigma(1)},\dots,Z_{\sigma(N)})$ for any permutation $\sigma$ on $\{ 1,\dots,N \}$. For continuous variables, this is equivalent to $\forall (z_{1},\dots,z_{N}),\,\,\, f_{1:N}(z_{1},\dots,z_{N})=f_{1:N}(z_{\sigma(1)},\dots,z_{\sigma(N)}),$ where $f_{1:N}$ is the joint density of $\mathbf{Z}$.

- IID implies exchangeability, but the converse is not true: for example $X_{1},X_{2} \sim N(0,1)$ with correlation $\rho \ne 0$ are also exchangeable.
- One necessary condition for exchangeability is that *all $N$ marginal distributions of $Z_{1},\dots,Z_{N}$ must be the same* (when computing the marginal distribution of $Z_{i}$, exchanging it with $Z_{j}$ when integrating out all $Z_{k \ne i}$ gives the marginal of $Z_{j}$ instead).

> [!theorem|*] Uniformity of Ranks in Exchangeable Samples
> If the observations $\mathbf{Z}$ are exchangeable, then their ranks are uniformly distributed: $R(\mathbf{Z}) \sim U[\mathcal{P}_{N}]$.
>
> > [!proof]-
> > Consider any $r \in \mathcal{P}_{N}$, and its sorted indices $d:= D(r)$. Then $\{ R=r \}=\{ Z_{d_{1}}<Z_{d_{2}}<\dots<Z_{d_{N}} \},$ which has probability
> > $\begin{align*} \mathbb{P}[Z_{d_{1}}<\dots<Z_{d_{N}}]&= \int \mathbf{1}\{ z_{d_{1}}<\dots<z_{d_{N}} \}\cdot f_{\mathbf{Z}}(z_{1},\dots,z_{N}) ~ d\mathbf{z}\\ &= \int \mathbf{1}\{ s_{1}<\dots<s_{N} \}\cdot f_{\mathbf{Z}}(s_{r_{1}},\dots, s_{r_{N}}) ~ d\mathbf{s}\\ &\hphantom{\mathbf{1}\{ s_{1}<\dots<s_{N} \}}[s_{i}:=z_{d_{i}},\text{ so }s_{r_{i}}=z_{i}]\\ &= \int \mathbf{1}\{ s_{1}<\dots<s_{N} \}\cdot f_{\mathbf{Z}}(s_{1},\dots, s_{N}) ~ d\mathbf{s}\\ &\hphantom{\mathbf{1}\{ s_{1}<\dots<s_{N} \}}[\text{exchangeability of }f_{\mathbf{Z}}]\\[0.4em] &= \mathbb{P}[Z_{1}<\dots<Z_{N}]. \end{align*}$
> > Therefore, all rank vectors are equally probable, and $R \sim U[\mathcal{P}_{N}]$.

As a result, ranks have the following desirable properties:
- They are **distribution-free** or **ancillary** over the family of exchangeable distributions $\mathcal{D}$, meaning that $R(\mathbf{Z})$ has the same distribution for any $D \in \mathcal{D}$, $\mathbf{Z} \sim D$.
- They are invariant under strictly increasing transformations of the data (as long as they do not produce ties); so are test statistics $T(R(\mathbf{Z}))$ that depend only on the ranks.
- They are [[Robustness|robust]] against outliers, similar to the median.

> [!theorem|*] Independence of Ranks and Order Statistics in Exchangeable Samples
> Furthermore, if $\mathbf{Z}$ is exchangeable, then its *ranks $R(\mathbf{Z})$ and order statistics $(Z_{(1)},\dots,Z_{(N)})$ are independent.*
> > [!proof]
> > Write $\phi(\mathbf{Z}):= (Z_{(1)},\dots,Z_{(N)},R(\mathbf{Z}))$, and let $\sigma$ be any permutation on $\{ 1,\dots ,N \}$.
> >
> > Since $\mathbf{Z} \sim \sigma \mathbf{Z}$ are identically distributed, so are $\phi(\mathbf{Z})$ and $\phi(\sigma \mathbf{Z})$; note that $\sigma\mathbf{Z}$ has the same order statistics as $\mathbf{Z}$. In particular, conditioning on the rank $\{ R=r \},$ $\begin{align*} (Z_{(1)},&\dots,Z_{(N)})~|~\{ R(\mathbf{Z})=r \} \\ &~~\sim~~(Z_{(1)},\dots,Z_{(N)}) ~|~ \{ R(\sigma \mathbf{Z}) =r\}. \end{align*}$ We want to show that the $\mathrm{LHS}$ has the same distribution regardless of the choice of $r$.
> >
> > Now choose $\sigma:=r$, i.e. mapping $i$ to $r_{i}$. The $i$th entry of $\sigma\mathbf{Z}$ is $Z_{r_{i}}$, whose rank is $R(\mathbf{Z})_{r_{i}}$, so the event $\{ R(\sigma\mathbf{Z})=r \}$ states that $R(\mathbf{Z})_{r_{i}}=r_{i}$ for every $i$; since $r$ is a bijection, this is exactly the event $\{ R(\mathbf{Z})=(1,\dots,N) \}$. The $\mathrm{RHS}$ therefore becomes $(Z_{(1)},\dots,Z_{(N)})~|~\{ R(\mathbf{Z}) =(1,\dots,N)\},$ which does not depend on $r$. Therefore, the $\mathrm{LHS}$ has the same distribution for every $r$, and so $(Z_{(1)},\dots,Z_{(N)})$ is independent of $R(\mathbf{Z})$.

## General Setup of Rank-Based Tests

Assuming exchangeability of $\mathbf{Z}$, so that $R(\mathbf{Z}) \sim U[\mathcal{P}_{N}]$, any statistic $T(R(\mathbf{Z}))$ that is a function of the ranks is also ancillary over $\mathcal{D}$ -- it has the distribution of $T(S)$, $S \sim U[\mathcal{P}_{N}]$.

Therefore, a rank-based test with test statistic $T(R(\mathbf{Z}))$ computes the observed value $T(R(\mathbf{z}))$ and *tests its compatibility with some null hypothesis $H_{0}$ under which the data is exchangeable*. For example, if the test statistic takes extremely large values when $H_{0}$ is false, the rank-based test has **p-value** $p:= \mathbb{P}[T(R(\mathbf{Z})) \ge T(R(\mathbf{z}))]=\frac{1}{N!}\sum_{s \in \mathcal{P}_{N}}\mathbf{1}\{T(s) \ge T(R(\mathbf{z}))\}.$

Alternatively, the test can use appropriate **quantiles** as cut-offs: e.g. to have a one-sided test of (nominal) level $\alpha$, find the $\alpha$-quantile $w_{\alpha}$ as the cut-off of the critical region $(-\infty, w_{\alpha}]$.
- The pitfalls of the discrete quantiles are noted in [[#Nominal vs. Actual Levels]].
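For small $N$, the sum over $\mathcal{P}_{N}$ can be enumerated directly; a minimal sketch (the helper `exact_rank_pvalue` and the example statistic are illustrative, not from any library):

```python fold title:"Exact permutation p-value of a rank statistic"
from itertools import permutations

import numpy as np
from scipy.stats import rankdata

def exact_rank_pvalue(z: np.ndarray, T) -> float:
    """p = (1/N!) * #{ permutations s : T(s) >= T(R(z)) }."""
    N = len(z)
    observed = T(rankdata(z))  # T(R(z)) on the observed sample
    values = [T(np.array(s)) for s in permutations(range(1, N + 1))]
    return np.mean([v >= observed for v in values])

# e.g. T = rank sum of the last three observations (feasible only for small N)
z_obs = np.array([0.3, -1.2, 0.8, 2.1, 1.7, -0.4, 0.9])
print(exact_rank_pvalue(z_obs, T=lambda ranks: ranks[-3:].sum()))
```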
## Two-Sample Permutation Tests

Suppose we want to decide whether two samples $\mathbf{X}=(X_{1},\dots,X_{n}) \overset{\mathrm{iid.}}{\sim} F_{X}$ and $\mathbf{Y} = (Y_{1},\dots,Y_{m}) \overset{\mathrm{iid.}}{\sim} F_{Y}$ have the same distribution. In particular, we test whether there is a **location shift** $\Delta$ such that $X_{i},\, (Y_{j} - \Delta) \overset{\mathrm{iid.}}{\sim}F$ for $i=1,\dots,n$, and $j = 1,\dots,m$. The hypotheses are just $\begin{align*} H_{0}: \Delta &= 0,\\ H_{1}:\Delta &\ne 0,\, \Delta >0,\, \Delta<0,\, \text{etc...} \end{align*}$

To do so, we compute rank-based statistics on the concatenated sample $\mathbf{Z} = (X_{1},\dots,X_{n}, Y_{1},\dots,Y_{m})$ of size $N:= m+n$. *Under $H_{0}$, the sample is iid., hence exchangeable.*

### Wilcoxon Rank Sum Test

The WRST tests the hypotheses with the **Wilcoxon rank sum test statistic**, i.e. the sum of the ranks of $\mathbf{Y}$: $W(\mathbf{R}):= \sum_{j=n+1}^{N}R_{j}.$

Under $H_{0}:\Delta=0$, exchangeability gives $R(\mathbf{Z}) \sim U(\mathcal{P}_{N})$. In contrast, *a large $W(\mathbf{R})$ (relative to its null mean) indicates a positive shift, and a small $W$ a negative shift*. The WRST then uses the test statistic $W(\mathbf{R}^\mathrm{obs})$ to construct a test with:
- *p-values*: e.g. the two-tailed test has p-value $p:=\mathbb{P}\Big[| W(\mathbf{R}) - \mu_{m,n} | \ge | W(\mathbf{R}^{\mathrm{obs}})- \mu_{m,n} |\Big],$ where $\mu_{m,n}= m(m + n + 1) /2=\mathbb{E}[W(\mathbf{R}) \,|\, H_{0}]$.
- *quantiles*: e.g. to have a one-sided test of (nominal) level $\alpha$, find the $\alpha$-quantile $w_{\alpha}$ of $W(\mathbf{R}) ~|~ H_{0}$.

A very similar concept is the **Mann-Whitney test statistic**, defined by $\mathrm{MW}(\mathbf{X}, \mathbf{Y}):= \sum_{i,j}\mathbf{1}_{Y_{j} > X_{i}},$ and for the same dataset, $\mathrm{MW}=W - m(m+1) / 2$, so they give the same test.
- Using this formulation, it is easy to see that under $H_{0}$, $\mathbb{E}[\mathrm{MW}]=mn / 2$, and $\mathbb{E}[W] = m(m+n+1) / 2$.

> [!proof]- Proof of $\mathrm{MW}=W - m(m+1) / 2$
> Reorganize the sum in $\mathrm{MW}$ to be over each $Y_{j}$: counting the $X_{i}$ below $Y_{j}$ is the same as taking $Y_{j}$'s rank in the combined sample and removing its rank among the $Y_{k}$'s:
> $\begin{align*} \mathrm{MW}&= \sum_{j}\# \{ i :\,X_{i} < Y_{j} \}\\ &= \sum_{j} \big(R_{j} -\# \{ k :\, Y_{k} \le Y_{j} \}\big)\\ &= W - \frac{m(m+1)}{2}, \end{align*}$
> since the second terms are $\mathbf{Y}$'s internal ranks $1,\dots,m$, which sum to $m(m+1) / 2$.
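A quick numerical check of the rank sum, the Mann-Whitney count, and their constant offset (the comparison assumes `scipy.stats.mannwhitneyu` reports the U count for its first argument):

```python fold title:"Rank sum vs. Mann-Whitney count"
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

rng = np.random.default_rng(0)
x = rng.normal(size=8)               # n = 8
y = rng.normal(loc=0.5, size=6)      # m = 6
n, m = len(x), len(y)

ranks = rankdata(np.concatenate([x, y]))   # ranks of the combined sample
W = ranks[n:].sum()                        # Wilcoxon rank sum of the Y's
MW = sum(yj > xi for xi in x for yj in y)  # Mann-Whitney count over (i, j) pairs

assert MW == W - m * (m + 1) / 2
# scipy's U statistic for its first argument should match MW here (no ties):
print(MW, mannwhitneyu(y, x).statistic)
```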
## Testing One-Sample Locations

Suppose that instead of two samples $\mathbf{X}$ and $\mathbf{Y}$, we only have a single sample $\mathbf{Z} \overset{\mathrm{iid.}}{\sim}F_{Z}(\cdot\,;\mu)$ (iid., not just exchangeable), and we wish to test for the **location** $\mu$ of $F_{Z}$. In particular, we are interested in $\mu$ being the median, and assume $F_{Z}$ to be symmetric about $\mu$ (so $\mu$ is also the mean). The hypotheses are: $\begin{align*} H_{0}: \mu &= \mu_{0},\\ H_{1}:\mu &\ne \mu_{0},\, \mu >\mu_{0},\, \mu<\mu_{0},\, \text{etc...} \end{align*}$

### Wilcoxon Signed Rank Statistic

The Wilcoxon signed rank test uses another rank-based test statistic for testing one-sample locations:

> [!definition|*] Wilcoxon Signed Rank Statistic
> The **Wilcoxon signed rank statistic** of a sample $\mathbf{Z}$ and location $\mu$ is ${W}_{\mathrm{SR}}(\mu):= \sum_{i=1}^{N}R^{*}_{i}\cdot \underbrace{\mathbf{1}_{Z_{i} > \mu}}_{=:I_{i}},$ where $R_{i}^{*}$ are the ranks of $| Z_{i}-\mu |$.
>
> That is, it ranks the deviations from the "center" $\mu$, and sums them for observations that are above $\mu$.

```python fold title:"Computing the signed rank in python"
from scipy.stats import rankdata
import numpy as np

def signed_rank(data: np.ndarray, median: float):
    abs_deviations = np.abs(data - median)
    ranks = rankdata(abs_deviations)
    is_above_median = (data > median).astype(float)
    signed_ranks = ranks * is_above_median
    return signed_ranks.sum()
```

If the symmetry assumption holds and $H_{0}$ is true (so $\mu=\mu_{0}$), then $R^{*}$ and the indicators are independent -- *a sample's distance from the median is independent of which side it is on* (proven in the next section). Otherwise, *if for example $\mu_{0}$ is higher than the actual median $\mu$, $W_{\mathrm{SR}}(\mu_{0})$ will be smaller than it should be, and vice versa.*
- This is in spirit a Wilcoxon rank sum applied to $\mathbf{X}:= \{ | Z_{i}-\mu | \,:\, Z_{i} \le \mu \}$ and $\mathbf{Y}:= \{ | Z_{i}-\mu | : Z_{i} > \mu \}$, except that the assignment to $\mathbf{X},\mathbf{Y}$ is random.

An equivalent definition of the Wilcoxon signed rank is $W_{\mathrm{SR}}=\sum_{1 \le i \le j \le N} \mathbf{1}\left\{ \frac{Z_{i} + Z_{j}}{2} > \mu \right\},$ where $\frac{Z_{i}+Z_{j}}{2}$ is called a **Walsh average**, and so the signed rank statistic is the number of Walsh averages greater than $\mu$.

> [!proof]- Sketch proof of equivalence
> For each Walsh average in the sum, "assign" the term to whichever of $Z_{i},Z_{j}$ is further away from $\mu$. Therefore, the $i$th observation gets assigned $R_{i}^{\ast}$ terms in the sum.
>
> Moreover, for any $i$, the terms assigned to $Z_{i}$ are either all $1$, when $Z_{i}>\mu$, or all $0$, when $Z_{i} \le \mu$. This is by choice of the assignment: if $Z_{j}$ is closer to $\mu$ than $Z_{i}$, then the term $(Z_{i}+Z_{j}) / 2$ is on the same side of $\mu$ as $Z_{i}$.
>
> Therefore, the original sum equals summing $R_{i}^{\ast}$ over all $i:Z_{i} > \mu$.
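A quick check of this equivalence on simulated (continuous, hence tie-free) data, reusing the `signed_rank` helper from the block above:

```python fold title:"Signed rank via Walsh averages"
import numpy as np

def signed_rank_via_walsh(data: np.ndarray, mu: float) -> int:
    """Count the Walsh averages (Z_i + Z_j)/2, i <= j, strictly above mu."""
    n = len(data)
    walsh = [(data[i] + data[j]) / 2 for i in range(n) for j in range(i, n)]
    return sum(a > mu for a in walsh)

rng = np.random.default_rng(1)
z = rng.normal(loc=0.3, size=10)
print(signed_rank_via_walsh(z, 0.0), signed_rank(z, 0.0))  # the two definitions agree
```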
### Distribution of the Signed Rank Statistic

> [!theorem|*] Signed Rank Statistic is a Sum of Bernoullis
> If the median $\mu_{0}$ is the true value, the signed rank statistic $W_{\mathrm{SR}}(\mu_{0})$ is identically distributed as $\sum_{j=1}^{N} jB_{j}, ~ ~ ~\text{where }B_{j} \overset{\mathrm{iid.}}{\sim} \mathrm{Bernoulli}\left( \frac{1}{2} \right).$
>
> > [!proof]-
> > We show that $I=\mathbf{1}_{\mathbf{Z}>\mu}\overset{\mathrm{iid.}}{\sim} \mathrm{Ber}\left( \frac{1}{2} \right)$ and $R^{\ast}(\mathbf{Z})$ are independent (that $I$ is iid. $\mathrm{Ber}(1/2)$ follows directly from the symmetry of each $Z_{i}$ about $\mu$). Then conditioning on $R^{\ast}$, we see that $W_{\mathrm{SR}}~|~R^{\ast}$ has the same distribution as above (for any $R^{\ast}$).
> >
> > Fix $r^{\ast} \in \mathcal{P}_{N}$, and $i \in \{ 0,1 \}^{N}$.
> >
> > Let $\phi$ be any way of flipping $\mathbf{Z}$ around $\mu$ (it reflects an arbitrary subset of the entries of $\mathbf{Z}$ about $\mu$); write $\phi I:= I(\phi \mathbf{Z})$.
> >
> > Symmetry of $\mathbf{Z}$ about $\mu$ gives $f(\mathbf{z})=f(\phi \mathbf{z})$, and by extension $f_{\mathbf{Z} ~|~I}(\mathbf{z})=f_{\mathbf{Z} ~|~ \phi I}(\phi \mathbf{z})$. Then
> > $\begin{align*}
> > \mathbb{P}&[R^{\ast}=r^{\ast} ~|~ I=i]\\
> > &= \int \mathbb{P}[R^{\ast}=r^{\ast} ~|~ \mathbf{Z}=\mathbf{z},I=i] \cdot f_{\mathbf{Z} ~|~ I=i}(\mathbf{z}) ~ d\mathbf{z} \\
> > &= \int \mathbb{P}[R^{\ast}=r^{\ast} ~|~ \mathbf{Z}=\mathbf{z}] \cdot f_{\mathbf{Z} ~|~ I=i}(\mathbf{z}) ~ d\mathbf{z} &[R^{\ast} \text{ determined by }\mathbf{Z}]\\
> > &= \int \mathbb{P}[R^{\ast}=r^{\ast} ~|~ \mathbf{Z}=\mathbf{z}] \cdot f_{\mathbf{Z} ~|~ \phi I=\phi i}(\phi\mathbf{z}) ~ d\mathbf{z} & [\text{symmetry of }\mathbf{Z}]\\
> > &= \int \mathbb{P}[R^{\ast}=r^{\ast} ~|~ \mathbf{Z}=\phi\mathbf{z}] \cdot f_{\mathbf{Z} ~|~ \phi I=\phi i}(\phi\mathbf{z}) ~ d\mathbf{z} & [R^{\ast}\text{ invariant under }\phi]\\
> > &= \int \mathbb{P}[R^{\ast}=r^{\ast} ~|~ \mathbf{Z}=\mathbf{z}'] \cdot f_{\mathbf{Z} ~|~ \phi I=\phi i}(\mathbf{z}') ~ d\mathbf{z}' &[\mathbf{z}':=\phi \mathbf{z}]\\
> > &= \mathbb{P}[R^{\ast}=r^{\ast} ~|~ I=\phi^{-1}i].
> > \end{align*}$
> > Since the flip $\phi$ is arbitrary (and $\phi^{-1}i$ ranges over all of $\{ 0,1 \}^{N}$ as $\phi$ varies), $\mathbb{P}[R^{\ast}=r^{\ast} ~|~ I=i]$ does not depend on $i$, and we conclude that $R^{\ast}$ is independent of $I$.

- Hence its pmf is $\mathbb{P}[W_{\mathrm{SR}} = k]=\frac{\left| \left\{ (b_{1},\dots,b_{N}) \in \{ 0,1 \}^{N} \,:\, \sum_{j=1}^{N}jb_{j}=k \right\} \right|}{2^{N}},$ since the Bernoulli vector is uniformly distributed over the $2^{N}$ points of $\{ 0,1 \}^{N}$.
- Heuristically, $W_{\mathrm{SR}}=\sum_{j=1}^{N}j\cdot \mathbf{1}\{ \text{the observation with the }j\text{th smallest } | Z-\mu | \text{ lies above } \mu \}.$

From this equivalence, it is obvious that $\begin{align*} \mathbb{E}[W_{\mathrm{SR}}]&= \sum_{j=1}^{N}j \mathbb{E}[B_{j}]=\frac{1}{2}\sum_{j=1}^{N}j=\frac{N(N+1)}{4};\\ \mathrm{Var}(W_{\mathrm{SR}})&= \sum_{j=1}^{N}j^{2}\mathrm{Var}(B_{j}) =\frac{1}{4}\sum_{j=1}^{N}j^{2}=\frac{N(N+1)(2N+1)}{24}. \end{align*}$

### Application to Paired Samples

Just like the usual one-sample test statistics, the Wilcoxon signed rank statistic can be applied to paired samples $(\mathbf{X}, \mathbf{Y})=(X_{1},Y_{1}),\dots,(X_{N},Y_{N})$ to test the location $\Delta$ of $(X_{i}-Y_{i})\sim F_{\text{diff}}(\cdot\,;\Delta).$

Note that *the paired observations $X_{i},Y_{i}$ are correlated, so the combined sample is not exchangeable under $H_{0}$ and the two-sample rank sum test does not work here*. However, assuming independence between different pairs, the differences $(X_{i}-Y_{i})$ are iid., hence exchangeable. If we further assume symmetry about their location $\Delta$, we may apply the signed rank test to the differences.

For example, if $\mathbf{X}$ are patients' blood sugar levels before treatment, and $\mathbf{Y}$ those of the same patients after treatment, $\Delta$ measures the effect of the treatment. The hypotheses then mean $\begin{align*} &H_{0}:\Delta=0, \text{ i.e. the treatment has no effect};\\ &H_{1}: \Delta > 0, \text{ i.e. the treatment decreases blood sugar}. \end{align*}$

## Point and Interval Estimates

Instead of a test, suppose we want to estimate $\Delta$ in the rank-based tests, e.g. the shift coefficient in the Wilcoxon rank sum test, or the location in the signed rank test.
- We may want a point estimate $\hat{\Delta}$,
- Or a confidence interval for $\Delta$ of level $1-\alpha$; we will follow the idea in [[Confidence Intervals, Tests, P-Values]] to derive an interval from a level-$\alpha$ test, namely $\{ \Delta_{0} ~|~ H_{0}(\Delta_{0}) \text{ is not rejected}\},$ where $H_{0}(\Delta_{0})$ is shorthand for "$H_{0}:\Delta=\Delta_{0}$" (see the brute-force sketch below).
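A brute-force illustration of this test-inversion idea for the two-sample shift, scanning candidate values of $\Delta_{0}$ and keeping those that a two-sided Mann-Whitney test does not reject (the closed-form intervals derived below avoid the grid entirely; the helper name is illustrative):

```python fold title:"CI by test inversion (brute force)"
import numpy as np
from scipy.stats import mannwhitneyu

def shift_ci_by_inversion(x, y, alpha=0.05, grid_size=400):
    """Keep every candidate shift that a level-alpha two-sided test fails to reject."""
    diffs = np.subtract.outer(y, x).ravel()        # all pairwise differences Y_j - X_i
    grid = np.linspace(diffs.min(), diffs.max(), grid_size)
    kept = [d for d in grid
            if mannwhitneyu(x, y - d, alternative="two-sided").pvalue >= alpha]
    return min(kept), max(kept)

rng = np.random.default_rng(2)
x = rng.normal(size=12)
y = rng.normal(loc=1.0, size=15)
print(shift_ci_by_inversion(x, y))
```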
### Estimation with Hodges-Lehmann

Since we are not making distribution assumptions (for the data $X,Y$), we want to use [[Hodges-Lehmann Estimators]], which do not require such assumptions:

![[Hodges-Lehmann Estimators#^dfb102]]

Applying it to the two-tailed rank-sum test estimates the shift coefficient to be $\tilde{\Delta}_{\text{rank sum shift}}=\mathrm{median}\{ Y_{j}-X_{i} \,|\, i=1,\dots,n;\, j=1,\dots ,m\}.$

> [!proof]-
> Since the Wilcoxon statistic and the Mann-Whitney statistic differ by a constant, they produce the same p-value, hence the same estimator. Therefore we opt to work with the MW statistic. Its p-value is $\mathbb{P}[| \mathrm{MW}-\mu_{n,m} | \ge | \mathrm{MW}_{\mathrm{obs}}-\mu_{n,m} |],$ where $\mu_{n,m}=mn / 2$ is its null expectation. To maximize this, the symmetric distribution of $\mathrm{MW}$ requires $\hat{\Delta}=\underset{\Delta}{\arg\min}~\left| \mathrm{MW}(\Delta)-\frac{mn}{2} \right|,$ where $\mathrm{MW}(\Delta)$ is the Mann-Whitney statistic on the sample $(\mathbf{X}, \mathbf{Y}-\Delta)$.
>
> Recall that $\mathrm{MW}(\Delta)$ counts the pairs with $Y_{j}-\Delta>X_{i}$, out of $mn$ pairs in total. Therefore, we look for the $\Delta$ that puts half of the differences $Y_{j}-X_{i}$ above it, which is $\Delta = \mathrm{median}(Y_{j}-X_{i})_{i,j}$.

For the signed rank statistic $W_{\mathrm{SR}}$, its Hodges-Lehmann estimate of the median is $\tilde{\Delta}_{\text{signed rank median}}=\mathrm{median} \left\{ \frac{Z_{i}+Z_{j}}{2} \,\Big|\, 1 \le i \le j \le N \right\}.$

### CI of Shifts in Rank Sum Tests

For the rank sum test, we use the Mann-Whitney version $\mathrm{MW}_{\Delta_{0}}:= | \{ (i,j) ~:~ Y_{j}-\Delta_{0}>X_{i}\} |$, and the test rejects $H_{0}(\Delta_{0})$ iff $\mathrm{MW}_{\Delta_{0}} \not\in(w_{\alpha / 2}, w_{1-\alpha / 2}),$ where $w_{\alpha/2}$ and $w_{1-\alpha / 2}$ are quantiles of the distribution of $\mathrm{MW}_{\Delta_{0}} ~|~ H_{0}$. *By symmetry of the distribution of $\mathrm{MW}$, $w_{1-\alpha / 2}=mn-w_{\alpha / 2}$.*

> [!warning]- Discrete distributions
> Due to the discreteness of $\mathrm{MW}$, the "quantiles" may need to be "on the safer side": $\begin{align*} \mathbb{P}[\mathrm{MW}_{\Delta_{0}} &\le w_{\alpha / 2} ~|~ H_{0}] \le \alpha / 2,\\ \mathbb{P}[\mathrm{MW}_{\Delta_{0}} &\ge w_{1-\alpha / 2} ~|~ H_{0}] \le \alpha / 2. \end{align*}$This can be done by choosing $\begin{align*} w_{\alpha / 2}&:= \sup \{ w ~:~ \mathbb{P}[\mathrm{MW}_{\Delta_{0}} \le w ~|~ H_{0}] \le \alpha / 2\},\\[0.4em] w_{1-\alpha / 2}&:= \inf \{ w ~:~ \mathbb{P}[\mathrm{MW}_{\Delta_{0}} \ge w ~|~ H_{0}] \le \alpha / 2\}\\ &~=1+\sup \{ w ~:~ \mathbb{P}[\mathrm{MW}_{\Delta_{0}} \le w ~|~ H_{0}] < 1-\alpha / 2 \}. \end{align*}$

Therefore, $H_{0}(\Delta_{0})$ is rejected if and only if the number of pairs with $Y_{j}-X_{i}>\Delta_{0}$ is too small ($\le w_{\alpha / 2}$) or too large ($\ge mn -w_{\alpha / 2}$); equivalently, the number of differences strictly below $\Delta_{0}$ is too large or too small.

Consequently, *the confidence interval is exactly determined by the order statistics of the sorted differences*: let $\mathbf{D} \in \mathbb{R}^{mn}$ collect the pairwise differences $Y_{j}-X_{i}$ and $D_{(1)},\dots,D_{(mn)}$ be their order statistics. To fall within the confidence interval, $\Delta_{0}$ needs to be strictly greater than at least $w_{\alpha / 2} + 1$ of the $D_{(k)}$, but strictly greater than fewer than $mn - w_{\alpha / 2}$ of them.
That is, the confidence interval is $(D_{(w_{\alpha / 2} +1)}, D_{(mn - w_{\alpha /2})}).$ Schematically, we find the differences and sort them: $\underset{\xrightarrow[\substack{\text{skip }w_{\alpha / 2} \\ \text{ entries}}]{}\hphantom{}}{D_{(1)},\dots, D_{(w_{\alpha / 2})}}, \underbrace{(D_{(w_{\alpha / 2}+1)},\dots,D_{(mn - w_{\alpha / 2})})}_{\substack{\text{confidence interval} \\ \text{of }\Delta}}, \underset{\xleftarrow[\substack{\text{skip } w_{\alpha / 2} \\ \text{entries}}]{}\hphantom{}}{\dots, D_{(mn)}}.$

> [!Help]- Off by one?
> Maybe you got the definition of $\mathrm{MW}(\Delta)$ wrong: a difference $D_{ij}=Y_{j}-X_{i}$ is only counted in $\mathrm{MW}(\Delta)$ when it is strictly greater than $\Delta$.
>
> Another possibility is that you forgot that values of $\Delta$ with $\mathrm{MW}(\Delta)=w_{\alpha / 2}$ or $mn-w_{\alpha / 2}$ are rejected: it is not good enough to barely touch the already relaxed cut-offs!

### CI of Location in Signed Rank Tests

Similarly, since $W_{\mathrm{SR}}(\mu)$ counts the number of Walsh averages $\mathbf{A}=\left( \frac{Z_{i}+Z_{j}}{2} \right)_{1 \le i \le j \le N}$ that are strictly greater than $\mu$, the same reasoning applies.
- For succinctness, write $M:= N(N+1) / 2$ for the number of Walsh averages.

Again, letting $w_{\alpha / 2}$ and $w_{1-\alpha / 2} =M-w_{\alpha / 2}$ be the "on the safe side" quantiles of $W_{\mathrm{SR}}(\mu_{0}) ~|~ H_{0}(\mu_{0})$, we see that a location $\mu_{0}$ is rejected if and only if it is either:
- Strictly less than too few ($\le w_{\alpha / 2}$) of the Walsh averages;
- Strictly less than too many ($\ge M-w_{\alpha / 2}$) of them.

Therefore, sort the Walsh averages to find the CI: $(A_{(w_{\alpha / 2}+1)}, A_{(M - w_{\alpha / 2})}).$

Schematically, $\underset{\xrightarrow[\substack{\text{count }w_{\alpha / 2} \\ \text{ entries}}]{}\hphantom{}}{A_{(1)},\dots, {A_{(w_{\alpha / 2})}}}, \underbrace{A_{(w_{\alpha / 2}+1)},\dots, A_{(M - w_{\alpha / 2})}}_{\substack{\text{confidence interval} \\ \text{of }\mu}}, \underset{\xleftarrow[\substack{\text{count } w_{\alpha / 2} \\ \text{entries}}]{}\hphantom{}}{A_{(M-w_{\alpha / 2}+1)},\dots, A_{(M)}}$

> [!help]- Off by one AGAIN?
> Again, recall that $W_{\mathrm{SR}}$ counts the number of Walsh averages *strictly* greater than $\mu$, so to have $W_{\mathrm{SR}}(\mu)$ greater than $w_{\alpha / 2}$, say, $\mu$ needs more than $w_{\alpha / 2}$ Walsh averages above it, i.e. $\mu$ is strictly less than $A_{(M - w_{\alpha / 2})}$.
>
> Also, remember that locations with $W_{\mathrm{SR}}=w_{\alpha / 2}$ or $M-w_{\alpha / 2}$ are rejected.

## Computation Considerations

### Monte Carlo Simulation

When $N$ is large, $| \mathcal{P}_{N} |=N!$ is enormous, so the uniform distribution is usually accessed via simulation methods like [[Monte Carlo Integration|Monte Carlo]].
- For example, the WRST p-value can be computed by repeatedly drawing a random permutation of the ranks $1:N$ and summing the last $m$ values; this is equivalent to sampling $m$ values without replacement from $\{ 1,\dots,N \}$.
```python fold title:"Monte Carlo Implemented in Python"
from random import sample

import numpy as np

def sample_ranks_sums(N, m) -> int:
    possible_ranks = [(i + 1) for i in range(N)]
    sampled_ranks = sample(possible_ranks, m)
    return sum(sampled_ranks)

def monte_carlo_ranks(N, m, observed, n_iterations=1000, side="two_sided"):
    sampled_ranks = np.array([sample_ranks_sums(N, m) for _ in range(n_iterations)])
    p_lower = (sampled_ranks <= observed).mean()
    p_upper = (sampled_ranks >= observed).mean()
    match side:
        case "two_sided":
            p = min(p_lower, p_upper) * 2
        case "left":
            p = p_lower
        case "right":
            p = p_upper
        case _:
            raise ValueError(f"unknown side: {side}")
    return p
```

### Normal Approximations

Alternatively, assuming joint independence between $\mathbf{X}$ and $\mathbf{Y}$ and that $H_{0}$ holds, the rank sum $W$ is approximately normal when $m,n \to \infty$: $W \overset{D}{\approx} N\left( \frac{m(m+n+1)}{2}, \frac{mn(m+n+1)}{12} \right).$ Similarly, as $N \to \infty$ the signed rank statistic follows $W_{\mathrm{SR}} \overset{D}{\approx} N\left( \frac{N(N+1)}{4}, \frac{N(N+1)(2N+1)}{24} \right).$

This provides a deterministic approximation of the p-value without sampling or heavy computation. However, instead of using `pnorm(observed, expected, sd)`, calling `pnorm(observed ± 0.5, expected, sd)` gives a better approximation.
- This is called the **continuity correction**, needed because each discrete probability $\mathbb{P}[W=w]$ corresponds to a range of values (e.g. $\mathbb{P}[N(\mu,\sigma^{2}) \in (w \pm 0.5)]$ for the approximating normal) in the continuous approximation.
- The sign of the $\pm 0.5$ depends on the tail being computed: if we compute $\mathbb{P}[W \le w]$, then it is $\Phi(w+0.5)$, and for $\mathbb{P}[W \ge w]=1-\mathbb{P}[W<w]$, we use $1-\Phi(w-0.5)$.

### Nominal vs. Actual Levels

In forming a test or a CI, we want to find the quantiles $w_{\alpha}$ where $\mathbb{P}[W(\mathbf{R}) < w _{\alpha} \,|\, H_{0}]\overset{?}{=}\alpha.$ However, the discrete distribution of $W(\mathbf{R})$ makes exact equality with the **nominal level** $\alpha$ impossible in general, so we settle for $w_{\alpha}:= \max \Big\{ w ~:~ \mathbb{P}[W(\mathbf{R}) < w \,|\, H_{0}]\,\le\,\alpha\Big\}.$ With this quantile, the actual level of the test/CI is $\mathbb{P}[W(\mathbf{R}) < w_{\alpha} \,|\, H_{0}]$, which is at most (hence stricter than) the nominal $\alpha$.

### Ties

With ties in the data (especially data measured with some granularity, e.g. lifespans measured in days), the ranks are no longer $U(\mathcal{P}_{N})$ even if the distribution is exchangeable.

Computing p-values is still possible with Monte Carlo by *directly permuting the original data*: with $\sigma$ being a random permutation, it computes $R(\sigma\mathbf{Z})$, instead of $\sigma R(\mathbf{Z})$.
- So for example, instead of `sample(ranks, m)`, the simulation will call `shuffle(data)`, then `ranks = rank(data)`, and lastly sum the last $m$ entries of `ranks`, as sketched below.
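A minimal sketch of this tie-robust simulation, permuting the raw data and re-ranking (with midranks) on every iteration (the helper name is illustrative):

```python fold title:"Monte Carlo with ties"
import numpy as np
from scipy.stats import rankdata

def monte_carlo_with_ties(x, y, n_iterations=10_000, seed=0):
    """One-sided (right-tail) p-value for the rank sum of y, permuting the raw data."""
    rng = np.random.default_rng(seed)
    data = np.concatenate([x, y])
    m = len(y)
    observed = rankdata(data)[-m:].sum()          # midranks handle ties automatically
    sums = np.empty(n_iterations)
    for b in range(n_iterations):
        permuted = rng.permutation(data)          # shuffle(data)
        sums[b] = rankdata(permuted)[-m:].sum()   # ranks = rank(data); sum the last m
    return (sums >= observed).mean()
```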