Suppose we are testing hypotheses $\{ (H_{0i},H_{1i}) \}_{i=1}^{n}$ with some procedure $\mathcal{D}: \mathcal{X} \mapsto \{ 0,1 \}^{n}$. We have the two-way table

![[HypothesisTestingTwoWayTable.png#invert|center|w80]]

- In this table, $a$ counts the rejected true nulls and $R$ counts all rejections, so the [[FWER Control|family-wise error rate]] is $\mathrm{FWER}=\mathbb{P}[a > 0]$.

> [!definition|*] False Discovery Rate
> The **false discovery proportion** of the procedure $\mathcal{D}(\mathbf{x})$ is $$\mathrm{FDP}:= \frac{a}{R}=\frac{\sum_{i \in I_{0}} \mathcal{D}(\mathbf{x})_{i}}{\sum_{i \in I} \mathcal{D}(\mathbf{x})_{i}}$$ (taken to be $0$ when $R=0$), where $I_{0}$ are the indices of the actual nulls and $I=\{ 1,\dots,n \}$ indexes the entire set of hypotheses. The **false discovery rate (FDR)** is its expectation, $\mathrm{FDR}:= \mathbb{E}[\mathrm{FDP}]$.

A number of testing procedures aim to control the FDR. Compared to controlling the FWER, controlling the FDR is more lenient on mistakes -- we allow a proportion $\alpha$ of the rejections to be incorrect, instead of allowing only a chance of $\alpha$ of making any incorrect rejection: $$\begin{align*} \mathrm{FWER}&= \mathbb{P}\left[ \sum_{i \in I_{0}} \mathcal{D}(\mathbf{x})_{i} >0\right],\\ \mathrm{FDR}&= \mathbb{E}\left[ \frac{\sum_{i \in I_{0}} \mathcal{D}(\mathbf{x})_{i}}{\left( \sum_{i \in I} \mathcal{D}(\mathbf{x})_{i} \right)\vee 1} \right] \le \mathbb{P}\left[ \sum_{i \in I_{0}} \mathcal{D}(\mathbf{x})_{i} >0 \right] = \mathrm{FWER}, \end{align*}$$where the inequality holds because the false discovery proportion is at most $1$ and is nonzero only when at least one null is falsely rejected. Hence an FWER-controlling procedure automatically controls the FDR at the same level, but not conversely.

### Benjamini and Hochberg's Algorithm

Like [[FWER Control#^cc1173|Holm's procedure]], Benjamini and Hochberg's algorithm (BH) compares the sorted p-values of the hypotheses against index-dependent bounds, but it is a step-up procedure that controls the FDR.

> [!algorithm] Benjamini-Hochberg Algorithm
> Fix $q$ to be the desired limit on the FDR.
>
> $[1]$ Sort the p-values into $p_{(1)}\le \cdots \le p_{(n)}.$
>
> $[2]$ Step up from the least significant p-value: for $i=n,n-1,\dots,1$, compare $p_{(i)}$ with the BH bound of $qi / n$.
> - Stop at the first (i.e. largest) index $i_{\max}$ with $p_{(i_{\max})} \le qi_{\max} / n$.
> - Reject $H_{0(i)}$ for every $i \le i_{\max}$; if no such index exists, reject nothing.

If the p-values are independent, the BH algorithm controls the FDR at $q\cdot n_{0} / n \le q$, where $n_{0}=|I_{0}|$.
- Independence is usually a questionable assumption in practice. The bound needs to be tightened for BH to be applicable to dependent tests.

Plotting the rejection boundary against the index of the sorted p-values, we can compare Hochberg's step-up procedure (dashed, controls FWER) and the BH algorithm (solid, controls FDR):

![[BHFDRHochbergFWER.png#invert|center|w60]]

We can see that the BH algorithm rejects $H_{0(i)}$ much more liberally than FWER-controlling procedures.
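To make the step-up rule concrete, here is a minimal sketch in Python (not taken from the source note). It assumes `numpy` is available; the function name, the level $q=0.1$, and the simulated p-values are illustrative choices only.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.1):
    """Return a boolean mask of the hypotheses rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                              # indices giving p_(1) <= ... <= p_(n)
    below = p[order] <= q * np.arange(1, n + 1) / n    # p_(i) <= q * i / n
    reject = np.zeros(n, dtype=bool)
    if below.any():
        i_max = np.nonzero(below)[0].max()             # largest i with p_(i) <= q * i / n
        reject[order[: i_max + 1]] = True              # reject H_{0(1)}, ..., H_{0(i_max)}
    return reject

# Illustrative example: 90 null p-values (uniform) and 10 non-nulls concentrated near 0.
rng = np.random.default_rng(1)
p = np.concatenate([rng.uniform(size=90), rng.beta(1, 50, size=10)])
print("rejections:", benjamini_hochberg(p, q=0.1).sum())
```

In practice a library routine such as `multipletests(p, alpha=q, method="fdr_bh")` from `statsmodels.stats.multitest` implements the same rule.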
### Controlling Local False Discovery Rate $\mathrm{fdr}$

The **local false discovery rate $\mathrm{fdr}$** is defined as $$\mathrm{fdr}(z):=\mathbb{P}[\mathrm{null} ~|~z] =\frac{\pi_{0}f_{0}(z)}{f(z)},$$and we may want to reject the observations $i$ with small $\mathrm{fdr}(z_{i})$.
- Note that this is not $\mathbb{P}[\text{case } i \text{ is null} ~|~ \mathbf{z}]$, which may be more desirable for [[Learning from Others|incorporating information from other observations]].

If we impose a threshold $q$ (usually $q=0.2$) to get the rejection region $\mathcal{R}(q):= \{ z~|~\mathrm{fdr}(z) < q \}$, rearranging the boundary condition gives a bound on the [[Likelihood Ratios|likelihood ratio]]: $$\mathrm{fdr}(z)< q \iff \frac{f_{0}(z)}{f_{1}(z)} < \frac{\pi_{1}q}{\pi_{0}(1-q)},$$which is approximately $1/36$ for $q=0.2,\ \pi_{0}=0.9$.

In practice, the $\mathrm{fdr}$ needs to be estimated. Supposing $f_{0}$ is known, we need estimates $\hat{\pi}_{0}$ and $\hat{f}$ to get the **$\mathrm{fdr}$ estimate** $$\widehat{\mathrm{fdr}}(z):= \frac{\hat{\pi}_{0}f_{0}(z)}{\hat{f}(z)}.$$
- $\hat{\pi}_{0}$ can be an arbitrarily chosen value (e.g. $0.95$) or some [[Empirical Bayes Hypothesis Testing#Estimation of $\hat{\pi}_{0}$|empirical Bayes estimate]].
- $\hat{f}$ can be obtained by some [[Density Estimation]].

The benefit of using $\mathrm{fdr}$ over the tail-area $\mathrm{Fdr}$ is:
- While we can reject the whole set of hypotheses $\{ i ~|~z_{i} \in \mathcal{R}\}$, the rejected cases are not all equally significant. The local nature of $\mathrm{fdr}(z_{i})$ lets us quantify the significance of each individual case.

### General Prior Distributions

Instead of assuming $H_{0i}:Z_{i} \sim f_{0}$ (and similarly $H_{1i}: Z_{i}\sim f_{1}$), where all null distributions are the same, we can generalize the setup to $$\begin{align*} H_{0i}: Z_{i} \sim f_{0i}, && \mathbb{P}[H_{0i}]=\pi_{0i},\\ H_{1i}: Z_{i} \sim f_{1i}, && \mathbb{P}[H_{1i}]=\pi_{1i},\end{align*}$$where both the null probability $\mathbb{P}[H_{0i}]$ and the distribution $f_{0i}$ of $Z_{i} ~|~ H_{0i}$ are allowed to vary across cases. In this case, we can still work with the **marginal null distribution** $$f_{0}:= \frac{\sum_{i}\pi_{0i}f_{0i}}{\sum_{i}\pi_{0i}}.$$As long as our assumptions about $f_{0}$, $\pi_{0}$, etc. hold for these marginals, the $\mathrm{fdr}$ control still holds for the average case. Otherwise, we need to form estimates of each $f_{0i}$.

Suppose the $n$ cases can be partitioned into groups indexed by $m=1,\dots,M$. To avoid unstable estimates, we can enhance the semi-parametric [[Density Estimation|Lindsey's method]] with group-specific effects: $$\log \mathbb{E}[f_{m}(z)]=\underbrace{\sum_{j=1}^{J}\beta_{j}b_{j}(z)}_{\text{shared effects}}+\underbrace{\vphantom{\sum_{j=1}^{J}}\beta_{1m}z+\beta_{2m}z^{2}}_{\text{group-specific}},$$where the group-specific effects can of course be fitted using other bases, and are subject to identifiability constraints (e.g. fixing a reference group).
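As an illustration of this grouped fit, here is a minimal sketch (not the note's implementation): it bins each group's $z$-values on a common grid, runs a Poisson regression of the bin counts on a shared polynomial basis plus group-specific $z,z^{2}$ terms, and plugs the fitted densities into $\widehat{\mathrm{fdr}}$. It assumes `numpy` and `statsmodels` are available; the simulated data, the $\mathcal{N}(0,1)$ theoretical null, the degree-5 shared basis, and $\hat{\pi}_{0}=0.95$ are illustrative assumptions rather than choices made in the source.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated z-values for M = 2 groups: mostly null N(0,1), a few shifted non-nulls.
z_groups = [
    np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(2.5, 1.0, 50)]),   # group 1
    np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(3.0, 1.0, 100)]),  # group 2
]
M = len(z_groups)

# Common histogram grid for all groups (Lindsey's method models bin counts).
edges = np.linspace(-5.0, 6.0, 56)                 # 55 bins of width 0.2
mids = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]
counts = np.stack([np.histogram(z, bins=edges)[0] for z in z_groups])   # shape (M, K)
K = len(mids)

# Design matrix: one intercept per group (absorbs group sizes), a shared
# polynomial basis b_j(z), j = 1..J (z rescaled for conditioning), and
# group-specific z, z^2 columns for every group except the first (reference) group.
J = 5
rows = []
for m in range(M):
    for k in range(K):
        intercepts = [1.0 if g == m else 0.0 for g in range(M)]
        shared = [(mids[k] / 3.0) ** j for j in range(1, J + 1)]
        specific = []
        for g in range(1, M):                      # the reference group gets no extra columns
            on = 1.0 if g == m else 0.0
            specific += [on * mids[k], on * mids[k] ** 2]
        rows.append(intercepts + shared + specific)
X = np.asarray(rows)
y = counts.reshape(-1)                             # same (m, k) ordering as the rows

# Poisson regression of the bin counts:  log E[count_{mk}] = eta(z_k, group m).
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
mu = fit.fittedvalues.reshape(M, K)

# Turn fitted counts into density estimates f_hat_m(z) on the grid, then into
# local fdr estimates, using a theoretical N(0,1) null and a plug-in pi0.
pi0_hat = 0.95                                     # assumed / externally estimated
f0 = np.exp(-mids ** 2 / 2) / np.sqrt(2 * np.pi)
for m in range(M):
    f_hat = mu[m] / (counts[m].sum() * width)
    fdr_hat = np.clip(pi0_hat * f0 / f_hat, 0.0, 1.0)
    print(f"group {m + 1}: bins with fdr_hat < 0.2 centered at", mids[fdr_hat < 0.2])
```

Giving each group its own intercept absorbs the group sizes, and dropping the group-specific columns for the reference group is one way to impose the identifiability constraint mentioned above.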