Suppose we are testing hypotheses $\{ (H_{0i},H_{1i}) \}_{i=1}^{n}$, with some procedure $\mathcal{D}: \mathcal{X} \mapsto \{ 0,1 \}^{n}$.
We have the two-way table
![[HypothesisTestingTwoWayTable.png#invert|center|w80]]
- In this table, the [[FWER Control|family-wise error rate]] is $\mathrm{FWER}=\mathbb{P}[a > 0]$.
> [!definition|*] False Discovery Rate
> The **false discovery proportion** of the procedure $\mathcal{D}(\mathbf{x})$ is $\mathrm{Fdp}:= \frac{a}{R}=\frac{\sum_{i \in I_{0}} \mathcal{D}(\mathbf{x})_{i}}{\sum_{i \in I} \mathcal{D}(\mathbf{x})_{i}}$ (taken to be $0$ when $R=0$), and the **false discovery rate (FDR)** is its expectation $\mathrm{FDR}:=\mathbb{E}[\mathrm{Fdp}]$, where $I_{0}$ are the indices of actual nulls, and $I=\{ 1,\dots,n \}$ indexes the entire set of hypotheses.
A series of testing procedures aim at limiting the FDR. Compared to controlling FWER, FDR is more lenient on mistakes -- we allow a proportion $\alpha$ of decisions to be incorrect, instead of a chance of $\alpha$ to have any incorrectness: $\begin{align*}
\mathrm{FWER}&= \mathbb{P}\left[ \sum_{i \in I_{0}} \mathcal{D}(\mathbf{x})_{i} >0 \right],\\
\mathrm{FDR}&= \mathbb{E}\left[ \frac{\sum_{i \in I_{0}} \mathcal{D}(\mathbf{x})_{i}}{\max\left( \sum_{i \in I} \mathcal{D}(\mathbf{x})_{i},\,1 \right)} \right] \le \mathbb{P}\left[ \sum_{i \in I_{0}} \mathcal{D}(\mathbf{x})_{i} >0 \right]=\mathrm{FWER},
\end{align*}$since the false discovery proportion is at most $1$ and equals $0$ whenever there are no false rejections. Controlling the FDR at level $\alpha$ is therefore a weaker requirement than controlling the FWER at the same level.
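To make the comparison concrete, here is a minimal simulation sketch (not from the source; the sample size, null proportion, effect size, and threshold are illustrative assumptions) that estimates both error rates for the naive rule $\mathcal{D}(\mathbf{x})_{i}=\mathbb{1}\{ p_{i}<\alpha \}$:
```python
# Minimal sketch (illustrative assumptions): estimate FWER and FDR of the naive
# per-test rule "reject when p_i < alpha" under a mix of null and non-null cases.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, n0, alpha, n_rep = 100, 90, 0.05, 2000        # 90 true nulls, 10 alternatives
mu = np.r_[np.zeros(n0), np.full(n - n0, 3.0)]   # alternatives have mean 3

any_false, fdp = [], []
for _ in range(n_rep):
    z = rng.normal(mu, 1.0)
    p = 2 * norm.sf(np.abs(z))        # two-sided p-values
    reject = p < alpha                # naive, uncorrected decisions
    a = reject[:n0].sum()             # false rejections (among true nulls)
    R = max(reject.sum(), 1)          # total rejections (avoid 0/0)
    any_false.append(a > 0)
    fdp.append(a / R)

print("FWER = P[a > 0] ~", np.mean(any_false))   # chance of any false rejection
print("FDR  = E[a / R] ~", np.mean(fdp))         # expected false-discovery proportion
```
Since each realized false discovery proportion is bounded by the indicator of any false rejection, the estimated FDR always sits below the estimated FWER, matching the inequality above.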
### Benjamini and Hochberg's Algorithm
Similar to [[FWER Control#^cc1173|Holm's procedure]], Benjamini and Hochberg's algorithm (BH) controls the FDR using a step-up algorithm on the sorted p-values of the hypotheses.
> [!algorithm] Benjamini-Hochberg Algorithm
> Fix $q$ to be the desired limit on the FDR.
>
> $[1]$ Sort the p-values into $p_{(1)}<\cdots < p_{(n)}.$
>
> $[2]$ Conduct the step-up comparison on the sorted p-values:
> - Compare each $p_{(i)}$ with the BH bound of $qi / n$.
> - Find the largest index $i_{\max}$ such that $p_{(i_{\max})} \le q i_{\max} / n$.
> - Reject $H_{0(i)}$ for every $i \le i_{\max}$ (even if some intermediate $p_{(i)}$ exceed their bounds); if no such index exists, reject nothing.
The BH algorithm controls the FDR at $q\cdot n_{0} / n\le q$ (where $n_{0}$ is the number of true nulls), provided the p-values are independent.
- This is usually a bad assumption. The bound needs to be tightened for BH to be applicable to non-independent tests.
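A minimal sketch of the BH step-up rule as stated above (the function name and interface are my own, not from the source):
```python
# Minimal sketch of the Benjamini-Hochberg step-up rule; interface is illustrative.
import numpy as np

def benjamini_hochberg(p, q=0.1):
    """Boolean rejection vector targeting FDR <= q (valid for independent p-values)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)                             # sort: p_(1) <= ... <= p_(n)
    below = p[order] <= q * np.arange(1, n + 1) / n   # compare p_(i) with the bound q*i/n
    reject = np.zeros(n, dtype=bool)
    if below.any():
        i_max = np.max(np.nonzero(below)[0])          # largest i with p_(i) <= q*i/n
        reject[order[: i_max + 1]] = True             # reject H_0(1), ..., H_0(i_max)
    return reject

# Example: ten p-values, FDR limit q = 0.1 (rejects the five smallest)
print(benjamini_hochberg([0.001, 0.004, 0.019, 0.03, 0.042, 0.09, 0.21, 0.38, 0.6, 0.87], q=0.1))
```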
Plotting the rejection boundary against the index of the sorted p-values, we can compare Hochberg's procedure (dashed, which controls the FWER) and the BH algorithm (solid, which controls the FDR):
![[BHFDRHochbergFWER.png#invert|center|w60]]
We can see that the BH algorithm rejects $H_{0 i}$ a lot more liberally than FWER-controlling procedures.
### Controlling Local False Discovery Rate $\mathrm{fdr}$
The **local false discovery rate $\mathrm{fdr}$** is defined as $\mathrm{fdr}(z):=\mathbb{P}[\mathrm{null} ~|~z] =\frac{\pi_{0}f_{0}(z)}{f(z)}.$We may want to reject observations $i$ with small $\mathrm{fdr}(z_{i})$.
- Note that this is not $\mathbb{P}[\text{Case } i \text{ is null} ~|~ \mathbf{z}]$, which may be more desirable for [[Learning from Others|incorporating information from other observations]].
Imposing a threshold $q$ (usually $q=0.2$) gives the rejection region $\mathcal{R}(q):= \{ z~|~\mathrm{fdr}(z) < q \}$; simplifying the boundary gives a bound on the [[Likelihood Ratios|likelihood ratio]]: $\mathrm{fdr}(z)< q \iff \frac{f_{0}(z)}{f_{1}(z)} < \frac{\pi_{1}q}{\pi_{0}(1-q)},$which evaluates to $\frac{0.1\times 0.2}{0.9\times 0.8}\approx 1/36$ for $q=0.2,\pi_{0}=0.9$.
In practice, the $\mathrm{fdr}$ needs to be estimated. Supposing $f_{0}$ is known, we need estimates $\hat{\pi}_{0}$ and $\hat{f}$ to get the **$\mathrm{fdr}$ estimate** $\widehat{\mathrm{fdr}}(z):= \frac{\hat{\pi}_{0}f_{0}(z)}{\hat{f}(z)}.$
- $\hat{\pi}_{0}$ can be an arbitrarily chosen value (e.g. 0.95) or some [[Empirical Bayes Hypothesis Testing#Estimation of $ hat{ pi}_{0}$|empirical Bayes estimate]].
- $\hat{f}$ can be obtained by some [[Density Estimation]].
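As a minimal sketch of the plug-in estimate, assuming a theoretical $\mathcal{N}(0,1)$ null $f_{0}$, an arbitrarily chosen $\hat{\pi}_{0}=0.95$, and a Gaussian kernel density estimate for $\hat{f}$ (all of these choices are illustrative, not prescribed here):
```python
# Minimal sketch of the plug-in local fdr estimate; f0 = N(0,1), pi0_hat = 0.95,
# and the kernel density estimate for f_hat are illustrative choices.
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(1)
z = np.r_[rng.normal(0, 1, 950), rng.normal(3, 1, 50)]  # mostly null z-values, a few shifted

pi0_hat = 0.95                                # assumed (or empirically estimated) null proportion
f_hat = gaussian_kde(z)                       # density estimate of the mixture f
fdr_hat = pi0_hat * norm.pdf(z) / f_hat(z)    # fdr_hat(z) = pi0_hat * f0(z) / f_hat(z)

q = 0.2
rejected = fdr_hat < q                        # rejection region R(q) = {z : fdr_hat(z) < q}
print("cases rejected:", rejected.sum(), "smallest fdr_hat:", fdr_hat.min().round(3))
```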
The benefits of using $\mathrm{fdr}$ over using $\mathrm{Fdr}$ are:
- While we can reject a set of hypotheses $\{ i ~|~z_{i} \in \mathcal{R}\}$, it doesn't mean that they are all equally significant. We can quantify each case's significance using $\mathrm{fdr}(z_{i})$.