> [!tldr] Pearson's $R^2$
> In a regression setting where the response is $\mathbf{y}$ and the predictions are $\hat{\mathbf{y}}$, the **Pearson's $R^{2}$** (or coefficient of determination) is defined to be $R^{2}:= \frac{\mathrm{ESS}}{\mathrm{TSS}}=1-\frac{\mathrm{RSS}}{\mathrm{TSS}}=\frac{\| \hat{\mathbf{y}}-\bar{y} \|^{2} }{\| \mathbf{y}-\bar{y} \|^{2} },$where $\mathrm{ESS}:= \| \hat{\mathbf{y}}-\bar{y} \|^{2}$ is the explained sum of squares, $\mathrm{RSS}:= \| \mathbf{y}-\hat{\mathbf{y}} \|^{2}$ the residual sum of squares, and $\mathrm{TSS}:= \| \mathbf{y}-\bar{y} \|^{2}$ the total sum of squares. The first two expressions agree because of the Pythagorean decomposition $\mathrm{TSS}=\mathrm{ESS}+\mathrm{RSS}$, valid for OLS fits with an intercept.
>
- In OLS $\mathbf{y} \sim \beta_{0}+\mathbf{X}\beta$ (assuming $\mathbf{X}$ does not contain an intercept column), this is closely related to the [[Inference in OLS#The F-Statistic|F-statistic]] for testing $H_{0}:\beta=\mathbf{0}$ against $H_{1}:\lnot H_{0}$, via $F=\frac{R^{2}/(p-1)}{(1-R^{2})/(n-p)}$, where $p$ is the total number of parameters.
- This is also related to the [[Likelihood Ratios|likelihood ratio]] of the above hypotheses under the Gauss-Markov Model.
- In the one-predictor model $Y\sim 1+X$, it is just $\hat{\rho}_{XY}^{2}$ (checked numerically in the sketch below).
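As a quick sanity check, here is a minimal numerical sketch (assuming NumPy is available; all variable names are illustrative) that computes $R^{2}$ via $\mathrm{ESS}/\mathrm{TSS}$, via $1-\mathrm{RSS}/\mathrm{TSS}$, and as $\hat{\rho}_{XY}^{2}$ in the one-predictor model:

```python
import numpy as np

rng = np.random.default_rng(0)

# One-predictor model Y ~ 1 + X
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# OLS fit with an explicit intercept column
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
rss = np.sum((y - y_hat) ** 2)         # residual sum of squares

rho_xy = np.corrcoef(x, y)[0, 1]       # empirical correlation of X and Y

# All three expressions agree (up to floating-point error)
print(ess / tss, 1 - rss / tss, rho_xy ** 2)
```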
Alternatively, it is the squared empirical correlation between $\mathbf{y}$ and $\hat{\mathbf{y}}$:
> [!theorem|*] $R^2$ as correlation coefficient
> Pearson's $R^{2}$ equals the square of the empirical correlation coefficient between $\mathbf{y}$ and $\hat{\mathbf{y}}$:
> $R^{2}=\Bigg( \underbrace{ \frac{\left< \mathbf{y}-\bar{y},\hat{\mathbf{y}}-\bar{y} \right> }{\| \mathbf{y}-\bar{y} \|\cdot \| \hat{\mathbf{y}}-\bar{y} \| } \vphantom{\frac{1}{\frac{2}{\frac{3}{4}}}}}_{=: \hat{\rho}_{y\hat{y}}}\Bigg) ^{2}.$
>
> > [!proof]-
> > The numerator of $\hat{\rho}_{y\hat{y}}$ can be written as $\begin{align*}
> > \left< \mathbf{y}-\bar{y} ,\hat{\mathbf{y}}-\bar{y}\right> &= -\frac{1}{2}\left[\| (\mathbf{y}-\bar{y})-(\hat{\mathbf{y}}-\bar{y}) \|^{2} -\| \mathbf{y}-\bar{y} \|^{2}-\| \hat{\mathbf{y}}-\bar{y} \|^{2} \right]\\
> > &= \frac{\mathrm{TSS}+\mathrm{ESS}-\mathrm{RSS}}{2}\\
> > &= \mathrm{ESS},
> > \end{align*}$where the last equality uses the Pythagorean decomposition $\mathrm{TSS}=\mathrm{ESS}+\mathrm{RSS}$.
> >
> > Now the $\mathrm{RHS}$ becomes $\hat{\rho}^{2}_{y\hat{y}}=\mathrm{ESS}^{2} / (\mathrm{TSS} \cdot \mathrm{ESS})=\mathrm{ESS} / \mathrm{TSS}$, which is by definition $\mathrm{LHS}=R^{2}$.
- Therefore, predictions that are highly correlated with the response they predict yield a high $R^{2}$, indicating a good fit (at least within the training sample); the sketch below checks this identity numerically.
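The same check with several predictors, where $R^{2}=\hat{\rho}_{XY}^{2}$ no longer applies but $R^{2}=\hat{\rho}_{y\hat{y}}^{2}$ still does (again a sketch with illustrative names):

```python
import numpy as np

rng = np.random.default_rng(1)

# Several predictors: R^2 = corr(X, Y)^2 no longer makes sense,
# but R^2 = corr(y, y_hat)^2 still holds.
n, k = 150, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ rng.normal(size=k + 1) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
rho_y_yhat = np.corrcoef(y, y_hat)[0, 1]

print(r2, rho_y_yhat ** 2)  # agree up to floating-point error
```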
### Distribution of the $R^{2}$
> [!theorem|*] Beta Distribution of the $R^2$
> Under the Gauss-Markov Model and the hypothesis $H_{0}:\beta=\mathbf{0}$ (i.e. there is a nonzero intercept, but no other predictor is related to $Y$), *$R^{2}$ has the [[beta distribution]]* $R^{2}\sim \mathrm{Beta}\left( \frac{p-1}{2},\frac{n-p}{2} \right),$where $p$ is the total number of parameters (including the intercept, so $\mathbf{X}$ has $p-1$ columns).
>
> > [!proof]-
> > Rewrite $R^{2}=\mathrm{ESS} / (\mathrm{ESS}+\mathrm{RSS})$; then the [[Inference in OLS#^a5aa2f|independence of RSS increments]] guarantees that $\mathrm{ESS}$ and $\mathrm{RSS}$ are independent, with $\mathrm{ESS}\sim \sigma^{2}\chi^{2}_{p-1}$ and $\mathrm{RSS}\sim \sigma^{2}\chi^{2}_{n-p}$ under $H_{0}$.
> >
> > Therefore $R^{2}$ can be written as $R^{2}=\frac{\chi^{2}_{p-1}}{\chi^{2}_{p-1}+\chi^{2}_{n-p}},$the two $\chi^{2}$ variables being independent (the $\sigma^{2}$ factors cancel). Since $\frac{A}{A+B}\sim \mathrm{Beta}\left( \frac{a}{2},\frac{b}{2} \right)$ whenever $A\sim \chi^{2}_{a}$ and $B\sim \chi^{2}_{b}$ are independent, the claim follows.
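A Monte Carlo sketch of this theorem (assuming NumPy and SciPy; sample sizes and seeds are arbitrary choices): simulate intercept-only data, fit the full model, and compare the empirical distribution of $R^{2}$ to $\mathrm{Beta}\left( \frac{p-1}{2},\frac{n-p}{2} \right)$ with a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

n, p = 30, 5          # p parameters in total, so X has p - 1 columns
n_sims = 5000
r2 = np.empty(n_sims)

for i in range(n_sims):
    # Under H0 only the intercept is nonzero
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = 3.0 + rng.normal(size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat
    r2[i] = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

# Compare the simulated R^2 values to Beta((p-1)/2, (n-p)/2)
ks = stats.kstest(r2, stats.beta((p - 1) / 2, (n - p) / 2).cdf)
print(ks.statistic, ks.pvalue)  # large p-value: consistent with the Beta law
```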