> [!tldr] Pseudoinverse
> The **pseudoinverse** of a matrix $A \in \mathbb{R}^{m\times n}$ with rank-$r$ [[Singular Value Decomposition|SVD]] $A=U_{r}D_{r}V_{r}^{T}$ ($r:=\mathrm{rank}A \le \min(m,n)$) is $A^{\dagger}:=V_{r}D^{-1}_{r}U_{r}^{T} \in \mathbb{R}^{n\times m},$ where $D_{r}$ is invertible since the first $r$ singular values of $A$ are nonzero. Denoting them $\sigma_{1},\dots,\sigma_{r}$, $A^{\dagger}=V\begin{bmatrix}
\sigma_{1}^{-1} \\ & \ddots \\ & & \sigma_{r}^{-1} \\ & & & 0
\end{bmatrix}_{n\times m}U^{T}.$
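A minimal numerical sketch of this construction (assuming NumPy; the matrix `A` below is a made-up rank-2 example):

```python
import numpy as np

# Hypothetical rank-2 example: the third column is the sum of the first two.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])

# Reduced SVD; keep only the r nonzero singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))                 # numerical rank
Ur, sr, Vrt = U[:, :r], s[:r], Vt[:r, :]

# A† = V_r D_r^{-1} U_r^T  (an n×m matrix)
A_pinv = Vrt.T @ np.diag(1.0 / sr) @ Ur.T

# Matches NumPy's built-in pseudoinverse.
print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True
```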
Suppose $x=x_{0}+x_{1}$ is a decomposition into $x_{0} \in \mathrm{ker}A$, and $x_{1} \in (\mathrm{ker}A)^\perp$. Then apply $A$ to get $Ax=Ax_{0}+Ax_{1}=Ax_{1}.$
Now $A^{\dagger}$ obviously cannot resurrect $x_{0}$, so instead it maps $A^{\dagger}(Ax)=x_{1}$: it carries $\mathrm{col}A$ back to $(\mathrm{ker}A)^\perp$, while the "dead" component $x_{0} \in \mathrm{ker}A$ is replaced by the zero vector.
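The decomposition above can be checked numerically (NumPy sketch; the matrix and vectors are hypothetical examples):

```python
import numpy as np

# Hypothetical 3×3 rank-2 matrix; its kernel is spanned by k = (1, 1, -1).
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])
A_pinv = np.linalg.pinv(A)

k = np.array([1., 1., -1.])            # A @ k = 0
x = np.array([3., 2., 1.])

# Orthogonal decomposition x = x0 + x1 with x0 ∈ ker A, x1 ∈ (ker A)^⊥.
x0 = (x @ k) / (k @ k) * k
x1 = x - x0

# A kills x0, so Ax = Ax1, and A† recovers exactly x1 (never x0).
print(np.allclose(A @ x, A @ x1))          # True
print(np.allclose(A_pinv @ (A @ x), x1))   # True
```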
It has the defining (Moore–Penrose) properties $\begin{align*}
A A^{\dagger}A&= A,\\
A^{\dagger}A A^{\dagger}&= A^{\dagger},\\
(A A^{\dagger})^{T}&= A A^{\dagger},\\
(A^{\dagger}A)^{T}&= A^{\dagger}A,
\end{align*}$with the special cases $A^{\dagger}A= I_{n}$ when $r=n$ (full column rank), $A A^{\dagger}= I_{m}$ when $r=m$ (full row rank), and $A^{\dagger}=A^{-1}$ for invertible matrices.
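These properties are easy to verify numerically (NumPy sketch; the tall full-column-rank matrix is a hypothetical example):

```python
import numpy as np

# Hypothetical tall full-column-rank matrix: m=3 > n=2.
A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
P = np.linalg.pinv(A)

# Four Moore–Penrose conditions (these define A† uniquely):
print(np.allclose(A @ P @ A, A))        # A A† A = A
print(np.allclose(P @ A @ P, P))        # A† A A† = A†
print(np.allclose((A @ P).T, A @ P))    # A A† symmetric
print(np.allclose((P @ A).T, P @ A))    # A† A symmetric

# Full column rank ⇒ left inverse, but not a right inverse:
print(np.allclose(P @ A, np.eye(2)))    # True:  A† A = I_n
print(np.allclose(A @ P, np.eye(3)))    # False: A A† only projects onto col A
```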
Note that the matrix $(X^{T}X)^{-1}X^{T}$ ($X=A$) from [[Linear Regression Methods#Least Squares Linear Regression|OLS]] is another formulation of the pseudoinverse: assuming the data matrix has full column rank, $\mathrm{rank}X=n \le m$ (the feature count $n$ is often denoted $p$), $\begin{align*}
\textcolor{cyan}{(X^{T}X)^{-1}}X^{T}&= \textcolor{cyan}{(V\hat{D}^{2}V^{T})^{-1}}VD^{T}U^{T}=\textcolor{cyan}{V\hat{D}^{-2}V^{T}}VD^{T}U^{T}\\
&= V\hat{D}^{-2}D^{T}U^{T}\\
&= V\begin{bmatrix}
\sigma_{1}^{-1} \\ & \ddots \\ & & \sigma_{n}^{-1}
\end{bmatrix}_{n\times m}U^{T}\\[0.8em]
&= X^{\dagger}.
\end{align*}$
Here $\hat{D}:=\mathrm{diag}(\sigma_{1},\dots,\sigma_{n}) \in \mathbb{R}^{n\times n}$ is the diagonal matrix of the thin SVD, which satisfies $D^{T}D=\hat{D}^{2}$.
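The equivalence can be checked directly (NumPy sketch; the random data matrix is a hypothetical example, full column rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: m=50 samples, n=3 features.
X = rng.standard_normal((50, 3))

# Normal-equations OLS operator vs. the pseudoinverse:
ols_op = np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(ols_op, np.linalg.pinv(X)))  # True
```

In practice `np.linalg.pinv` (or `np.linalg.lstsq`) is preferred over forming $(X^{T}X)^{-1}$ explicitly, since squaring the singular values in $X^{T}X$ worsens conditioning.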