> [!tldr] Pseudoinverse
> The **pseudoinverse** of a matrix $A \in \mathbb{R}^{m\times n}$ with rank-$r$ [[Singular Value Decomposition|SVD]] $A=U_{r}D_{r}V_{r}^{T}$ ($r:=\mathrm{rank}\,A \le \min(m,n)$) is
> $$A^{\dagger}:=V_{r}D^{-1}_{r}U_{r}^{T} \in \mathbb{R}^{n\times m},$$
> where $D_{r}$ is invertible since the first $r$ singular values of $A$ are non-zero. Denoting them $\sigma_{1},\dots,\sigma_{r}$, in terms of the full SVD
> $$A^{\dagger}=V\begin{bmatrix} \sigma_{1}^{-1} & & & \\ & \ddots & & \\ & & \sigma^{-1}_{r} & \\ & & & 0 \end{bmatrix}_{n\times m}U^{T}.$$

Suppose $x=x_{0}+x_{1}$ is a decomposition into $x_{0} \in \ker A$ and $x_{1} \in (\ker A)^\perp$. Applying $A$ gives $Ax=Ax_{0}+Ax_{1}=Ax_{1}$. Now $A^{\dagger}$ obviously cannot resurrect $x_{0}$, so instead it maps $A^{\dagger}(Ax)=x_{1}$, i.e. it maps $\mathrm{col}\,A$ back to $(\ker A)^\perp$ and sends the "dead" $\ker A$ component to the zero vector.

It has the defining (Moore–Penrose) properties
$$\begin{align*} A A^{\dagger}A&= A,\\ A^{\dagger}A A^{\dagger}&= A^{\dagger}, \end{align*}$$
with $A A^{\dagger}$ and $A^{\dagger}A$ both symmetric; in the full-rank cases these strengthen to one-sided inverses,
$$\begin{align*} A^{\dagger}A&= I_{n} &\text{if }\mathrm{rank}\,A = n \text{ (full column rank)},\\ A A^{\dagger}&= I_{m} &\text{if }\mathrm{rank}\,A = m \text{ (full row rank)}, \end{align*}$$
and $A^{\dagger}=A^{-1}$ for invertible matrices.

Note that the matrix $(X^{T}X)^{-1}X^{T}$ ($X=A$) from [[Linear Regression Methods#Least Squares Linear Regression|OLS]] is another formulation of the pseudoinverse: assuming the data has full column rank $\mathrm{rank}\,X=n < m$,
$$\begin{align*} \textcolor{cyan}{(X^{T}X)^{-1}}X^{T}&= \textcolor{cyan}{(V\hat{D}^{2}V^{T})^{-1}}VD^{T}U^{T}=\textcolor{cyan}{V\hat{D}^{-2}V^{T}}VD^{T}U^{T}\\ &= V\begin{bmatrix} \sigma_{1}^{-1} & & & \\ & \ddots & & \\ & & \sigma_{n}^{-1} & \end{bmatrix}_{n\times m}U^{T}\\[0.8em] &= X^{\dagger}. \end{align*}$$
Here $\hat{D} = \mathrm{diag}(\sigma_{1},\dots,\sigma_{n}) \in \mathbb{R}^{n\times n}$ is the singular-value matrix of the thin SVD, so that $D^{T}D = \hat{D}^{2}$.
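The identities above can be checked numerically. A minimal NumPy sketch (the random matrix and its dimensions are illustrative choices): it builds $X^{\dagger}$ from the thin SVD, verifies the Moore–Penrose properties, the full-column-rank case $X^{\dagger}X = I_n$, and the OLS normal-equation form $(X^{T}X)^{-1}X^{T}$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
X = rng.standard_normal((m, n))  # almost surely full column rank

# Pseudoinverse via the thin SVD: X† = V diag(1/σ_i) Uᵀ
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_pinv = Vt.T @ np.diag(1.0 / s) @ U.T  # shape (n, m)

# Moore–Penrose conditions
assert np.allclose(X @ X_pinv @ X, X)
assert np.allclose(X_pinv @ X @ X_pinv, X_pinv)
assert np.allclose(X @ X_pinv, (X @ X_pinv).T)   # X X† symmetric
assert np.allclose(X_pinv @ X, (X_pinv @ X).T)   # X† X symmetric

# Full column rank ⇒ X† is a left inverse
assert np.allclose(X_pinv @ X, np.eye(n))

# OLS form agrees: (XᵀX)⁻¹ Xᵀ = X†
assert np.allclose(np.linalg.inv(X.T @ X) @ X.T, X_pinv)

# Consequently X† y solves the least-squares problem
y = rng.standard_normal(m)
assert np.allclose(X_pinv @ y, np.linalg.lstsq(X, y, rcond=None)[0])
```

In practice one would call `np.linalg.pinv(X)` directly, which performs the same SVD construction while also thresholding near-zero singular values.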