> [!tldr] LU Factorization
>
> The LU factorization decomposes a square matrix into a lower-triangular ($L$) and an upper-triangular ($U$) factor.
>
> Gaussian Elimination turns the matrix into $U$; the multipliers it applies along the way are recorded in $L$.
Assuming $a_{11} \ne 0$ (we will deal with that later), and taking $L_{1}:= \begin{bmatrix}
1 \\ a_{21} / a_{11} \\ \vdots \\ a_{n1} / a_{11}
\end{bmatrix}, ~~~U_{1}:= \begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n}
\end{bmatrix},$ we extract the first row and column of $A$: $A=L_{1}U_{1}+\begin{bmatrix}
0 & 0 & \dots & 0 \\ 0 & * & \dots & * \\ \vdots & \vdots & & \vdots \\ 0 & * & \dots & *
\end{bmatrix},$ and inducting on the size of the matrix, we further decompose the lower-right block and pad the decomposition with zeros to get $A=\underbrace{\begin{bmatrix}* \\ * \\ * \\ * \end{bmatrix}\begin{bmatrix}*& * & * & *\end{bmatrix}}_{L_{1}U_{1}} +\underbrace{\begin{bmatrix}0 \\ * \\ * \\ * \end{bmatrix}\begin{bmatrix}0& * & * & *\end{bmatrix}}_{L_{2}U_{2}}+\dots, $ giving the decomposition $A=[L_{1},\dots,L_{n}] \begin{bmatrix}
U_{1} \\ \vdots \\ U_{n}\end{bmatrix}=: LU.$
Since each column $L_{i}$ has zeros above its $i$th entry and each row $U_{i}$ has zeros before its $i$th entry, $L$ is lower- and $U$ upper-triangular.
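To make the induction concrete, here is a minimal NumPy sketch of this rank-one "peeling" (the name `lu_outer` is mine, and it assumes every pivot it meets is nonzero, i.e. no pivoting):

```python
import numpy as np

def lu_outer(A):
    """LU by peeling rank-1 terms: A = L_1 U_1 + L_2 U_2 + ...
    Assumes every pivot encountered is nonzero (no pivoting)."""
    S = A.astype(float)          # shrinking "remainder" (a working copy)
    n = S.shape[0]
    L = np.zeros((n, n))
    U = np.zeros((n, n))
    for i in range(n):
        L[i:, i] = S[i:, i] / S[i, i]              # column L_i: 1, a_{ji}/a_{ii}
        U[i, i:] = S[i, i:]                        # row U_i: first row of remainder
        S[i:, i:] -= np.outer(L[i:, i], U[i, i:])  # subtract L_i U_i, recurse
    return L, U

A = np.array([[4., 3.], [6., 3.]])
L, U = lu_outer(A)
assert np.allclose(L @ U, A)     # [[1,0],[1.5,1]] @ [[4,3],[0,-1.5]]
```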
### LU from Gaussian Elimination
Given a linear system $Ax=b$, where $A \in \mathbb{R}^{n\times n},\, b \in \mathbb{R}^{n \times 1}$, we can eliminate the entries under the diagonal via **elementary row operations (ERO)**. In particular, eliminating the entries under $a_{ii}$ amounts to subtracting multiples of the $i$th row from the rows beneath it: $A \xrightarrow{\text{ERO}}\left[\begin{array}{c|c}
\times & \times \dots \times \\
& \times \dots \times \\
& \times \dots \times \\
& \times \dots \times
\end{array}\right]
\xrightarrow{\text{ERO}}
\left[\begin{array}{c|c}
\times \,\, \times & \times \dots \times \\
\,\,\,\,\,\,\,\times & \times \dots \times \\
& \times \dots \times \\
& \times \dots \times
\end{array}\right]\xrightarrow{\text{ERO}}\cdots$ This process is called **Gaussian Elimination**.
These EROs can be represented by pre-multiplication with lower-triangular matrices: subtracting $(\lambda \cdot \mathrm{row}_{i})$ from $\mathrm{row}_{j}$ corresponds to the matrix $M(i,j,\lambda)=I_{n\times n}-\lambda\, e_{j}e_{i}^{\mathsf T}\in (\llcorner),$ i.e. the identity matrix, except with a $(-\lambda)$ in the $(j,i)$ entry. It's lower-triangular because $j > i$ in Gaussian elimination (we only subtract rows from rows *beneath* them).
Reducing the whole $i$th column under the diagonal is then represented by the matrix $L_{i} \equiv\prod^{n}_{j=i+1}M\left( i,j, \frac{a_{ji}}{a_{ii}} \right)=\begin{bmatrix}
1 \\
& \ddots \\
& & 1 \\
& & - \frac{a_{(i+1)i}}{a_{ii}} & 1 \\
& & \vdots & & & \ddots \\
& & - \frac{a_{ni}}{a_{ii}} & & & & 1
\end{bmatrix}$
Doing so for each column reduces the entire matrix $A$ to upper-triangular form, as represented by the matrix product $L_{n}\cdots L_{1}A=U.$ Since the product $L_{n}\cdots L_{1} \in (\llcorner)$ is invertible (its diagonal elements are all $1$), we take the inverse $L=(L_{n}\cdots L_{1})^{-1}$, giving the LU factorization of $A$: $A=LU=(\llcorner)(\urcorner)$
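The same factorization in code, mirroring the derivation: this sketch (names mine) forms each $L_{i}$ explicitly and recovers $L$ by inverting their product. In practice no inversion is needed, since $L$ is simply the identity with the multipliers $a_{ji}/a_{ii}$ stored below the diagonal.

```python
import numpy as np

def lu_by_elimination(A):
    """Gaussian elimination, tracking the elimination matrices L_i explicitly.
    Assumes nonzero pivots (no pivoting)."""
    n = A.shape[0]
    U = A.astype(float)
    Linv = np.eye(n)                        # accumulates L_n ... L_1
    for i in range(n - 1):
        Li = np.eye(n)
        Li[i+1:, i] = -U[i+1:, i] / U[i, i] # -a_{ji}/a_{ii} below the pivot
        U = Li @ U                          # zero column i under the diagonal
        Linv = Li @ Linv
    return np.linalg.inv(Linv), U           # L = (L_n ... L_1)^{-1}

A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
L, U = lu_by_elimination(A)
assert np.allclose(L @ U, A)
assert np.allclose(L, np.tril(L))           # L is unit lower triangular
```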
### Pivoting
> [!bigidea]
> (Partial) pivoting improves the stability of vanilla GE by permuting the rows of $A$ with a permutation matrix $P$. The factorization becomes $LU=PA$.
In Gaussian elimination, a **pivot** is the entry used to eliminate the entries below it; in vanilla GE, they are the diagonal entries $a_{ii}$.
Ideally the pivots should be large in magnitude, but the diagonal elements $a_{ii}$ used naively as pivots can be zero or tiny, causing issues (see the sketch after this list):
* When $a_{ii} = 0$, the multipliers $a_{ji} / a_{ii}$ are not well-defined.
* When $|a_{ii}|$ is tiny but nonzero, the multipliers become huge, amplifying rounding errors and making the computation unstable.
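A standard numerical illustration of the second point (the $2\times2$ matrix is a textbook example of my choosing): with pivot $\varepsilon=10^{-20}$, the exact LU factors round so badly in float64 that $LU$ no longer resembles $A$.

```python
import numpy as np

eps = 1e-20
A = np.array([[eps, 1.],
              [1.,  1.]])

# Without pivoting, the multiplier is 1/eps = 1e20:
L = np.array([[1.,      0.],
              [1 / eps, 1.]])
U = np.array([[eps, 1.],
              [0.,  1. - 1 / eps]])  # 1 - 1e20 rounds to exactly -1e20

print(L @ U)  # [[1e-20, 1.], [1., 0.]] -- the (2,2) entry of A is lost
# Swapping the rows first (partial pivoting) makes the multiplier eps instead,
# and the factorization is perfectly accurate.
```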
**Pivoting** addresses both issues: when eliminating the $i$th column, it looks for a larger entry in the "remainder" of the matrix and swaps it into the pivot position $a_{ii}$ by exchanging their rows and/or columns: $\left[\begin{array}{}
\times & \times & \begin{matrix}\times & \dots & \dots & \times\end{matrix} \\
& \times & \begin{matrix}
\times & \dots & \dots & \times
\end{matrix} \\
& & \underset{\text{search here}}{\boxed{\begin{matrix}
a_{ii} & \dots & \dots \\
\dots & \dots & \dots \\
\dots & a_{**} & \dots \\
\end{matrix}}} \\
\end{array}\right]
\xrightarrow{\text{pivot}}
\left[\begin{array}{}
\times & \times & \begin{matrix}\times & \dots & \dots & \times\end{matrix} \\
& \times & \begin{matrix}
\times & \dots & \dots & \times
\end{matrix} \\
& & \boxed{\begin{matrix}
a_{**} & \dots & \dots \\
\dots & \dots & \dots \\
\dots & a_{ii} & \dots \\
\end{matrix}} \\
\end{array}\right]$
Ideally, pivoting should bring in the largest element of the entire "remainder", but searching over an $(n-i)\times(n-i)$ block at every step means that *full pivoting is computationally expensive* ($O(n^{3})$ comparisons in total).
**Partial pivoting** is a compromise that restricts the search to the current column: if $|a_{ki}|=\max_{j\ge i} |a_{ji}|$, partial pivoting exchanges rows $i$ and $k$ to get
$\left[\begin{array}{}
\times & \times & \times & \begin{matrix} \dots & \times\end{matrix} \\
& \times & \times & \begin{matrix}
\dots & \times
\end{matrix} \\
& & \underset{\text{here}}{\boxed{\begin{matrix}
a_{ii} \\
\dots \\
a_{ki} \\
\end{matrix}}} & \begin{matrix}
\dots & \times \\
\dots & \times \\
\dots & \times \\
\end{matrix}
\end{array}\right]
\xrightarrow{\text{partial pivot}}
\left[\begin{array}{}
\times & \times & \times & \begin{matrix} \dots & \times\end{matrix} \\
& \times & \times & \begin{matrix}
\dots & \times
\end{matrix} \\
& & \boxed{\begin{matrix}
a_{ki} \\
\dots \\
a_{ii} \\
\end{matrix}} & \begin{matrix}
\dots & \times \\
\dots & \times \\
\dots & \times \\
\end{matrix}
\end{array}\right]$
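In code, a single partial-pivoting step is just an argmax over the current column followed by a row exchange; a minimal sketch (helper name mine):

```python
import numpy as np

def partial_pivot_step(A, i):
    """Swap row i with the row holding the largest |entry| in column i (rows i..n-1).
    Modifies A in place; returns the chosen pivot row k."""
    k = i + np.argmax(np.abs(A[i:, i]))  # argmax of |a_{ji}| over j >= i
    if k != i:
        A[[i, k], :] = A[[k, i], :]      # exchange rows i and k
    return k
```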
In matrix form, the row exchanges accumulate into a permutation matrix $P$, and we compute the LU factorization $PA=LU.$
It's easy to show that for a non-singular $A$, we can always find $P$ via partial pivoting: suppose for contradiction that, when eliminating the $i$th column, we encounter $a_{ji}=0$ for all $j \ge i$, making pivoting impossible. In the column/row notation of the first section, we can still choose $L_{i}=\begin{bmatrix}
0\\ \vdots\\ 1 \\ \vdots \\ 0
\end{bmatrix}, ~~~ U_{i}=\begin{bmatrix}
0 & \dots & 0 & a_{i,i+1} & \dots & a_{i,n}
\end{bmatrix},$ (the $1$ in the $i$th position of $L_{i}$; the zeros of $U_{i}$ running up to and including the $i$th position) and keep the lower/upper triangular forms. In the end we arrive at a singular $U$ (there is a $0$ on its diagonal), contradicting the non-singularity of $A$.
With partial pivoting, each step of GE involves (1) an exchange of rows, and (2) elimination: $
\underset{A}{\left[
\begin{array}{}
{\, \boxed{\begin{matrix}a_{ii} \\\dots \\ a_{ki}\\\end{matrix}}} &
\begin{matrix}
\dots & \dots\,\, \\
\dots & \dots\,\, \\
\dots & \dots\,\,
\end{matrix}
\end{array}
\right]}
\xrightarrow[P_{i}]{(1)\text{ pivot}}
\underset{P_{i}A}{\left[
\begin{array}{}
{\, \boxed{\begin{matrix}a_{ki} \\\dots \\ a_{ii}\\\end{matrix}}} &
\begin{matrix}
\dots & \dots\,\, \\
\dots & \dots\,\, \\
\dots & \dots\,\,
\end{matrix}
\end{array}
\right]}
\xrightarrow[L_{i}]{(2)\text{ GE}}
\underset{L_{i}P_{i}A}{\begin{bmatrix}
a_{ki} & \dots & \dots \\
0 & \dots & \dots \\
0 & \dots & \dots
\end{bmatrix}}$ or as matrix multiplications, $A \mapsto L_{i}P_{i}A$, where $P_{i}$ is the permutation matrix that performs the partial pivoting (i.e. exchanging the $i$th and $k$th rows).
The entire elimination process is then $L_{n}P_{n}\cdots L_{1}P_{1}A = U,$ but note that *commuting a permutation past these lower-triangular matrices gives another lower-triangular matrix*: $PL=L'P,\,\,\text{ where }L'=PLP^{-1} \in (\llcorner).$ (This works because each later permutation $P_{j}$, $j > i$, only exchanges rows below row $i$, so conjugating $L_{i}$ merely reorders the multipliers within its $i$th column.) Moving all the permutation matrices to the right past the $L_{k}$'s, $\underbrace{(L'_{n}\cdots L'_{1})}_{L ^{-1}}\underbrace{P_{n}\cdots P_{1}}_{P }A=U,$ and inverting $L^{-1}$ gives $LU=PA,$ the LU factorization of $PA$.
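Putting the pieces together, a compact sketch of GE with partial pivoting in the $PA=LU$ convention (function name mine; for real work, `scipy.linalg.lu` computes the same factorization, though with the $A=PLU$ convention):

```python
import numpy as np

def plu(A):
    """GE with partial pivoting: returns P, L, U with L @ U == P @ A."""
    n = A.shape[0]
    U = A.astype(float)
    L = np.eye(n)
    P = np.eye(n)
    for i in range(n - 1):
        # (1) pivot: bring the largest |entry| of column i (rows i..n-1) up
        k = i + np.argmax(np.abs(U[i:, i]))
        if k != i:
            U[[i, k], i:] = U[[k, i], i:]
            P[[i, k], :]  = P[[k, i], :]
            L[[i, k], :i] = L[[k, i], :i]  # conjugation: reorder old multipliers
        # (2) eliminate below the pivot; multipliers have magnitude <= 1
        if U[i, i] != 0.0:
            L[i+1:, i] = U[i+1:, i] / U[i, i]
            U[i+1:, i:] -= np.outer(L[i+1:, i], U[i, i:])
            U[i+1:, i] = 0.0
    return P, L, U

A = np.random.default_rng(0).standard_normal((5, 5))
P, L, U = plu(A)
assert np.allclose(L @ U, P @ A)
```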
### Application to Linear Systems
To solve $Ax=b$, we can decompose $A=LU$ in $O(n^{3})$ flops to get $\begin{align*}
LUx &= b,
\end{align*}$ which we can solve in two triangular stages: (1) solve $Ly=b$ by forward substitution, then (2) solve $Ux=y$ by back substitution. Both are cheap (each takes $O(n^{2})$ flops), so the $O(n^{3})$ factorization can be reused across many right-hand sides $b$.
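A sketch of the two-stage solve with SciPy (`scipy.linalg.lu` returns factors in the $A=PLU$ convention, so the forward solve uses $P^{\mathsf T}b$; `lu_factor`/`lu_solve` package the same workflow):

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

P, L, U = lu(A)                                # A = P @ L @ U, so L @ U = P.T @ A
y = solve_triangular(L, P.T @ b, lower=True)   # (1) forward substitution: L y = P^T b
x = solve_triangular(U, y)                     # (2) back substitution:    U x = y
assert np.allclose(A @ x, b)
```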