QR factorization decomposes a matrix $A \in \mathbb{R}^{m \times n},m \ge n$ into the product of an orthogonal/orthonormal matrix $Q$ and an upper-triangular $R$.

The **thin/reduced QR factorization** produces a "thin" orthonormal ${Q}\in \mathbb{R}^{m\times n}$ (so ${Q}^{T}{Q}=I_{n}$ but $QQ^{T}\ne I_{m}$), and a "small" $R\in \mathbb{R}^{n\times n}$:
$A_{m \times n}=\begin{bmatrix} \Bigg\uparrow & &\Bigg\uparrow \\ q_{1} & \dots & q_{n} \\ \Bigg\downarrow & & \Bigg\downarrow \end{bmatrix}_{m \times n} \begin{bmatrix} r_{11} & \dots & r_{1n}\\ & \ddots & \vdots\\ & & r_{nn} \end{bmatrix}_{n\times n}.$

The **full QR factorization** produces a "full" orthogonal $Q_{F}\in\mathbb{R}^{m\times m}$ and a "large" $R_{F}\in \mathbb{R}^{m\times n}$:
$A_{m \times n}=\left[\begin{array}{c|c} \begin{array}{c c}\Bigg\uparrow & &\Bigg\uparrow \\q_{1} & \dots & q_{n} \\\Bigg\downarrow & & \Bigg\downarrow\end{array} & \begin{array}{c c}\Bigg\uparrow & &\Bigg\uparrow \\ q_{n+1} & \dots & q_{m} \\\Bigg\downarrow & & \Bigg\downarrow\end{array} \end{array}\right] _{m \times m} \begin{bmatrix} r_{11} & \dots & r_{1n}\\ & \ddots & \vdots\\ & & r_{nn} \\ \\ \\ \end{bmatrix}_{m\times n}$
where $Q_{F}$ is orthogonal: $Q^{T}_{F}Q_{F}=Q_{F}Q_{F}^{T}=I_{m}$.
- We usually denote the full $Q_{F}$ by $[Q ~|~ Q_{\perp}]$, since the additional columns $Q_{\perp}$ are orthogonal to the thin ${Q}$; they can be computed as any orthonormal extension of $Q$. We can also write $R_{F}=\begin{bmatrix}R \\ 0\end{bmatrix}$.
- Although $Q_{F}$ has additional columns, they only multiply with the additional zero rows in $R_{F}$, so they have no effect on the product.

## Derivation and Existence

### QR with Gram-Schmidt

> [!bigidea]
> Gram-Schmidt on the columns of $A$ makes them orthonormal via multiplication with an upper-triangular matrix.

Recall that the Gram-Schmidt orthogonalization process generates an orthonormal set, given a set of independent vectors:
$\{ u_{1},\dots,u_{n} \} \text{ independent} \xmapsto{\text{G-S}}\{ v_{1},\dots,v_{n} \}\text{ orthonormal}$
where the two sets have the same span. If the $n$ columns of $A$ are linearly independent, then Gram-Schmidt produces a thin QR factorization.

> [!proof]
> Apply G-S to $\{ a_{1},\dots,a_{n} \}$, the $n$ columns of $A$; if they are all independent (when $\mathrm{rank}(A)=n$), we get orthonormal vectors
>
> $\begin{align*} q_{1}&= {\frac{a_{1}}{r_{11}}},\\ q_{2}&= {\frac{a_{2}-r_{12}q_{1}}{r_{22}}},\\ q_{j}&= {\frac{a_{j}-\sum_{i=1}^{j-1}r_{i j}q_{i}}{r_{j j}}}. \end{align*}$
>
> We can solve for the columns of $A$ to get
> $\begin{array}{l}{{a_{1}=r_{11}q_{1}}}\\ {{a_{2}=r_{12}q_{1}+r_{22}q_{2}}}\\ {{a_{j}=r_{1j}q_{1}+r_{2j}q_{2}+\cdots+r_{j j}q_{j}.}}\end{array}$
> In matrix form, this is just
> $A=QR,$
> with ${Q}=[q_{1},\dots,q_{n}]$ and ${R}=(r_{ij})$.
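To make the recurrence above concrete, here is a minimal NumPy sketch (the helper name `gram_schmidt_qr` is illustrative, not a library routine). It assumes $A$ has full column rank and uses the modified Gram-Schmidt update, which computes the same $Q,R$ in exact arithmetic but loses orthogonality more slowly in floating point; in practice one would call a library routine such as `np.linalg.qr` instead.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Thin QR of a full-column-rank A via (modified) Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = Q[:, i] @ v      # r_ij = q_i^T a_j (using the partially reduced v)
            v -= R[i, j] * Q[:, i]     # remove the component along q_i
        R[j, j] = np.linalg.norm(v)    # r_jj normalizes what is left
        Q[:, j] = v / R[j, j]          # q_j = (a_j - sum_i r_ij q_i) / r_jj
    return Q, R

A = np.random.default_rng(0).standard_normal((5, 3))
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
```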
We assumed full column rank to derive the thin QR, but it's easy to generalize this to rank-deficient matrices and to full QR decompositions by expanding the basis $\{ q_{1},\dots,q_{\mathrm{rank}(A)} \}$ to $\{ q_{1},\dots,q_{n} \}$:

> [!proof]
> Let $\mathrm{rank}(A)=k$, then WLOG let $\{ a_{1},\dots,a_{k} \}$ be $k$ linearly independent columns, and QR decompose it to get
> $\begin{bmatrix} \Bigg\uparrow & &\Bigg\uparrow \\ a_{1} & \dots & a_{k} \\ \Bigg\downarrow & & \Bigg\downarrow \end{bmatrix}_{m \times k}=\begin{bmatrix} \Bigg\uparrow & &\Bigg\uparrow \\ q_{1} & \dots & q_{k} \\ \Bigg\downarrow & & \Bigg\downarrow \end{bmatrix}_{m \times k} \begin{bmatrix} r_{11} & \dots & r_{1k}\\ & \ddots & \vdots\\ & & r_{kk} \end{bmatrix}_{k\times k}$
> Then the remaining columns $\{ a_{k+1},\dots,a_{n} \} \subset \langle a_{1},\dots, a_{k} \rangle=\langle q_{1},\dots, q_{k} \rangle$ are expressible in terms of $q_{1},\dots,q_{k}$, so arbitrarily extend $\{ q_{1},\dots,q_{k} \}$ to an orthonormal set $\{ q_{1},\dots,q_{n} \}$:
> $\begin{bmatrix} \Bigg\uparrow & &\Bigg\uparrow \\ a_{1} & \dots & a_{n} \\ \Bigg\downarrow & & \Bigg\downarrow \end{bmatrix} =\begin{bmatrix} \Bigg\uparrow & &\Bigg\uparrow \\ q_{1} & \dots & q_{n} \\ \Bigg\downarrow & & \Bigg\downarrow \end{bmatrix}_{m \times n} \begin{bmatrix} r_{11} & \dots & r_{1k} & \dots & \dots & r_{1n} \\ &\ddots & \vdots & \ddots & \ddots & \vdots \\ & & r_{kk} & \dots & \dots & r_{kn} \\ & & & 0 & \dots & 0 \\ & & & & \ddots & \vdots \\ & & & & & 0 \end{bmatrix}_{n\times n}$
> where the lower-right corner of $R$ is empty since $a_{k+1},\dots,a_{n}$ can be expressed without $q_{k+1},\dots,q_{n}$. This is the thin QR factorization. Now extending the basis again gives us the full QR.

### QR with Householder Reflections

> [!bigidea]
> Householder reflections are orthogonal maps that can iteratively make $A$ upper-triangular.

We wish to find orthogonal $Q_{1},Q_{2},\dots$ that eliminate the elements of $A$ below the diagonal, column by column:
$A \xrightarrow{Q_{1}}\left[\begin{array}{c|c} \times & \times \dots \times \\ & \times \dots \times \\ & \times \dots \times \\ & \times \dots \times \end{array}\right] \xrightarrow{Q_{2}} \left[\begin{array}{c|c} \times \,\, \times & \times \dots \times \\ \,\,\,\,\,\,\,\times & \times \dots \times \\ & \times \dots \times \\ & \times \dots \times \end{array}\right]\xrightarrow{Q_{3}}\cdots$
Note that since $Q_{i}$ are orthogonal, each column must retain its norm, and in particular the first column undergoes $a_{1} \mapsto (\| a_{1} \|,0,\dots,0)^{T}$.

The matrix that reflects vectors across the hyperplane orthogonal to $v$ is the **Householder matrix/reflector**, given by
$H_{v}=I-2vv^{T},$
where $v$ is a unit-length vector controlling the direction being flipped. It maps $u \mapsto (u-2 \times \text{its projection onto }v)$.

![[Householder.png#invert|w60|center]]

- Note that for a vector $u \in \mathbb{R}^{m}$, $H_{v}u=u-2v(v^{T}u)$ can be computed in $O(m)$ time rather than $O(m^{2})$.

Alternatively, given a vector $u$ and the destination $w$ we want to map it to (where $\| u \|=\| w \|$), we can solve for the $v$ such that $H_{v}u=w$:
$v=\frac{u-w}{\| u-w \| }.$
Therefore, for any $u \in \mathbb{R}^{m}$, we can find a Householder matrix $H_{v}$ that maps $u$ to the destination $w:= (\| u \|,0,\dots,0)^{T}$, exactly what we need (at least for the first column $a_{1}$).
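As a quick numerical check of this construction (a sketch only: `householder_vector` is a hypothetical helper, and it applies the formula above verbatim, without the sign choice production codes use to avoid cancellation):

```python
import numpy as np

def householder_vector(u):
    """Unit v such that (I - 2 v v^T) u = (||u||, 0, ..., 0)^T,
    via v = (u - w)/||u - w|| with w = ||u|| e_1."""
    w = np.zeros_like(u, dtype=float)
    w[0] = np.linalg.norm(u)
    v = u - w
    return v / np.linalg.norm(v)

u = np.array([3.0, -4.0, 12.0])      # ||u|| = 13
v = householder_vector(u)
Hu = u - 2 * v * (v @ u)             # O(m) application; H_v is never formed
print(np.round(Hu, 12))              # ~[13., 0., 0.]
```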
Recall that $Q_{1}$ needs to map $a_{1}$ to $(\| a_{1} \|,0,\dots,0)^{T}$. Choosing $v_{1} \propto a_{1}-(\|a_{1}\|,0,\dots,0)^{T}$ gives the Householder matrix $Q_{1}:=H_{v_{1}}$ such that
$H_{v_{1}}a_{1}=\|a_{1}\|e_{1}=(\|a_{1}\|,0,\dots,0)^{T}.$

We need more Householder reflectors $Q_{j}:= H_{v_{j}}$ to map $A$ like
$A \overset{Q_{1}}{\mapsto}\begin{bmatrix} \times & \times & \times & \times \\ & \times & \times & \times \\ & \times & \times & \times \\ &\times&\times & \times \\ \end{bmatrix}\overset{Q_{2}}{\mapsto}\begin{bmatrix} \times & \times & \times & \times \\ & \times & \times & \times \\ & & \times & \times \\ &&\times&\times \\ \end{bmatrix}\overset{Q_{3}}{\mapsto}\begin{bmatrix} \times & \times & \times & \times \\ & \times & \times & \times \\ & & \times & \times \\ &&&\times \\ \end{bmatrix}\mapsto\cdots$

We have cleared the first column with $Q_{1}$, and we don't want $Q_{2},\dots$ to destroy the progress -- therefore, $v_{2},\dots$ must have $0$ in the first entry (i.e. their reflectors leave the first coordinate alone). Induction shows that $v_{j}$ has the form
$v_{j}= ( \underbrace{0,\dots, 0 }_{j-1 \text{ zeros}},\underbrace{ \times, \dots, \times}_{=: \tilde{v}_{j}}).$
In particular, we choose $\tilde{v}_{j}$ to clear the subdiagonal part of the $j$th column of $Q_{j-1}\dots Q_{1}A$.

Induct on the size of $A \in \mathbb{R}^{m \times n}$, assuming that any matrix $B \in \mathbb{R}^{(m-1)\times(n-1)}$ has a QR factorization with Householder reflectors $\tilde{Q}_{2},\dots,\tilde{Q}_{n}$ such that
$\tilde{Q}_{n}\cdots \tilde{Q}_{2}B=\tilde{R} \text{ (upper-triangular)},$
and we have $\tilde{Q}:= \tilde{Q}_{n}^{T}\cdots \tilde{Q}_{2}^{T}$ such that $B=\tilde{Q}\tilde{R}$.
- As the base case, we have $A \in \mathbb{R}^{(m-n+1)\times 1}$, where one Householder reflection does the job.

Then reduce the first column $a_{1}$ with $Q_{1}:=H_{v_{1}}:a_{1}\mapsto(\|a_{1}\|,0,\dots,0)^{T}$, giving
$Q_{1}A=\left[ \begin{array}{c|c} \|a_{1}\| & \times\dots \times \\ \hline \begin{array}{}0\\ \vdots\\ 0\end{array} & B \end{array} \right]$
Then to use $B=\tilde{Q}\tilde{R}$, we pad $\tilde{Q}$ with a row and column to get
$\begin{align*} \underbrace{\begin{bmatrix}1 & \\& \tilde{Q}^{T}\end{bmatrix}Q_{1}}_{=:Q^{T}}&A= \begin{bmatrix} 1 & \\& \tilde{Q}^{T}\end{bmatrix}\begin{bmatrix} \|a_{1}\| & \times \\& B\end{bmatrix}= \underbrace{\begin{bmatrix} \|a_{1}\| & \times \\ & \tilde{R} \end{bmatrix}}_{=:R}\\ &\Rightarrow A=QR \end{align*}$
which gives the QR factorization of $A$ and completes the induction.

To obtain the successive Householder reflections, we can pad $\tilde{Q}_{i}$ like what we did to $\tilde{Q}$ to get $Q_{i}:= \begin{bmatrix}1 & \\ & \tilde{Q}_{i}\end{bmatrix}$, so that
$Q_{n}\cdots Q_{1}A=R.$
Note that this produces the full QR directly, as each $Q_{i} \in \mathbb{R}^{m\times m}$, so $Q:=Q_{1}^{T}\dots Q_{n}^{T} \in \mathbb{R}^{m\times m}$ too, and $R$ contains $m-n$ zero rows (corresponding to the zeros produced in the base case $A \in \mathbb{R}^{(m-n+1)\times1}$).
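Putting the pieces together, here is a minimal NumPy sketch of Householder QR (`householder_qr` is an illustrative name, not a library routine; like the text, it reflects onto $w=\|u\|e_{1}$ and skips the usual sign trick that avoids cancellation). Each reflector is applied as a rank-one update, so no $H_{v_{j}}$ is ever formed explicitly.

```python
import numpy as np

def householder_qr(A):
    """Full QR of A (m >= n): Q_n ... Q_1 A = R, with Q = Q_1^T ... Q_n^T."""
    m, n = A.shape
    R = A.astype(float)
    Q = np.eye(m)
    for j in range(n):
        u = R[j:, j]
        w = np.zeros_like(u)
        w[0] = np.linalg.norm(u)
        if np.allclose(u, w):                       # column already reduced
            continue
        v = (u - w) / np.linalg.norm(u - w)         # v_j = (0,...,0, v~_j), stored without the zeros
        R[j:, j:] -= 2.0 * np.outer(v, v @ R[j:, j:])   # R <- Q_j R (only the trailing block changes)
        Q[:, j:]  -= 2.0 * np.outer(Q[:, j:] @ v, v)    # Q <- Q Q_j^T, accumulating Q_1^T ... Q_j^T
    return Q, R

A = np.random.default_rng(1).standard_normal((5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))  # True True
```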
### QR on Hessenberg Matrices

> [!bigidea]
> Hessenberg matrices (i.e. those with all-0 entries under the first subdiagonal) are cheap to QR-factorize using Givens rotations.

A general square matrix $A \in \mathbb{R}^{n \times n}$ can be reduced to its **Hessenberg form/matrix** with unitary similarity transformations. The Hessenberg form is an almost-upper-triangular matrix, with extra entries only on the first subdiagonal:
$H \equiv Q^{*}AQ=\begin{bmatrix} \times & \times & \dots & \dots & \times \\ \times & \times & \dots & \dots & \times \\ & \times & \ddots & & \vdots \\ & & \ddots & \ddots & \vdots \\ & & & \times & \times \end{bmatrix}$

*The QR factorization of Hessenberg matrices is efficient*: pre-applying $(n-1)$ Givens rotations is enough.
- Essentially, those rotations fold each subdiagonal entry into the row above it; for a tridiagonal matrix, the subdiagonal ends up moved onto the superdiagonal and the one above it.

A **Givens rotation** $J_{\theta}(i,j)$ is an orthogonal matrix that agrees with the identity except in four entries, where for some $\theta$,
$\begin{pmatrix} J_{ii} & J_{ij} \\ J_{ji} & J_{jj} \end{pmatrix} = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix}$
and given a vector/column $v=(\dots,\underset{i}{a},\dots,\underset{j}{b},\dots)^{T}$,
$\exists\theta:v \xmapsto{J_{\theta}(i,j)}(\dots,\underset{i}{\sqrt{ a^{2}+b^{2} }},\dots,\underset{j}{0},\dots)^{T}$
where the $j$th entry is "merged" into the $i$th entry. To explicitly find $\theta$, note that the $j$th entry of the product is $[J_{\theta}(i,j)v]_{j}=b\cos(\theta)-a\sin(\theta)=0$, so $\theta=\arctan(b / a)$.
- Note that the matrix is not the $\mathbb{R}^{2\times {2}}$ rotation-by-$\theta$ matrix, but its transpose.

The QR factorization of a Hessenberg matrix is done by successive Givens rotations, which eliminate the subdiagonal entry by entry:
$\begin{bmatrix} \times & \times & \times & \times \\ \times& \times & \times & \times \\ & \times & \times & \times \\ &&\times&\times \\ \end{bmatrix}\mapsto\begin{bmatrix} \times & \times & \times & \times \\ & \times & \times & \times \\ & \times & \times & \times \\ &&\times&\times \\ \end{bmatrix}\mapsto\begin{bmatrix} \times & \times & \times & \times \\ & \times & \times & \times \\ & & \times & \times \\ &&\times&\times \\ \end{bmatrix}\mapsto\cdots$

This is done by applying $J_{\theta_{k}}(k,k+1),\,\,k=1,\dots,n-1$, with $\theta_{k}$ chosen so that the rotation merges the $(k+1,k)$ entry into the $(k,k)$ entry:
$\underbrace{J_{\theta_{n-1}}(n-1,n)\dots J_{\theta_{2}}(2,3)J_{\theta_{1}}(1,2)}_{Q^{T}}A=R \Longrightarrow A=QR.$
^f58b5f

This is the QR factorization of a Hessenberg matrix $A$, which is done in $O(n^{2})$ operations (each pre-applied rotation acts on two rows of $A$), much more efficient than the $O(n^{3})$ QR factorization of general matrices.
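A minimal NumPy sketch of this scheme (the name `hessenberg_qr` is illustrative; it assumes $H$ is upper Hessenberg and accumulates $Q$ as a dense matrix, whereas a practical implementation would store only the rotation pairs $(c_{k},s_{k})$):

```python
import numpy as np

def hessenberg_qr(H):
    """QR of an upper-Hessenberg H via n-1 Givens rotations: J_{n-1}...J_1 H = R."""
    n = H.shape[0]
    R = H.astype(float)
    Q = np.eye(n)
    for k in range(n - 1):
        a, b = R[k, k], R[k + 1, k]
        r = np.hypot(a, b)
        if r == 0.0:                              # nothing to eliminate
            continue
        c, s = a / r, b / r                       # cos(theta_k), sin(theta_k)
        G = np.array([[c, s], [-s, c]])           # the 2x2 block of J_{theta_k}(k, k+1)
        R[k:k + 2, k:] = G @ R[k:k + 2, k:]       # rotate rows k, k+1: zeroes the (k+1, k) entry
        Q[:, k:k + 2] = Q[:, k:k + 2] @ G.T       # accumulate Q = J_1^T J_2^T ... J_{n-1}^T
    return Q, R

H = np.triu(np.random.default_rng(2).standard_normal((5, 5)), k=-1)  # upper Hessenberg
Q, R = hessenberg_qr(H)
print(np.allclose(Q @ R, H), np.allclose(R, np.triu(R)))  # True True
```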
## Properties of the QR

- Furthermore, for a full-rank $A$, *the thin QR factorization is unique up to the signs of $\mathrm{diag}(R)$ and of the corresponding columns of $Q$*. So if we require the diagonal elements $r_{ii}\ge 0$, the QR factorization is unique.

> [!proof]-
> Take two thin QR decompositions $A=QR=\tilde{Q}\tilde{R}$ where both $R,\tilde{R}$ have non-negative (hence, by full rank, positive) diagonals.
>
> Then $R \tilde{R}^{-1}=Q^{T}\tilde{Q}$, but the $\mathrm{LHS}$ is upper-triangular while the $\mathrm{RHS}$ is orthogonal ($Q$ and $\tilde{Q}$ are both orthonormal bases of $\mathrm{col}(A)$). The only possibility is $R \tilde{R}^{-1}=\mathrm{diag}(\pm 1,\dots,\pm 1)$. Since we further require $R,\tilde{R}$ to have non-negative diagonals, $R \tilde{R}^{-1}=I$, and $R=\tilde{R},Q=\tilde{Q}$.

Householder QR is [[Numerical Stability#Stability|backward stable]], because it is based on orthogonal transformations. In particular,

> [!theorem|*] Stability of Householder QR
> Let $\hat{v}_{1},\dots$ be the computed Householder vectors, and $\hat{H}_{1},\dots$ be the *exactly orthogonal maps* they define -- not any computed version $fl(I-vv^{T})$, since they are never explicitly formed.
>
> Then with $\hat{Q}:= \hat{H}_{1}^{T}\dots \hat{H}_{n}^{T}$ (again, not the computed version) and $\hat{R}:= fl(\hat{H}_{n}\dots \hat{H}_{1}A)$,
> $\begin{align*} \| \hat{Q}\hat{R}-A \| &= O(\epsilon\cdot \| A \|) ,\\ \| \hat{Q}^{T}\hat{Q}-I \| &= O(\epsilon). \end{align*}$
>
> However, this does not imply that $\hat{Q}-Q,\hat{R}-R$ are small, where $Q,R$ are the actual QR factors of $A$.
>
> > [!proof]-
> > Because $\hat{R}=fl(\hat{H}_{n}\dots \hat{H}_{1}A)$ is the computed version of applying the exactly orthogonal maps $\hat{H}_{1},\dots,\hat{H}_{n}$, it is forward stable, with
> > $\begin{align*} \hat{R}&= \hat{H}_{n}\dots \hat{H}_{1}A+O(\epsilon\cdot \| A \| ). \end{align*}$
> >
> > Therefore, $\hat{Q}\hat{R}=\hat{H}_{1}^{T}\dots \hat{H}_{n}^{T}\hat{R}$, another orthogonal transformation, has forward error
> > $\hat{Q}\hat{R} = \hat{H}_{1}^{T}\dots \hat{H}_{n}^{T}\hat{H}_{n}\dots \hat{H}_{1}A +O(\epsilon \cdot \| A \| )=A+O(\epsilon\cdot \| A \| ).$
> > However, since we cannot guarantee that the computation of the reflectors $\hat{v}_{1},\dots$ is forward stable, we cannot say that $\hat{Q},\hat{R}$ defined by them above are close to $Q,R$ defined by the actual reflectors $v_{1},\dots$.

## Applications of the QR

### Solving Linear Systems

It's easy to solve the linear system $Ry=z$ if $R$ is upper-triangular, as **back substitution** (i.e. solving from the bottom up) is $O(n^{2})$. It's also easy to solve $Qy=z$ if $Q$ is orthogonal, as its inverse is just its transpose, and the product $Q^{T}z$ costs $O(n^{2})$.

Hence, once we have the QR decomposition of an invertible $A$ (itself an $O(n^{3})$ computation), the system $Ax=b$ can be solved in $O(n^{2})$ time:

> [!algorithm] Solution to linear systems
> Compute the decomposition $Q,R:A=QR$.
> Solve $y:Qy=b$ by applying the inverse $Q^{T}$, i.e. $y=Q^{T}b$.
> Solve $x:Rx=y$ with back substitution.

### Least Squares Problems

In the case $A \in \mathbb{R}^{m\times n}$ with $b \not \in \mathrm{col}(A)$, the system $Ax=b$ is **overdetermined** and does not have a solution. Instead, we look for an $x$ that gives the best approximation to $b$:
$\min_{x}\| Ax-b \|_{2}.$
This is obviously closely related to [[Linear Regression Methods#Least Squares Linear Regression|OLS]], where we approximate the response $b=y$ with a linear combination of the columns of the design matrix $\mathbf{X}=A$.

If we have the full decomposition $A=QR$ and the thin decomposition $A=\hat{Q}\hat{R}$, the objective can be written as
$\| Ax-b \|=\| QRx-b \|=\| Rx-Q^{T}b \|$
since the orthogonal map $Q^{T}$ preserves the $l_{2}$ norm. In block form, the squared objective becomes
$\left\| \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix}x-\begin{bmatrix} \hat{Q}^{T}b \\ Q_{\perp}^{T}b \end{bmatrix} \right \|^{2}=\| \hat{R}x -\hat{Q}^{T}b\|^{2} +\| Q^{T}_{\perp}b \|^{2},$
where the second term is a lost cause, and we can eliminate the first term with $x=\hat{R}^{-1}\hat{Q}^{T}b$ (assuming $A$ has full column rank, so $\hat{R}$ is invertible). In practice, we find the solution by solving the triangular system $\hat{R}x=\hat{Q}^{T}b$.
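A minimal sketch of this recipe (the name `lstsq_qr` is illustrative; it assumes $A$ has full column rank, uses NumPy's built-in thin QR, and then does the back substitution by hand):

```python
import numpy as np

def lstsq_qr(A, b):
    """Solve min_x ||Ax - b||_2 via the thin QR: R x = Q^T b."""
    Q, R = np.linalg.qr(A, mode="reduced")   # thin QR: Q is m x n, R is n x n upper-triangular
    y = Q.T @ b
    n = R.shape[1]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):           # back substitution, O(n^2)
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

rng = np.random.default_rng(3)
A, b = rng.standard_normal((8, 3)), rng.standard_normal(8)
print(np.allclose(lstsq_qr(A, b), np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```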