QR factorization decomposes a matrix $A \in \mathbb{R}^{m \times n},m \ge n$ into the product of an orthogonal/orthonormal matrix $Q$ and an upper-triangular $R$.
The **thin/reduced QR factorization** produces a "thin" orthonormal ${Q}\in \mathbb{R}^{m\times n}$ (so ${Q}^{T}{Q}=I_{n}$ but $QQ^{T}\ne I_{m}$), and a "small" $R\in \mathbb{R}^{n\times n}$: $A_{m \times n}=\begin{bmatrix}
\Bigg\uparrow & &\Bigg\uparrow \\
q_{1} & \dots & q_{n} \\
\Bigg\downarrow & & \Bigg\downarrow
\end{bmatrix}_{m \times n}
\begin{bmatrix}
r_{11} & \dots & r_{1n}\\
& \ddots & \vdots\\
& & r_{nn}
\end{bmatrix}_{n\times n}.$
The **full QR factorization** produces a "full" orthogonal $Q_{F}\in\mathbb{R}^{m\times m}$ and a "large" $R_{F}\in \mathbb{R}^{m\times n}$: $A_{m \times n}=\left[\begin{array}{c|c} \begin{array}{c c}\Bigg\uparrow & &\Bigg\uparrow \\q_{1} & \dots & q_{n} \\\Bigg\downarrow & & \Bigg\downarrow\end{array} & \begin{array}{c c}\Bigg\uparrow & &\Bigg\uparrow \\ q_{n+1} & \dots & q_{m} \\\Bigg\downarrow & & \Bigg\downarrow\end{array} \end{array}\right]
_{m \times m}
\begin{bmatrix}
r_{11} & \dots & r_{1n}\\
& \ddots & \vdots\\
& & r_{nn} \\
0 & \dots & 0 \\
\vdots & & \vdots \\
0 & \dots & 0
\end{bmatrix}_{m\times n}$ where $Q_{F}$ is orthogonal: $Q^{T}_{F}Q_{F}=Q_{F}Q_{F}^{T}=I_{m}$.
- We usually denote the full $Q_{F}$ by $[Q ~|~ Q_{\perp}]$, since the additional columns $Q_{\perp}$ are orthogonal to those of the thin ${Q}$; they can be computed as any orthonormal extension of $Q$. We can also write $R_{F}=\begin{bmatrix}R \\ 0\end{bmatrix}$.
- Although $Q_{F}$ has additional columns, they only multiply with the additional zero rows in $R_{F}$, so they have no effect on the product.
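As a concrete check of the two variants (a small NumPy illustration; the matrix is an arbitrary random example), `np.linalg.qr`'s `reduced` and `complete` modes correspond to the thin and full factorizations:
```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))           # tall matrix: m = 6 > n = 3

# Thin/reduced QR: Q is 6x3 with orthonormal columns, R is 3x3.
Q, R = np.linalg.qr(A, mode="reduced")
print(Q.shape, R.shape)                    # (6, 3) (3, 3)
print(np.allclose(Q.T @ Q, np.eye(3)))     # True:  Q^T Q = I_n
print(np.allclose(Q @ Q.T, np.eye(6)))     # False: Q Q^T != I_m

# Full QR: Q_F is 6x6 orthogonal, R_F = [R; 0] is 6x3.
QF, RF = np.linalg.qr(A, mode="complete")
print(np.allclose(QF.T @ QF, np.eye(6)))   # True: Q_F is orthogonal
print(np.allclose(RF[3:], 0))              # True: the extra rows of R_F are zero
print(np.allclose(Q @ R, A), np.allclose(QF @ RF, A))  # both reproduce A
```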
## Derivation and Existence
### QR with Gram-Schmidt
> [!bigidea]
> Gram-Schmidt on $A$ makes it orthogonal via multiplication with an upper-triangular matrix.
Recall that the Gram-Schmidt orthogonalization process generates an orthonormal set, given a set of independent vectors:
$\{ u_{1},\dots,u_{n} \} \text{ independent} \xmapsto{\text{G-S}}\{ v_{1},\dots,v_{n} \}\text{ orthonormal}$
where the two sets have the same span.
If the $n$ columns of $A$ are linearly independent, then Gram-Schmidt produces a thin QR factorization.
> [!proof]
> Apply G-S to $\{ a_{1},\dots,a_{n} \}$, the $n$ columns of $A$; if they are all independent (i.e. $\mathrm{rank}(A)=n$), we get orthonormal vectors
>
> $\begin{align*}
q_{1}&= {\frac{a_{1}}{r_{11}}},\\ q_{2}&= {\frac{a_{2}-r_{12}q_{1}}{r_{22}}},\\ q_{j}&= {\frac{a_{j}-\sum_{i=1}^{j-1}r_{i j}q_{i}}{r_{j j}}}.
\end{align*}$
>
> We can solve for the columns of $A$ to get $\begin{array}{l}{{a_{1}=r_{11}q_{1}}}\\ {{a_{2}=r_{12}q_{1}+r_{22}q_{2}}}\\ {{a_{j}=r_{1j}q_{1}+r_{2j}q_{2}+\cdots+r_{j j}q_{j}.}}\end{array}$
> In matrix form, this is just
> $A=QR,$
> with ${Q}=[q_{1},\dots,q_{n}]$ and ${R}=(r_{ij})$.
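Transcribing these formulas directly gives a short classical Gram-Schmidt QR. A sketch, assuming $A$ has full column rank (the function name is mine, not a library routine):
```python
import numpy as np

def gram_schmidt_qr(A):
    """Thin QR of a full-column-rank A via classical Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]    # r_ij = q_i^T a_j
            v -= R[i, j] * Q[:, i]         # subtract the projection onto q_i
        R[j, j] = np.linalg.norm(v)        # r_jj normalizes the remainder
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.random.default_rng(1).standard_normal((5, 3))
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
```
(In floating point, the *modified* Gram-Schmidt variant, which projects against the running remainder `v` instead of the original column, behaves better numerically; the algebra above is unchanged.)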
We assumed full column rank to derive the thin QR, but it is easy to generalize to rank-deficient matrices and to the full QR decomposition by extending the basis $\{ q_{1},\dots,q_{\mathrm{rank}(A)} \}$ to $\{ q_{1},\dots,q_{n} \}$:
> [!proof]
> Let $\mathrm{rank}(A)=k$, then WLOG let $\{ a_{1},\dots,a_{k} \}$ be $k$ linearly independent columns, and QR decompose it to get $\begin{bmatrix}
\Bigg\uparrow & &\Bigg\uparrow \\
a_{1} & \dots & a_{k} \\
\Bigg\downarrow & & \Bigg\downarrow
\end{bmatrix}_{m \times k}=\begin{bmatrix}
\Bigg\uparrow & &\Bigg\uparrow \\
q_{1} & \dots & q_{k} \\
\Bigg\downarrow & & \Bigg\downarrow
\end{bmatrix}_{m \times k}
\begin{bmatrix}
r_{11} & \dots & r_{1k}\\
& \ddots & \vdots\\
& & r_{kk}
\end{bmatrix}_{k\times k}$ Then the remaining columns $\{ a_{k+1},\dots,a_{n} \} \subset \langle a_{1},\dots, a_{k} \rangle=\langle q_{1},\dots, q_{k} \rangle$ are expressible in terms of $q_{1},\dots,q_{k}$ alone, so we can extend $q_{1},\dots,q_{k}$ arbitrarily to an orthonormal set $q_{1},\dots,q_{n}$: $
\begin{bmatrix}
\Bigg\uparrow & &\Bigg\uparrow
\\ a_{1} & \dots & a_{n}
\\ \Bigg\downarrow & & \Bigg\downarrow
\end{bmatrix}
=\begin{bmatrix}
\Bigg\uparrow & &\Bigg\uparrow
\\ q_{1} & \dots & q_{n}
\\ \Bigg\downarrow & & \Bigg\downarrow
\end{bmatrix}_{m \times n}
\begin{bmatrix}
r_{11} & \dots & r_{1k} & \dots & \dots & r_{1n}
\\ & \ddots & \vdots & & & \vdots
\\ & & r_{kk} & \dots & \dots & r_{kn}
\\ & & & 0 & \dots & 0
\\ & & & & \ddots & \vdots
\\ & & & & & 0
\end{bmatrix}_{n\times n}$ where the rows of $R$ below the $k$th are zero, since $a_{k+1},\dots,a_{n}$ can be expressed without $q_{k+1},\dots,q_{n}$. This is the thin QR factorization.
Now extending the basis once more, to an orthonormal basis of $\mathbb{R}^{m}$, gives us the full QR.
### QR with Householder Reflections
> [!bigidea]
> Householder reflections are orthogonal maps that can iteratively make $A$ upper-triangular.
We wish to find orthogonal $Q_{1},Q_{2},\dots$ that eliminate the elements of $A$ below the diagonal, column by column:
$A \xrightarrow{Q_{1}}\left[\begin{array}{c|c}
\times & \times \dots \times \\
& \times \dots \times \\
& \times \dots \times \\
& \times \dots \times
\end{array}\right]
\xrightarrow{Q_{2}}
\left[\begin{array}{c|c}
\times \,\, \times & \times \dots \times \\
\,\,\,\,\,\,\,\times & \times \dots \times \\
& \times \dots \times \\
& \times \dots \times
\end{array}\right]\xrightarrow{Q_{3}}\cdots
$
Note that since $Q_{i}$ are orthogonal, each column must retain its norm, and in particular the first column undergoes $a_{1} \mapsto (\| a_{1} \|,0,\dots,0)^{T}$.
The matrix that reflects vectors across the hyperplane orthogonal to $v$ is the **Householder matrix/reflector**, given by $H_{v}=I-2vv^{T},$ where $v$ is a unit-length vector controlling the direction being flipped. It maps $u \mapsto u-2(v^{T}u)v$, i.e. it subtracts twice the projection of $u$ in the direction of $v$.
![[Householder.png#invert|w60|center]]
- Note that for a vector $u \in \mathbb{R}^{m}$, $H_{v}u=u-2v(v^{T}u)$ can be computed in $O(m)$ time rather than $O(m^{2})$.
Alternatively, given a vector $u$ and the destination $w$ we want to map it to (where $\| u \|=\| w \|$), we can solve for the $v$ such that $H_{v}u=w$: $v=\frac{u-w}{\| u-w \| }.$ Therefore, for any $u \in \mathbb{R}^{m}$, we can find a Householder matrix $H_{v}$ that maps $u$ to the destination $w:= (\| u \|,0,\dots,0)^{T}$, exactly what we need (at least for the first column $a_{1}$).
Recall that $Q_{1}$ needs to map $a_{1}$ to $(\| a_{1} \|,0,\dots,0)^{T}$. Choosing $v_{1} \propto a_{1}-(\|a_{1}\|,0,\dots,0)^{T}$ gives the Householder matrix $Q_{1}:=H_{v_{1}}$ such that $H_{v_{1}}a_{1}=\|a_{1}\|e_{1}=(\|a_{1}\|,0,\dots,0)^{T}.$
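A small sanity check of this construction (the vector is arbitrary; practical codes usually pick the sign $v_{1} \propto a_{1}+\operatorname{sign}(a_{11})\|a_{1}\|e_{1}$ to avoid cancellation, but the plain $u-w$ formula from above is used here):
```python
import numpy as np

u = np.array([3.0, 4.0, 0.0, 12.0])
w = np.zeros_like(u)
w[0] = np.linalg.norm(u)             # destination (||u||, 0, ..., 0)^T

v = (u - w) / np.linalg.norm(u - w)  # v = (u - w) / ||u - w||
Hu = u - 2 * v * (v @ u)             # H_v u without forming H_v: O(m) work
print(np.allclose(Hu, w))            # True: the reflector sends u to ||u|| e_1
```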
We need more Householder reflectors $Q_{j}:= H_{v_{j}}$ to map $A$ like $A \overset{Q_{1}}{\mapsto}\begin{bmatrix}
\times & \times & \times & \times \\
& \times & \times & \times \\
& \times & \times & \times \\
&\times&\times & \times \\
\end{bmatrix}\overset{Q_{2}}{\mapsto}\begin{bmatrix}
\times & \times & \times & \times \\
& \times & \times & \times \\
& & \times & \times \\
&&\times&\times \\
\end{bmatrix}\overset{Q_{3}}{\mapsto}\begin{bmatrix}
\times & \times & \times & \times \\
& \times & \times & \times \\
& & \times & \times \\
&&&\times \\
\end{bmatrix}\mapsto\cdots$
We have cleared the first column with $Q_{1}$, and we don't want $Q_{2},\dots$ to destroy the progress -- therefore, $v_{2},\dots$ must have $0$ in the first entry (i.e. their reflectors leave the first coordinate alone). Induction shows that $v_{j}$ has the form $v_{j}= ( \underbrace{0,\dots,0}_{j-1 \text{ zeros}},\underbrace{ \times, \dots, \times}_{=: \tilde{v}_{j}})^{T}.$
In particular, we choose $\tilde{v}_{j}$ to clear the subdiagonal part of the $j$th column of $Q_{j-1}\dots Q_{1}A$.
Induct on the size of $A \in \mathbb{R}^{m \times n}$, assuming that any matrix $B \in \mathbb{R}^{(m-1)\times(n-1)}$ has a QR factorization with Householder reflectors $\tilde{Q}_{2},\dots,\tilde{Q}_{n}$ such that $\tilde{Q}_{n}\cdots \tilde{Q}_{2}B=\tilde{R}$ is upper triangular, and we have $\tilde{Q}:= \tilde{Q}_{2}^{T}\cdots \tilde{Q}_{n}^{T}$ such that $B=\tilde{Q}\tilde{R}$.
- As the base case, we have $A \in \mathbb{R}^{(m-n+1)\times 1}$, where one Householder reflection does the job.
Then reduce the first column $a_{1}$ with $Q_{1}:=H_{v_{1}}:a_{1}\mapsto(\|a_{1}\|,0,\dots,0)^{T}$, giving $Q_{1}A=\left[ \begin{array}{c|c} \|a_{1}\| & \times\dots \times \\ \hline \begin{array}{}0\\ \vdots\\ 0\end{array} & B \end{array} \right]$
Then to use $B=\tilde{Q}\tilde{R}$, we pad $\tilde{Q}$ with a row and column to get $\begin{align*}
\underbrace{\begin{bmatrix}1 & \\& \tilde{Q}^{T}\end{bmatrix}Q_{1}}_{=:Q^{T}}&A= \begin{bmatrix}
1 & \\& \tilde{Q}^{T}\end{bmatrix}\begin{bmatrix}
\|a_{1}\| & \times \\& B\end{bmatrix}= \underbrace{\begin{bmatrix}
\|a_{1}\| & \times \\ & \tilde{R} \end{bmatrix}}_{=:R}\\
&\Rightarrow A=QR
\end{align*}$which gives the QR factorization of $A$ and completes the induction.
To obtain the successive Householder reflections, we can pad each $\tilde{Q}_{i}$ as we did $\tilde{Q}$, getting $Q_{i}:= \begin{bmatrix}1 & \\ & \tilde{Q}_{i}\end{bmatrix}$, so that $Q_{n}\cdots Q_{1}A=R.$
Note that this produces the full QR directly: each $Q_{i} \in \mathbb{R}^{m\times m}$, so $Q:=Q_{1}^{T}\dots Q_{n}^{T} \in \mathbb{R}^{m\times m}$ as well, and $R \in \mathbb{R}^{m\times n}$ contains $m-n$ zero rows (corresponding to the zeros produced in the base case $A \in \mathbb{R}^{(m-n+1)\times1}$).
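Unrolling the induction gives a simple loop: at step $k$, build a reflector from the part of column $k$ at and below the diagonal and apply it to the trailing block. A minimal NumPy sketch (names are mine; $Q$ is accumulated explicitly for clarity, which practical implementations avoid):
```python
import numpy as np

def householder_qr(A):
    """Full QR, A = QR, via Householder reflections (dense, educational)."""
    m, n = A.shape
    R = A.astype(float)
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        v = x.copy()
        # Sign choice avoids cancellation when x is already close to ||x|| e_1.
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        norm_v = np.linalg.norm(v)
        if norm_v == 0:                    # column already zero below the diagonal
            continue
        v /= norm_v
        # Apply H = I - 2 v v^T to the trailing block of R, and accumulate Q H.
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

A = np.random.default_rng(2).standard_normal((5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A))                    # reproduces A
print(np.allclose(Q.T @ Q, np.eye(5)))          # full Q is 5x5 orthogonal
print(np.allclose(np.tril(R, -1), 0))           # R is upper triangular (5x3)
```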
### QR on Hessenberg Matrices
> [!bigidea]
> Hessenberg matrices (i.e. those with all-zero entries below the first subdiagonal) admit cheap QR factorizations using Givens rotations.
A general square matrix $A \in \mathbb{R}^{n \times n}$ can be reduced to its **Hessenberg form/matrix** with unitary similarity transformations. The Hessenberg form is an almost-upper-triangular matrix, with possibly nonzero entries on the first subdiagonal: $H \equiv Q^{*}AQ=\begin{bmatrix}
\times & \times & \dots & \dots & \times \\
\times & \times & \dots & \dots & \times \\
& \times & \dots & \dots & \times \\
& & \ddots & \vdots & \vdots \\
& & & \times & \times
\end{bmatrix}$
*The QR factorization of Hessenberg matrices is efficient*: pre-applying $(n-1)$ Givens rotations is enough.
- In the tridiagonal case, those rotations move the subdiagonal into the superdiagonal and the one above it, so $R$ stays banded.
A **Givens rotation** $J_{\theta}(i,j)$ is an orthogonal matrix that agrees with the identity except in rows and columns $i,j$, where for some $\theta$, $\begin{pmatrix}
J_{ii} & J_{ij} \\
J_{ji} & J_{jj}
\end{pmatrix} = \begin{pmatrix}
\cos(\theta) & \sin(\theta) \\
-\sin(\theta) & \cos(\theta)
\end{pmatrix}$ and given a vector/column $v=(\dots,\underset{i}{a},\dots,\underset{j}{b},\dots)^{T}$, $\exists\theta:v \xmapsto{J_{\theta}(i,j)}(\dots,\underset{i}{\sqrt{ a^{2}+b^{2} }},\dots,\underset{j}{0},\dots)^{T}$ where the $j$th entry is "merged" into the $i$th entry. To explicitly find $\theta$, note that the $j$th entry of the product is $[J_{\theta}(i,j)v]_{j}=b\cos(\theta)-a\sin(\theta)=0$, so $\theta=\arctan(b / a)$.
- Note that the matrix is not the $\mathbb{R}^{2\times {2}}$ rotation-by-$\theta$ matrix, but its transpose.
The QR factorization of a Hessenberg matrix is done by successive Givens rotations, which eliminate the subdiagonal entries one at a time: $\begin{bmatrix}
\times & \times & \times & \times \\
\times& \times & \times & \times \\
& \times & \times & \times \\
&&\times&\times \\
\end{bmatrix}\mapsto\begin{bmatrix}
\times & \times & \times & \times \\
& \times & \times & \times \\
& \times & \times & \times \\
&&\times&\times \\
\end{bmatrix}\mapsto\begin{bmatrix}
\times & \times & \times & \times \\
& \times & \times & \times \\
& & \times & \times \\
&&\times&\times \\
\end{bmatrix}\mapsto\cdots$
This is done by applying $J_{\theta_{k}}(k,k+1),\,\,k=1,\dots,n-1$, with $\theta_{k}$ chosen so that the rotation merges the $(k+1,k)$ entry into the $(k,k)$ entry: $\underbrace{J_{\theta_{n-1}}(n-1,n)\dots J_{\theta_{2}}(2,3)J_{\theta_{1}}(1,2)}_{Q^{T}}A=R \Longrightarrow A=QR.$
^f58b5f
This is the QR factorization of a Hessenberg matrix $A$, done in $O(n^{2})$ operations (each pre-applied rotation acts on only two rows of $A$), much more efficient than the $O(n^{3})$ QR factorization of general matrices.
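A sketch of this procedure (the function name is mine): each step forms the $2\times 2$ block of $J_{\theta_{k}}(k,k+1)$ from the pair $(r_{kk}, r_{k+1,k})$ and touches only rows $k,k+1$:
```python
import numpy as np

def hessenberg_qr_givens(H):
    """QR of an upper Hessenberg H via n-1 Givens rotations, O(n^2) work."""
    n = H.shape[0]
    R = H.astype(float)
    Q = np.eye(n)
    for k in range(n - 1):
        a, b = R[k, k], R[k + 1, k]
        r = np.hypot(a, b)
        if r == 0:                           # subdiagonal entry already zero
            continue
        c, s = a / r, b / r                  # cos(theta), sin(theta)
        G = np.array([[c, s], [-s, c]])      # the 2x2 block of J_theta(k, k+1)
        R[k:k + 2, k:] = G @ R[k:k + 2, k:]  # zeroes the (k+1, k) entry
        Q[:, k:k + 2] = Q[:, k:k + 2] @ G.T  # accumulate Q = J_1^T ... J_{n-1}^T
    return Q, R

rng = np.random.default_rng(3)
H = np.triu(rng.standard_normal((6, 6)), -1)  # zero below the first subdiagonal
Q, R = hessenberg_qr_givens(H)
print(np.allclose(Q @ R, H), np.allclose(np.tril(R, -1), 0))  # True True
```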
## Properties of the QR
- Furthermore, for a full-rank $A$, *the thin QR factorization is also unique up to the signs of $\mathrm{diag}(R)$ and the corresponding columns of $Q$*. So if we require the diagonal elements $r_{ii}\ge 0$, the thin QR factorization is unique.
> [!proof]-
> Take two thin QR decompositions $A=QR=\tilde{Q}\tilde{R}$, where both $R,\tilde{R}$ have non-negative diagonals (and, by full rank, are invertible).
>
> Then $R \tilde{R}^{-1}=Q^{T}\tilde{Q}$, but the $\mathrm{LHS}$ is upper-triangular while the $\mathrm{RHS}$ is orthogonal ($Q$ and $\tilde{Q}$ both have orthonormal columns spanning $\mathrm{col}(A)$), and an orthogonal upper-triangular matrix must be diagonal. The only possibility is $R \tilde{R}^{-1}=\mathrm{diag}(\pm 1,\dots,\pm 1)$. Since we further require $R,\tilde{R}$ to have non-negative diagonals, $R \tilde{R}^{-1}=I$, and $R=\tilde{R},Q=\tilde{Q}$.
Householder QR is [[Numerical Stability#Stability|backward stable]], because it is based on orthogonal transformations. In particular,
> [!theorem|*] Stability of Householder QR
> Let $\hat{v}_{1},\dots$ be the computed Householder vectors, and $\hat{H}_{1},\dots$ be the *exactly orthogonal maps* they define -- not any computed version $fl(I-2\hat{v}\hat{v}^{T})$, since they are never explicitly formed.
>
> Then with $\hat{Q}:= \hat{H}_{1}^{T}\dots \hat{H}_{n}^{T}$ (again, not the computed version) and $\hat{R}:= fl(\hat{H}_{n}\dots \hat{H}_{1}A)$, $\begin{align*}
\| \hat{Q}\hat{R}-A \| &= O(\epsilon\cdot \| A \|) ,\\
\| \hat{Q}^{T}\hat{Q}-I \| &= O(\epsilon).
\end{align*}$
>
> However, this does not imply that $\hat{Q}-Q,\hat{R}-R$ are small, where $Q,R$ are actual QR factors of $A$.
>
>
> > [!proof]-
> > Because $\hat{R}=fl(\hat{H}_{n}\dots \hat{H}_{1}A)$ is the computed version of applying the exactly orthogonal maps $\hat{H}_{1},\dots,\hat{H}_{n}$, it is forward stable, with $\begin{align*}
> \hat{R}&= \hat{H}_{n}\dots \hat{H}_{1}A+O(\epsilon\cdot \| A \| ).
> \end{align*}$
> >
> > Therefore, computing $\hat{Q}\hat{R}=\hat{H}_{1}^{T}\dots \hat{H}_{n}^{T}\hat{R}$, another orthogonal transformation, has forward error $\hat{Q}\hat{R} = \hat{H}_{1}^{T}\dots \hat{H}_{n}^{T}\hat{H}_{n}\dots \hat{H}_{1}A +O(\epsilon \cdot \| A \| )=A+O(\epsilon\cdot \| A \| ).$
> > However, since we cannot guarantee that the computation of the reflectors $\hat{v}_{1},\dots$ is forward stable, we cannot say that the $\hat{Q},\hat{R}$ defined by them above are close to the $Q,R$ defined by the exact reflectors $v_{1},\dots$.
>
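A quick numerical illustration of these bounds (not a proof; the matrix below is just an arbitrary, deliberately ill-conditioned example), using NumPy's LAPACK-backed Householder QR:
```python
import numpy as np

rng = np.random.default_rng(4)
# Build an ill-conditioned 200x200 matrix with singular values from 1 down to 1e-12.
U, _ = np.linalg.qr(rng.standard_normal((200, 200)))
V, _ = np.linalg.qr(rng.standard_normal((200, 200)))
A = U @ np.diag(np.logspace(0, -12, 200)) @ V.T

Q, R = np.linalg.qr(A)
print(np.linalg.norm(Q @ R - A) / np.linalg.norm(A))  # small multiple of machine eps
print(np.linalg.norm(Q.T @ Q - np.eye(200)))          # small multiple of machine eps
```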
## Applications of the QR
### Solving Linear Systems
It's easy to solve the linear system $Ry=z$, if $R$ is upper triangular, as **back substitution** (i.e. solving from the bottom up) is $O(n^{2})$.
It's also easy to solve $Qy=z$ if $Q$ is orthogonal, as its inverse is just its transpose; applying $Q^{T}$ to a vector is also $O(n^{2})$.
Hence, once a QR decomposition of $A$ is available (an $O(n^{3})$ cost paid once), we can solve the system $Ax=b$ in $O(n^{2})$ time:
> [!algorithm] Solution to linear systems
> Compute the decomposition $Q,R:A=QR$.
> Solve $y:Qy=b$ by applying the inverse $Q^{T}$: $y=Q^{T}b$.
> Solve $x:Rx=y$ with back substitution.
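A sketch of this algorithm with NumPy/SciPy (the factorization itself still costs $O(n^{3})$; the two solves below are the $O(n^{2})$ part):
```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

Q, R = np.linalg.qr(A)          # A = QR (O(n^3), done once)
y = Q.T @ b                     # solve Qy = b by applying Q^T      (O(n^2))
x = solve_triangular(R, y)      # solve Rx = y by back substitution (O(n^2))
print(np.allclose(A @ x, b))    # True
```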
### Least Squares Problems
In the case $A \in \mathbb{R}^{m\times n}$ with $b \not \in \mathrm{col}(A)$, the system $Ax=b$ is **overdetermined** and does not have a solution. Instead, we look for an $x$ that gives the best approximation to $b$: $\min_{x}\| Ax-b \|_{2}.$ This is closely related to [[Linear Regression Methods#Least Squares Linear Regression|OLS]], where we approximate the response $b=y$ with a linear combination of the columns of the design matrix $\mathbf{X}=A$.
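As a preview, the standard QR route to this minimizer (for full-column-rank $A$) is to solve $Rx=Q^{T}b$ with the thin factors; a quick sketch:
```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(6)
A = rng.standard_normal((10, 3))       # overdetermined: 10 equations, 3 unknowns
b = rng.standard_normal(10)

Q, R = np.linalg.qr(A)                 # thin QR: Q is 10x3, R is 3x3
x = solve_triangular(R, Q.T @ b)       # minimizes ||Ax - b||_2 when A has full rank
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # agrees with lstsq
```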