> [!tldr] QR Algorithm
> The QR algorithm is an iterative algorithm for the [[Eigenvalue Problems|eigenvalue problem]] of a matrix $A$ that uses the [[QR Factorization]].
We assume that $A$ has a basis of eigenvectors $v_{i}$ (all unit-length), corresponding to eigenvalues $\lambda_{i}$, ordered by decreasing magnitude $|\lambda_{1}|>|\lambda_{2}|>\dots>|\lambda_{n}|$.
The vanilla QR algorithm is straightforward:
> [!algorithm] QR algorithm
> $[0]$ Initialize with $A_{1}:=A$.
> For $[k=1,2,\dots]$ until convergence:
> - Compute the decomposition $A_{k}=Q_{k}R_{k}$,
> - Swap the two matrices and define $A_{k+1}:= R_{k}Q_{k}$.
>
> Return the eigenvalue estimates $\mathrm{diag}(A_{k})$ and the eigenvector estimates, the columns of $Q^{(k)}:= Q_{1}\dots Q_{k}$.
Schematically, the algorithm is
$\begin{align*}
& A_{1}=Q_{1}R_{1}\xrightarrow{\text{swap}} A_{2}:= R_{1}Q_{1}\xrightarrow{\text{factorize}}A_{2}=Q_{2}R_{2}\to \cdots \xrightarrow{\text{converges}}A_{k}\approx(\urcorner).
\end{align*}$
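As a concrete reference, the loop is only a few lines of `numpy`; a minimal sketch (the function name, iteration cap, and stopping test are my own choices):
```python
import numpy as np

def qr_algorithm(A, iters=500, tol=1e-10):
    """Vanilla QR algorithm: factorize, swap, repeat."""
    A_k = A.copy().astype(float)
    Q_acc = np.eye(A.shape[0])          # accumulates Q^(k) = Q_1 ... Q_k
    for _ in range(iters):
        Q, R = np.linalg.qr(A_k)        # A_k = Q_k R_k
        A_k = R @ Q                     # A_{k+1} = R_k Q_k
        Q_acc = Q_acc @ Q
        if np.linalg.norm(np.tril(A_k, -1)) < tol:   # A_k is numerically upper triangular
            break
    return np.diag(A_k), Q_acc          # eigenvalue and eigenvector estimates

# example: a symmetric matrix, so it has an orthonormal eigenbasis
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4)); S = S + S.T
evals, evecs = qr_algorithm(S)
print(np.sort(evals))
print(np.sort(np.linalg.eigvalsh(S)))   # matches
```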
Once $A_{k}$ is (approximately) upper triangular $(\urcorner)$, its diagonal gives its eigenvalues $\lambda_{1},\dots,\lambda_{n}$, which are also the eigenvalues of $A$ because $A_{k}$ and $A$ are similar:
$\begin{align*}
A_{k+1}&= R_{k}Q_{k}\\
&= Q_{k}^{-1}A_{k}Q_{k}\\
&= \dots\\
&= (Q_{1}\dots Q_{k})^{-1}A(Q_{1}\dots Q_{k})\\
&= (Q^{(k)})^{-1}AQ^{(k)}.
\end{align*}$
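A single step of this similarity is easy to check numerically; a quick sketch (variable names are mine):
```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

Q1, R1 = np.linalg.qr(A)   # A_1 = A = Q_1 R_1
A2 = R1 @ Q1               # A_2 = R_1 Q_1

# A_2 = Q_1^{-1} A_1 Q_1, so the spectra coincide
assert np.allclose(A2, Q1.T @ A @ Q1)
print(np.sort_complex(np.linalg.eigvals(A)))
print(np.sort_complex(np.linalg.eigvals(A2)))   # same eigenvalues
```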
## Convergence of the QR Algorithm
Why does the algorithm converge? We shall show that $A^{k}=Q^{(k)}R^{(k)}$ (where, analogously, $R^{(k)}:=R_{k}\dots R_{1}$) is the QR decomposition of $A^{k}$, and then see what $Q^{(k)}$ converges to.
> [!proof] Proof that $A^{k}=Q^{(k)}R^{(k)}$
> Induction; the base case $A^{1}=A_{1}=Q_{1}R_{1}=Q^{(1)}R^{(1)}$ is immediate.
>
> Assuming $A^{k}=Q^{(k)}R^{(k)}$, and noting that $Q^{(k)}A_{k+1}=AQ^{(k)}$ by the similarity result above, $\begin{align*}
A^{k+1}&= AQ^{(k)}R^{(k)}\\
&= Q^{(k)}A_{k+1}R^{(k)}\\
&= Q^{(k)}Q_{k+1}R_{k+1}R^{(k)}\\
&= Q^{(k+1)}R^{(k+1)}.
\end{align*}$
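This accumulated factorization can also be verified numerically; a sketch assuming the accumulation orders $Q^{(k)}=Q_{1}\cdots Q_{k}$ and $R^{(k)}=R_{k}\cdots R_{1}$ defined above:
```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
k = 6

A_j = A.copy()
Q_acc = np.eye(4)   # Q^(k) = Q_1 ... Q_k
R_acc = np.eye(4)   # R^(k) = R_k ... R_1
for _ in range(k):
    Q, R = np.linalg.qr(A_j)
    A_j = R @ Q
    Q_acc = Q_acc @ Q   # new Q multiplies on the right
    R_acc = R @ R_acc   # new R multiplies on the left

# Q^(k) R^(k) is the QR factorization of A^k
print(np.allclose(Q_acc @ R_acc, np.linalg.matrix_power(A, k)))   # True
```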
### QR as Simultaneous Iterations
Consider the factorization $A^{k}=Q^{(k)}R^{(k)}$ as a [[QR Factorization#QR with Gram-Schmidt|Gram-Schmidt process]]: denoting columns of $A^{k}$ as $\{ A^{k}e_{i} \}$ ($e_{i}$ are the columns of $I_{n}$),
$Q^{(k)}R^{(k)}=(A^{k}e_{1},\dots,A^{k}e_{n})$
turns the columns of $A^{k}I$ into the columns of $Q^{(k)}$, an orthonormal basis.
Therefore, we have the conclusion that
> [!connection] The QR algorithm is the [[Power Iterations#Simultaneous Iterations|(simultaneous) power iteration]] under the hood.
We know that this simultaneous iteration $A^{k}I \mapsto Q^{(k)}$ gives orthonormal vectors converging to the eigendirections of $A$: $Q^{(k)}=(q^{(k)}_{1},\dots,q^{(k)}_{n}) \to (v_{1},\dots,v_{n})$ (up to orthonormalisation), hence $A_{k+1}$ in $Q^{(k)}A_{k+1}Q^{(k)T}=A$ converges to an upper-triangular matrix similar to $A$, whose diagonal reveals the eigenvalues.
In particular, the first column is $q_{1}^{(k)}:=Q^{(k)}e_{1}$, and it has convergence $q_{1}^{(k)}\propto Q^{(k)}R^{(k)}e_{1}=A^{k}e_{1}\to v_{1},$ so *the first column of $A_{k+1}=(Q^{(k)})^{T}AQ^{(k)}$ has convergence $A_{k+1}e_{1}\to \lambda_{1}e_{1}$ with linear rate $| \lambda_{2} / \lambda_{1} |$.*
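A toy illustration of this rate, using a symmetric test matrix with a known spectrum (so its eigenvectors are orthonormal); all numbers and names are illustrative:
```python
import numpy as np

# symmetric test matrix with known spectrum |lambda_1| > ... > |lambda_n|
lams = np.array([4.0, 2.0, 1.0, 0.5])
rng = np.random.default_rng(3)
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = V @ np.diag(lams) @ V.T

e1 = np.eye(4)[:, 0]
A_k = A.copy()
for k in range(1, 31):
    Q, R = np.linalg.qr(A_k)
    A_k = R @ Q
    err = np.linalg.norm(A_k[:, 0] - lams[0] * e1)   # first column vs lambda_1 e_1
    if k % 10 == 0:
        print(k, err, (lams[1] / lams[0]) ** k)      # error ~ |lambda_2/lambda_1|^k
```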
### QR as Inverse Power Iterations
If we take the transpose of $A^{-k}$, we have the (orthogonal times lower-triangular) factorization $(A^{-k})^{T}=Q^{(k)}(R^{(k)})^{-T},$
so implicitly we are also applying an (unshifted) [[Power Iterations#Shifted Inverse Power Iteration|inverse power iteration]], i.e. repeatedly applying $A^{-T}$ to the vectors $e_{1},\dots,e_{n}$. Since $(R^{(k)})^{-T}$ is lower triangular $(\llcorner)$, we have $q^{(k)}_{n}\propto Q^{(k)}(R^{(k)})^{-T}e_{n}=(A^{T})^{-k}e_{n}\to u_{n},$ where $u_{n}$ is the least dominant eigenvector of $A^{T}$, i.e. the left eigenvector of $A$ corresponding to the eigenvalue $\lambda_{n}$.
In other words, $(q^{(k)}_{n})^{T}Aq^{(k)}_{n}\to u_{n}^{T}Au_{n}=\lambda_{n},$ so *$A_{k+1}=Q^{(k)T}AQ^{(k)}$ has its last row $e_{n}^{T}A_{k+1}=(q_{n}^{(k)})^{T}AQ^{(k)}\to \lambda_{n}e_{n}^{T}$ with linear rate $| \lambda_{n} / \lambda_{n-1}|$*.
> [!connection] The QR algorithm is also an inverse power iteration under the hood.
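The same kind of experiment, now watching the last row and the Rayleigh quotient of the last column of $Q^{(k)}$ (again a sketch with illustrative numbers):
```python
import numpy as np

# symmetric test matrix with |lambda_n| < |lambda_{n-1}|
lams = np.array([4.0, 2.0, 1.0, 0.3])
rng = np.random.default_rng(4)
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = V @ np.diag(lams) @ V.T

A_k = A.copy()
Q_acc = np.eye(4)
for k in range(1, 31):
    Q, R = np.linalg.qr(A_k)
    A_k = R @ Q
    Q_acc = Q_acc @ Q
    q_n = Q_acc[:, -1]                      # last column of Q^(k)
    off = np.linalg.norm(A_k[-1, :-1])      # off-diagonal part of the last row
    if k % 10 == 0:
        # off ~ |lambda_n/lambda_{n-1}|^k, and the Rayleigh quotient tends to lambda_n
        print(k, off, abs(lams[-1] / lams[-2]) ** k, q_n @ A @ q_n)
```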
## Accelerating the QR
### Shifts in QR Algorithm
> [!bigidea]
> Shifts can increase the ratios between eigenvalues, hence giving faster convergence and fewer iterations.
Given this connection, we can apply shifts, as in the shifted inverse power iteration, to speed up convergence. *If we choose a good shift on the $k$th iteration, we can speed up the convergence of the last row of $A_{k+1}$,* getting
$A_{k+1}\approx\begin{bmatrix}
{\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {}&{}&{}&{}&{\lambda_{n}}\end{bmatrix}.$
Remember that this convergence is a result of the (unshifted) inverse iteration $\underbrace{A^{-T}\dots A^{-T}}_{k \text{ times}}=Q^{(k)}(R^{(k)})^{-T}=Q_{1}\dots Q_{k}R_{k}^{-T}\dots R_{1}^{-T}.$
To emulate the shifted inverse iteration $\prod_{j\le k}(A^{T}-s_{j}I)^{-1}=Q^{(k)}(R^{(k)})^{-T},$ we need some other definition of $Q_{k},R_{k}$:
> [!algorithm] Shifted QR algorithm
> $[0]$ Initialize with $A_{1}:=A$.
> For $[k=1,2,\dots]$:
> - Choose the shift $s_{k}$ (e.g. the Rayleigh quotient).
> - Compute the decomposition $A_{k}-s_{k}I=Q_{k}R_{k}$,
> - Swap the two matrices and define $A_{k+1}:= R_{k}Q_{k}+s_{k}I$.
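A sketch of the shifted loop with the simplest choice $s_{k}=(A_{k})_{nn}$ (a Rayleigh-quotient-style shift); deflation and more robust shift strategies (e.g. Wilkinson shifts) are omitted, and the names are my own:
```python
import numpy as np

def shifted_qr(A, iters=100, tol=1e-12):
    """QR algorithm with the simple shift s_k = (A_k)_{nn}."""
    A_k = A.copy().astype(float)
    n = A_k.shape[0]
    I = np.eye(n)
    for _ in range(iters):
        s = A_k[-1, -1]                      # shift from the bottom-right corner
        Q, R = np.linalg.qr(A_k - s * I)     # A_k - s_k I = Q_k R_k
        A_k = R @ Q + s * I                  # A_{k+1} = R_k Q_k + s_k I
        if np.linalg.norm(A_k[-1, :-1]) < tol:   # last row has converged to lambda_n e_n^T
            break
    return A_k

# symmetric example: the last row converges in a handful of iterations
rng = np.random.default_rng(5)
S = rng.standard_normal((5, 5)); S = S + S.T
print(np.round(shifted_qr(S), 6))
```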
We can verify the analogous result that $(A-s_{k+1}I)Q^{(k)}=Q^{(k)}(A_{k+1}-s_{k+1}I)$, so the same induction as before gives $\prod_{j\le k}(A-s_{j}I)=Q^{(k)}R^{(k)}$, equivalently $\prod_{j\le k}(A^{T}-s_{j}I)^{-1}=Q^{(k)}(R^{(k)})^{-T},$ and the algorithm indeed carries out the shifted inverse iteration.
With the shifted algorithm, $A_{k+1}$ and $A_{k}$ are still similar and have the same eigenvalues: since $Q_{k}=(A_{k}-s_{k}I)R_{k}^{-1}$,
$A_{k+1}=R_{k}Q_{k}+s_{k}I=R_{k}(A_{k}-s_{k}I)R_{k}^{-1}+s_{k}I=R_{k}A_{k}R_{k}^{-1}.$
In fact, the iterate $\prod_{j\le k}(A-s_{j}I)^{-1}$ has eigenvalues $\left\{ \prod_{j\le k}(\lambda_{i}-s_{j})^{-1} ~\middle|~i=1,\dots,n\right\}$, so split over the $k$ iterations, the shifted QR implicitly runs a power iteration on a matrix with eigenvalues $\mu_{i}:= \left( \prod_{j\le k} \frac{1}{\lambda_{i}-s_{j}}\right)^{1 / k},$ whose ratios can be made huge with good choices of $s_{j}$.
> [!idea] The QR algorithm implicitly runs a power iteration on some matrix, and good shifts make its eigenvalue ratios huge, speeding up convergence.
In practice, this technique allows the algorithm to terminate in $O(n)$ iterations.
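As a toy illustration of how much good shifts inflate this ratio, one can plug a fixed spectrum and shifts approaching $\lambda_{n}$ into the formula for $\mu_{i}$ (purely illustrative numbers):
```python
import numpy as np

lams = np.array([4.0, 2.0, 1.0, 0.5])    # eigenvalues of A
shifts = np.array([0.45, 0.49, 0.499])   # shifts approaching lambda_n = 0.5
k = len(shifts)

# mu_i = (prod_j 1 / (lambda_i - s_j))^(1/k)
mu = np.prod(1.0 / (lams[:, None] - shifts[None, :]), axis=1) ** (1.0 / k)
print(mu)                # mu_n dwarfs the others ...
print(mu[-1] / mu[-2])   # ... so the implicit (inverse) power iteration converges very fast
```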
### Preprocessing into Hessenberg Form
> [!bigidea]
> Preprocessing $A$ into its Hessenberg form makes each iteration fast.
Since [[QR Factorization#QR on Hessenberg Matrices|QR factorization of Hessenberg matrices]] is cheap, we can try to save time by pre-processing the matrix $A$ into its Hessenberg form $H_{A}$:
$\underset{A}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\end{bmatrix}} \mapsto \underset{H_{1}AH_{1}^{T}}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\end{bmatrix}} \mapsto\dots \mapsto \underset{H_{n-2}\dots H_{1}AH_{1}^{T}\dots H_{n-2}^{T}}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &&\ast&\ast&\ast\\ &&&\ast&\ast\end{bmatrix}}=:H_{A}$
where each layer of $H_{i}(\dots)H_{i}^{T}$ removes all entries of column $i$ below the subdiagonal.
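For reference, a Householder-based reduction could look like the following sketch (the function and its details are my own; in practice one would call a library routine such as `scipy.linalg.hessenberg`):
```python
import numpy as np

def hessenberg(A):
    """Reduce A to upper Hessenberg form H = Q^T A Q by Householder similarity transforms."""
    H = A.copy().astype(float)
    n = H.shape[0]
    Q = np.eye(n)
    for i in range(n - 2):
        # Householder vector zeroing the entries of column i below the subdiagonal
        x = H[i + 1:, i].copy()
        v = x.copy()
        v[0] += (np.sign(x[0]) if x[0] != 0 else 1.0) * np.linalg.norm(x)
        if np.linalg.norm(v) == 0:
            continue
        v /= np.linalg.norm(v)
        # similarity transform H <- P H P with P = I - 2 v v^T (padded to act on rows/cols i+1:)
        H[i + 1:, :] -= 2.0 * np.outer(v, v @ H[i + 1:, :])
        H[:, i + 1:] -= 2.0 * np.outer(H[:, i + 1:] @ v, v)
        Q[:, i + 1:] -= 2.0 * np.outer(Q[:, i + 1:] @ v, v)   # accumulate the product of reflectors
    return H, Q

A = np.random.default_rng(6).standard_normal((5, 5))
H, Q = hessenberg(A)
print(np.round(H, 3))                 # zeros below the first subdiagonal
print(np.allclose(Q @ H @ Q.T, A))    # similarity preserved: Q H Q^T = A
```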