> [!tldr] QR Algorithm
> The QR algorithm is an iterative algorithm for the [[Eigenvalue Problems|eigenvalue problem]] of a matrix $A$ that uses the [[QR Factorisation]].
We assume that $A$ has a basis of eigenvectors $v_{i}$ (all unit-length), corresponding to eigenvalues $\lambda_{i}$, ordered by decreasing magnitude: $|\lambda_{1}|>|\lambda_{2}|>\dots>|\lambda_{n}|$.
The vanilla QR algorithm is straightforward:
> [!algorithm] QR algorithm
> $[0]$ Initialize with $A_{0}:=A$.
> For $[k=0,\dots]$ until convergence:
> - Compute the decomposition $A_{k}=Q_{k}R_{k}$,
> - Swap the two matrices and define $A_{k+1}:= R_{k}Q_{k}$.
>
> Return the eigenvalue estimates $\mathrm{diag}(A_{k})$ and eigenvector estimates $Q^{(k)}:= Q_{0}Q_{1}\dots Q_{k-1}$.
Schematically, the algorithm is
$\begin{align*}
& A_{0}=Q_{0}R_{0}\xrightarrow{\text{swap}} A_{1}:= R_{0}Q_{0}\xrightarrow{\text{factorize}}A_{1}=Q_{1}R_{1}\to \cdots \xrightarrow{\text{converges}}A_{k}\approx(\urcorner).
\end{align*}$
The diagonal of $A_{k}$ gives its eigenvalues $\lambda_{1},\dots,\lambda_{n}$, which are also the eigenvalues of $A$ because $A_{k}$ and $A$ are similar:
$\begin{align*}
A_{k+1}&= R_{k}Q_{k}\\
&= Q_{k}^{-1}A_{k}Q_{k}\\
&= \dots\\
&= (Q_{0}\dots Q_{k})^{-1}A(Q_{0}\dots Q_{k})\\
&= (Q^{(k+1)})^{-1}AQ^{(k+1)}.
\end{align*}$
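For reference, a minimal NumPy sketch of this loop (the function name, the symmetric test matrix, and the iteration count are illustrative assumptions, not from the source):
```python
# A sketch of the vanilla QR algorithm with NumPy.
import numpy as np

def qr_algorithm(A, n_iter=500):
    """Run the unshifted QR iteration; return (A_k, accumulated Q^(k))."""
    A_k = A.copy()
    Q_acc = np.eye(A.shape[0])      # Q^(k) = Q_0 Q_1 ... Q_{k-1}
    for _ in range(n_iter):
        Q, R = np.linalg.qr(A_k)    # factorize A_k = Q_k R_k
        A_k = R @ Q                 # swap: A_{k+1} = R_k Q_k (similar to A_k)
        Q_acc = Q_acc @ Q           # accumulate the eigenvector estimates
    return A_k, Q_acc

# Example: a symmetric test matrix, so the eigenvector basis is orthonormal.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M + M.T
A_k, Q_acc = qr_algorithm(A)
print(np.sort(np.diag(A_k)))                   # eigenvalue estimates
print(np.sort(np.linalg.eigvalsh(A)))          # reference eigenvalues
print(np.allclose(Q_acc.T @ A @ Q_acc, A_k))   # similarity: A_k = (Q^(k))^T A Q^(k)
```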
## Convergence of the QR Algorithm
Why does the algorithm converge? We shall show that $A^{k}=Q^{(k)}R^{(k)}$ (where, similarly, we define $R^{(k)}:=R_{k-1}\dots R_{0}$) is a QR decomposition of $A^{k}$, and we will see that $Q^{(k)}$ converges to something.
> [!proof] Proof that $A^{k}=Q^{(k)}R^{(k)}$
> Induction; the base case is trivial.
>
> Assuming $A^{k}=Q^{(k)}R^{(k)}$, then noting that $Q^{(k)}A_{k}=AQ^{(k)}$ by the previous result, $\begin{align*}
A^{k+1}&= AQ^{(k)}R^{(k)}\\
&= Q^{(k)}A_{k}R^{(k)}\\
&= Q^{(k)}Q_{k}R_{k}R^{(k)}\\
&= Q^{(k+1)}R^{(k+1)}.
\end{align*}$
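A quick numerical sanity check of this identity (again a NumPy sketch with an arbitrary symmetric matrix and an arbitrary $k$); note the reversed accumulation order of the $R$ factors:
```python
# Check A^k = Q^(k) R^(k) numerically.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M + M.T
n = A.shape[0]

A_k = A.copy()
Q_acc = np.eye(n)        # Q^(k) = Q_0 ... Q_{k-1}
R_acc = np.eye(n)        # R^(k) = R_{k-1} ... R_0
k = 6
for _ in range(k):
    Q, R = np.linalg.qr(A_k)
    A_k = R @ Q
    Q_acc = Q_acc @ Q
    R_acc = R @ R_acc    # new R factor multiplies on the left

print(np.allclose(np.linalg.matrix_power(A, k), Q_acc @ R_acc))  # True
```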
### QR as Simultaneous Iterations
Consider the factorization $A^{k}=Q^{(k)}R^{(k)}$ as a [[QR Factorisation#QR with Gram-Schmidt|Gram-Schmidt process]]: denoting columns of $A^{k}$ as $\{ A^{k}e_{i} \}$ ($e_{i}$ are the columns of $I_{n}$),
$Q^{(k)}R^{(k)}=(A^{k}e_{1},\dots,A^{k}e_{n})$
turns the columns of $A^{k}I$ into those of $Q^{(k)}$, an orthonormal basis.
Therefore,
> [!connection] The QR algorithm is the [[Power Iterations#Simultaneous Iterations|(simultaneous) power iteration]] under the hood.
We know that this simultaneous iteration $A^{k}I \mapsto Q^{(k)}$ gives vectors converging to the eigenvectors of $A$: $Q^{(k)}=(q^{(k)}_{1},\dots,q^{(k)}_{n}) \to (v_{1},\dots,v_{n})$, hence $A_{k}$ in $Q^{(k)}A_{k}Q^{(k)T}=A$ converges to the diagonalisation of $A$, whose diagonal reveals the eigenvalues.
In particular, the first column is $q_{1}^{(k)}:=Q^{(k)}e_{1}$, and since $R^{(k)}$ is upper-triangular, $q_{1}^{(k)}\propto Q^{(k)}R^{(k)}e_{1}=A^{k}e_{1}\to v_{1}$, so *the first column of $A_{k}=(Q^{(k)})^{T}AQ^{(k)}$ has convergence $A_{k}e_{1}\to \lambda_{1}e_{1}$ with rate $| \lambda_{2} / \lambda_{1} |$.*
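A small experiment illustrating this first-column behaviour (a sketch; the seed, matrix size, and iteration counts are arbitrary):
```python
# First-column convergence A_k e_1 -> lambda_1 e_1 under the unshifted QR iteration.
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = M + M.T                                           # symmetric test matrix
lam = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]    # |lambda_1| >= ... >= |lambda_n|

A_k = A.copy()
for k in range(1, 31):
    Q, R = np.linalg.qr(A_k)
    A_k = R @ Q
    if k % 10 == 0:
        # below-diagonal part of the first column vs. the predicted rate (|lambda_2|/|lambda_1|)^k
        print(k, np.linalg.norm(A_k[1:, 0]), (lam[1] / lam[0]) ** k)
print(A_k[0, 0])   # approaches the eigenvalue of largest magnitude
```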
### QR as Inverse Power Iterations
If we take the transpose of $A^{-k}$, we have $(A^{-k})^{T}=Q^{(k)}(R^{(k)})^{-T},$
so implicitly we are also applying an [[Power Iterations#Shifted Inverse Power Iteration|inverse power iteration]] with no shifts (i.e. applying $A^{-T}$ to the vectors $e_{1},\dots$). Since $(R^{(k)})^{-T} \in (\llcorner)$, this is not a QR decomposition, but a lower-triangular matrix maps $e_{n}$ to a multiple of itself, so we still have $q^{(k)}_{n}\propto Q^{(k)}(R^{(k)})^{-T}e_{n}=(A^{T})^{-k}e_{n}\to u_{n}$, where $u_{n}$ is the dominant right eigenvector of $A^{-T}$, i.e. the left eigenvector of $A$ corresponding to the eigenvalue $\lambda_{n}$.
In other words,
$(q^{(k)}_{n})^{T}Aq^{(k)}_{n}\to u_{n}^{T}Au_{n}=\lambda_{n}$, so *$A_{k}=Q^{(k)T}AQ^{(k)}$ has its last row $e_{n}^{T}A_{k}=(q_{n}^{(k)})^{T}AQ^{(k)}\to \lambda_{n}e_{n}^{T}$ with linear rate $| \lambda_{n} / \lambda_{n-1}|$*.
Therefore,
> [!idea] The QR algorithm is also an inverse power iteration under the hood.
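The mirror-image experiment for the last row, whose off-diagonal part should shrink roughly like $|\lambda_{n}/\lambda_{n-1}|^{k}$ (same caveats as the sketch above):
```python
# Last-row convergence e_n^T A_k -> lambda_n e_n^T under the unshifted QR iteration.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = M + M.T
lam_abs = np.sort(np.abs(np.linalg.eigvalsh(A)))    # |lambda_n| <= ... <= |lambda_1|

A_k = A.copy()
for k in range(1, 31):
    Q, R = np.linalg.qr(A_k)
    A_k = R @ Q
    if k % 10 == 0:
        # off-diagonal part of the last row vs. the predicted rate (|lambda_n|/|lambda_{n-1}|)^k
        print(k, np.linalg.norm(A_k[-1, :-1]), (lam_abs[0] / lam_abs[1]) ** k)
print(A_k[-1, -1])   # approaches the eigenvalue of smallest magnitude, lambda_n
```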
## Accelerating the QR
### Shifts in QR Algorithm
> [!idea] Shifts can increase the ratios between eigenvalues, hence giving faster convergence and using fewer iterations.
Given this connection, we can apply shifts like in inverse iterations to speed up the convergence. *If we choose a good shift on the $k$th iteration, we can speed up the convergence on the last row of $A_{k+1}$,* getting
$A_{k+1}\approx\begin{bmatrix}
{\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {}&{}&{}&{}&{\lambda_{n}}\end{bmatrix}.$
Remember that this convergence is a result of the (unshifted) inverse iteration
$\underbrace{A^{-T}\cdots A^{-T}}_{k \text{ times}}=Q^{(k)}(R^{(k)})^{-T}=Q_{0}\cdots Q_{k-1}\,R_{k-1}^{-T}\cdots R_{0}^{-T}.$
To emulate the shifted inverse iteration
$\prod_{i=0}^{k-1}(A^{T}-s_{i}I)^{-1}=Q^{(k)}(R^{(k)})^{-T}=Q_{0}\cdots Q_{k-1}\,R_{k-1}^{-T}\cdots R_{0}^{-T},$ we need some other definition of $Q_{k},R_{k}$:
> [!algorithm] Shifted QR algorithm
> $[0]$ Initialize with $A_{0}:=A$.
> For $[k=0,\dots]$:
> - Choose the shift $s_{k}$ (e.g. the Rayleigh quotient $s_{k}=(A_{k})_{nn}$).
> - Compute the decomposition $A_{k}-s_{k}I=Q_{k}R_{k}$,
> - Swap the two matrices and define $A_{k+1}:= R_{k}Q_{k}+s_{k}I$.
We can verify the analogous result that $(A-s_{k}I)Q^{(k)}=Q^{(k)}(A_{k}-s_{k}I)$, so
$\prod_{i=0}^{k-1}(A-s_{i}I)=Q^{(k)}R^{(k)}, \qquad \text{equivalently} \qquad \prod_{i=0}^{k-1}(A^{T}-s_{i}I)^{-1}=Q^{(k)}(R^{(k)})^{-T},$
and the algorithm indeed carries out the shifted inverse iteration.
With the shifted algorithm, $A_{k+1}$ and $A_{k}$ are still similar and have the same eigenvalues: since $Q_{k}=(A_{k}-s_{k}I)R_{k}^{-1}$,
$A_{k+1}=R_{k}Q_{k}+s_{k}I=R_{k}(A_{k}-s_{k}I)R_{k}^{-1}+s_{k}I=R_{k}A_{k}R_{k}^{-1}.$
In fact, the iterant $\prod_{k}(A-s_{k}I)^{-1}$ has eigenvalues $\left\{ \prod_{k}(\lambda_{i}-s_{k})^{-1} ~|~i=1,\dots,n\right\}$, so split over the $k$ iterations, the shifted QR implicitly runs power iteration on a matrix with eigenvalues $\mu_{i}:= \left( \prod_{k} \frac{1}{\lambda_{i}-s_{k}}\right)^{1 / k},$ which can be huge with good choices of $s_{k}$.
> [!idea] QR algorithm implicitly runs power iterations of a matrix, and good shifts make its eigenvalues huge, speeding up convergence.
In practice, this technique allows the algorithm to terminate in $O(n)$ iterations.
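A sketch of the shifted algorithm with the Rayleigh-quotient shift $s_{k}=(A_{k})_{nn}$ and a simple deflation step once the last row has converged (the tolerance, the deflation strategy, and the symmetric test matrix are illustrative assumptions):
```python
# Shifted QR with deflation: lock in an eigenvalue when the last row converges.
import numpy as np

def shifted_qr_eigenvalues(A, tol=1e-12, max_iter=1000):
    """Approximate the eigenvalues of a symmetric matrix A."""
    A_k = np.array(A, dtype=float)
    eigs = []
    while A_k.shape[0] > 1:
        for _ in range(max_iter):
            n = A_k.shape[0]
            s = A_k[-1, -1]                        # Rayleigh-quotient shift s_k = (A_k)_{nn}
            Q, R = np.linalg.qr(A_k - s * np.eye(n))
            A_k = R @ Q + s * np.eye(n)            # A_{k+1} = R_k Q_k + s_k I
            if np.abs(A_k[-1, :-1]).max() < tol:   # last row (off-diagonal) has converged
                break
        eigs.append(A_k[-1, -1])                   # lock in an eigenvalue ...
        A_k = A_k[:-1, :-1]                        # ... and deflate to the leading block
    eigs.append(A_k[0, 0])
    return np.sort(np.array(eigs))

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
A = M + M.T
print(shifted_qr_eigenvalues(A))
print(np.sort(np.linalg.eigvalsh(A)))              # reference values for comparison
```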
### Preprocessing into Hessenberg Form
> [!bigidea]
> Preprocessing $A$ into its Hessenberg form makes each iteration fast.
Since [[QR Factorisation#QR on Hessenberg Matrices|QR factorization of Hessenberg matrices]] is cheap, we can try to save time by pre-processing the matrix $A$ into its Hessenberg form $H_{A}$:
$\underset{A}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\end{bmatrix}} \mapsto \underset{H_{1}AH_{1}^{T}}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\end{bmatrix}} \mapsto\dots \mapsto \underset{H_{n-2}\dots H_{1}AH_{1}^{T}\dots H_{n-2}^{T}}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &&\ast&\ast&\ast\\ &&&\ast&\ast\end{bmatrix}}=:H_{A}$
where each layer of $H_{i}(\dots)H_{i}^{T}$ removes all column-$i$ entries below the subdiagonal; since it is a similarity transform, the eigenvalues of $A$ are preserved.
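A minimal sketch of this reduction using explicit Householder reflectors (the helper name and the eigenvalue check are illustrative):
```python
# Reduce A to upper Hessenberg form by Householder similarity transforms.
import numpy as np

def hessenberg(A):
    """Return H_A, similar to A and zero below the first subdiagonal."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    for i in range(n - 2):
        x = H[i + 1:, i]                              # column i below the diagonal
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])  # reflector direction (stable sign choice)
        norm_v = np.linalg.norm(v)
        if norm_v == 0:
            continue                                  # column already reduced
        v /= norm_v
        # similarity transform: apply P = I - 2 v v^T from the left and the right
        H[i + 1:, :] -= 2.0 * np.outer(v, v @ H[i + 1:, :])
        H[:, i + 1:] -= 2.0 * np.outer(H[:, i + 1:] @ v, v)
    return H

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))
H_A = hessenberg(A)
print(np.allclose(np.tril(H_A, -2), 0))               # True: zeros below the first subdiagonal
print(np.allclose(np.sort(np.abs(np.linalg.eigvals(A))),
                  np.sort(np.abs(np.linalg.eigvals(H_A)))))   # spectrum is preserved
```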