> [!tldr] QR Algorithm
> The QR algorithm is an iterative algorithm for the [[Eigenvalue Problems|eigenvalue problem]] of a matrix $A$ that uses the [[QR Factorization]]. We assume that $A$ has a basis of eigenvectors $v_{i}$ (all unit-length), corresponding to eigenvalues $\lambda_{i}$, ordered by decreasing magnitude.

The vanilla QR algorithm is straightforward:

> [!algorithm] QR algorithm
> $[0]$ Initialize with $A_{1}:=A$.
> For $[k=1,\dots]$ until convergence:
> - Compute the decomposition $A_{k}=Q_{k}R_{k}$,
> - Swap the two factors and define $A_{k+1}:= R_{k}Q_{k}$.
>
> Return the eigenvalue estimates $\mathrm{diag}(A_{k})$ and eigenvector estimates $Q^{(k)}:= Q_{1}\dots Q_{k}$.

Schematically, the algorithm is
$\begin{align*} & A_{1}=Q_{1}R_{1}\xrightarrow{\text{swap}} A_{2}:= R_{1}Q_{1}\xrightarrow{\text{factorize}}A_{2}=Q_{2}R_{2}\to \cdots \xrightarrow{\text{converges}}A_{k}\approx(\urcorner). \end{align*}$

The diagonal of $A_{k}$ gives its eigenvalues $\lambda_{1},\dots,\lambda_{n}$, which are also the eigenvalues of $A$ because the two matrices are similar:
$\begin{align*} A_{k+1}&= R_{k}Q_{k}\\ &= Q_{k}^{-1}A_{k}Q_{k}\\ &= \dots\\ &= (Q_{1}\dots Q_{k})^{-1}A(Q_{1}\dots Q_{k})\\ &= (Q^{(k)})^{-1}AQ^{(k)}. \end{align*}$

## Convergence of the QR Algorithm

Why does the algorithm converge? We shall show that $A^{k}=Q^{(k)}R^{(k)}$ (where we similarly define $R^{(k)}:=R_{k}\dots R_{1}$) is the QR decomposition of $A^{k}$, and we will see that $Q^{(k)}$ converges to something.

> [!proof] Proof that $A^{k}=Q^{(k)}R^{(k)}$
> Induction; the base case $A^{1}=Q_{1}R_{1}$ is trivial.
>
> Assuming $A^{k}=Q^{(k)}R^{(k)}$, then noting that $Q^{(k)}A_{k+1}=AQ^{(k)}$ by the previous result,
> $\begin{align*} A^{k+1}&= AQ^{(k)}R^{(k)}\\ &= Q^{(k)}A_{k+1}R^{(k)}\\ &= Q^{(k)}Q_{k+1}R_{k+1}R^{(k)}\\ &= Q^{(k+1)}R^{(k+1)}. \end{align*}$

### QR as Simultaneous Iterations

Consider the factorization $A^{k}=Q^{(k)}R^{(k)}$ as a [[QR Factorization#QR with Gram-Schmidt|Gram-Schmidt process]]: denoting the columns of $A^{k}$ as $\{ A^{k}e_{i} \}$ ($e_{i}$ are the columns of $I_{n}$), the factorization $Q^{(k)}R^{(k)}=(A^{k}e_{1},\dots,A^{k}e_{n})$ turns the columns of $A^{k}I$ into $Q^{(k)}$, an orthonormal basis. Therefore, we have the conclusion that

> [!connection] The QR algorithm is the [[Power Iterations#Simultaneous Iterations|(simultaneous) power iteration]] under the hood.

We know that this simultaneous iteration $A^{k}I \mapsto Q^{(k)}$ gives vectors converging to the eigenvectors of $A$:
$Q^{(k)}=(q^{(k)}_{1},\dots,q^{(k)}_{n}) \to (v_{1},\dots,v_{n}),$
hence $A_{k+1}$ in $Q^{(k)}A_{k+1}Q^{(k)T}=A$ converges to the diagonalisation of $A$ (exactly so when $A$ is symmetric; in general $A_{k}$ only converges to an upper-triangular matrix, as in the schematic above), whose diagonal reveals the eigenvalues. In particular, the first column is $q_{1}^{(k)}:=Q^{(k)}e_{1}$, and it has convergence
$q_{1}^{(k)}\propto Q^{(k)}R^{(k)}e_{1}=A^{k}e_{1}\to v_{1},$
so *the first column of $A_{k+1}=(Q^{(k)})^{T}AQ^{(k)}$ has convergence $A_{k+1}e_{1}\to \lambda_{1}e_{1}$ with rate $| \lambda_{2} / \lambda_{1} |$.*
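To make this concrete, here is a minimal NumPy sketch of the vanilla iteration and the accumulated $Q^{(k)}$ (the function name, iteration count, and symmetric test matrix are illustrative assumptions; symmetry ensures $Q^{(k)}$ really converges to an orthonormal eigenbasis):

```python
import numpy as np

def qr_algorithm(A, iters=200):
    """Vanilla QR iteration: factorize A_k = Q_k R_k, swap to A_{k+1} = R_k Q_k.

    Returns the final iterate A_k (approximately diagonal here, triangular in
    general) and the accumulated orthogonal factor Q^(k) = Q_1 ... Q_k.
    """
    A_k = A.copy()
    Q_acc = np.eye(A.shape[0])
    for _ in range(iters):
        Q, R = np.linalg.qr(A_k)   # A_k = Q_k R_k
        A_k = R @ Q                # A_{k+1} := R_k Q_k  (similar to A_k)
        Q_acc = Q_acc @ Q          # accumulate Q^(k)
    return A_k, Q_acc

# Symmetric test matrix, so the eigenvector basis is orthonormal.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2

A_k, Q_acc = qr_algorithm(A)
print(np.sort(np.diag(A_k)))            # eigenvalue estimates diag(A_k)
print(np.sort(np.linalg.eigvalsh(A)))   # reference eigenvalues
print(np.abs(Q_acc.T @ A @ Q_acc - A_k).max())  # similarity: A_k = Q^(k)T A Q^(k)
```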
### QR as Inverse Power Iterations

If we take the inverse transpose of $A^{k}=Q^{(k)}R^{(k)}$, we obtain the factorization
$(A^{-k})^{T}=Q^{(k)}(R^{(k)})^{-T},$
so implicitly we are also applying an [[Power Iterations#Shifted Inverse Power Iteration|inverse power iteration]] with no shifts (i.e. applying $A^{-T}$ repeatedly to the vectors $e_{1},\dots,e_{n}$). Since $(R^{(k)})^{-T} \in (\llcorner)$, we have
$q^{(k)}_{n}\propto Q^{(k)}(R^{(k)})^{-T}e_{n}=(A^{T})^{-k}e_{n}\to u_{n},$
where $u_{n}$ is the least dominant right eigenvector of $A^{T}$, i.e. the left eigenvector of $A$ corresponding to the eigenvalue $\lambda_{n}$. In other words,
$(q^{(k)}_{n})^{T}Aq^{(k)}_{n}\to u_{n}^{T}Au_{n}=\lambda_{n},$
so *$A_{k+1}=Q^{(k)T}AQ^{(k)}$ has its last row $e_{n}^{T}A_{k+1}=(q_{n}^{(k)})^{T}AQ^{(k)}\to \lambda_{n}e_{n}^{T}$ with linear rate $| \lambda_{n} / \lambda_{n-1}|$*.

> [!connection] The QR algorithm is also an inverse power iteration under the hood.

## Accelerating the QR

### Shifts in QR Algorithm

> [!bigidea]
> Shifts can increase the ratios between eigenvalues, hence giving faster convergence and fewer iterations.

Given this connection, we can apply shifts as in inverse iterations to speed up the convergence. *If we choose a good shift on the $k$th iteration, we can speed up the convergence of the last row of $A_{k+1}$,* getting
$A_{k+1}\approx\begin{bmatrix} {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {}&{}&{}&{}&{\lambda_{n}}\end{bmatrix}.$

Remember that this convergence is a result of the (unshifted) inverse iteration
$\underbrace{A^{-T}\dots A^{-T}}_{k \text{ times}}=Q_{1}\dots Q_{k}\,R_{k}^{-T}\dots R_{1}^{-T}.$
To emulate the shifted inverse iteration
$\prod_{j\leq k}(A^{T}-s_{j}I)^{-1}=Q_{1}\dots Q_{k}\,R_{k}^{-T}\dots R_{1}^{-T},$
we need some other definition of $Q_{k},R_{k}$:

> [!algorithm] Shifted QR algorithm
> $[0]$ Initialize with $A_{1}:=A$.
> For $[k=1,\dots]$:
> - Choose the shift $s_{k}$ (e.g. the Rayleigh quotient $(A_{k})_{nn}$).
> - Compute the decomposition $A_{k}-s_{k}I=Q_{k}R_{k}$,
> - Swap the two factors and define $A_{k+1}:= R_{k}Q_{k}+s_{k}I$.

We can verify the analogous result that $(A-s_{k+1}I)Q^{(k)}=Q^{(k)}(A_{k+1}-s_{k+1}I)$, so by the same induction as before,
$(A-s_{k}I)\dots(A-s_{1}I)=Q^{(k)}R^{(k)}, \quad\text{i.e.}\quad \prod_{j\leq k}(A^{T}-s_{j}I)^{-1}=Q^{(k)}(R^{(k)})^{-T},$
and the algorithm indeed carries out the shifted inverse iteration.

With the shifted algorithm, $A_{k+1}$ and $A_{k}$ are still similar and have the same eigenvalues: since $Q_{k}=(A_{k}-s_{k}I)R_{k}^{-1}$,
$A_{k+1}=R_{k}Q_{k}+s_{k}I=R_{k}(A_{k}-s_{k}I)R_{k}^{-1}+s_{k}I=R_{k}A_{k}R_{k}^{-1}.$

In fact, the iterant $\prod_{j\leq k}(A-s_{j}I)^{-1}$ has eigenvalues $\left\{ \prod_{j\leq k}(\lambda_{i}-s_{j})^{-1} ~|~i=1,\dots,n\right\}$, so split over the $k$ iterations, the shifted QR implicitly runs a power iteration on a matrix with eigenvalues
$\mu_{i}:= \left( \prod_{j\leq k} \frac{1}{\lambda_{i}-s_{j}}\right)^{1 / k},$
and good choices of $s_{j}$ make the targeted $\mu_{i}$ (and hence the relevant eigenvalue ratios) huge.

> [!idea] The QR algorithm implicitly runs power iterations of a matrix, and good shifts make its eigenvalues huge, speeding up convergence.

In practice, this technique allows the algorithm to terminate in $O(n)$ iterations.
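As a minimal sketch of the shifted iteration (assuming the plain Rayleigh-quotient shift $s_{k}=(A_{k})_{nn}$, a symmetric test matrix, and no deflation, so only the last row is driven to convergence; the helper name is illustrative):

```python
import numpy as np

def shifted_qr_step(A_k):
    """One shifted QR step with the Rayleigh-quotient shift s_k = (A_k)_{nn}:
    factorize A_k - s_k I = Q_k R_k, then form A_{k+1} = R_k Q_k + s_k I."""
    n = A_k.shape[0]
    s = A_k[-1, -1]                        # Rayleigh-quotient shift
    Q, R = np.linalg.qr(A_k - s * np.eye(n))
    return R @ Q + s * np.eye(n)           # similar to A_k

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A_k = (M + M.T) / 2

for k in range(1, 9):
    A_k = shifted_qr_step(A_k)
    # the off-diagonal part of the last row decays very quickly
    print(k, np.abs(A_k[-1, :-1]).max())
# one eigenvalue of A, the one targeted by the shift
print("converged eigenvalue:", A_k[-1, -1])
```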
### Preprocessing into Hessenberg Form

> [!bigidea]
> Preprocessing $A$ into its Hessenberg form makes each iteration fast.

Since [[QR Factorization#QR on Hessenberg Matrices|QR factorization of Hessenberg matrices]] is cheap, we can save time by preprocessing the matrix $A$ into its Hessenberg form $H_{A}$:
$\underset{A}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\end{bmatrix}} \mapsto \underset{H_{1}AH_{1}^{T}}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\end{bmatrix}} \mapsto\dots \mapsto \underset{H_{n-2}\dots H_{1}AH_{1}^{T}\dots H_{n-2}^{T}}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &&\ast&\ast&\ast\\ &&&\ast&\ast\end{bmatrix}}=:H_{A}$
where each layer $H_{i}(\dots)H_{i}^{T}$ removes all of column $i$'s entries under the first subdiagonal. *$H_{A}$ is similar to $A$*, hence applying the QR algorithm to $H_{A}$ gives the same eigenvalues.

- Note that since $H_{i}$ does not modify the first $i$ rows when left-applied, $H^{T}_{i}$ preserves the first $i$ columns when right-applied, hence preserving the zeroes.
- In contrast, if we greedily eliminate the first subdiagonal too (say with $H_{1}$), then we must have modified the first row of $A$ too (in particular $a_{11}$ becomes $\| a_{1} \|$), so right-applying $H_{1}^{T}$ destroys the first column of zeros in $H_{1}A$.

The Hessenberg structure is preserved by the swapping: if $H_{k}=Q_{k}R_{k}$ is Hessenberg, then so is $H_{k+1}:= R_{k}Q_{k}$. Hence *if we do the preprocessing $A_{1}=H_{A}$ at the start, every iteration of QR will be efficient.*

> [!proof]
> Since $Q_{k}=H_{k}R_{k}^{-1}$, we have $H_{k+1}=R_{k}H_{k}R_{k}^{-1}$. Now we only need to show that multiplying by an upper-triangular matrix (on either side) preserves the Hessenberg structure.
>
> Consider $RH$ where $R$ is upper triangular and $H$ is Hessenberg. An $(i,j)$th entry under the first subdiagonal (i.e. with $i>j+1$) is $\sum_{k}r_{ik}h_{kj}.$
> Due to their structures, $r_{ik}=0$ for $k<i$, and $h_{kj}=0$ for $k>j+1$. Since $i>j+1$, every index $k$ satisfies $k<i$ or $k>j+1$, so in fact all of the terms are $0$. So $(RH)_{ij}=0$ under the first subdiagonal, and the product is still Hessenberg.
>
> That $HR$ is Hessenberg follows from a similar argument, so $RHR^{-1}$ is Hessenberg (note $R_{k}^{-1}$ is also upper triangular).

In conclusion, a Hessenberg matrix reduces the cost of each iteration from $O(n^{3})$ to $O(n^{2})$. Combined with shifts, we can run the whole QR algorithm with $O(n^{3})$ complexity.

- If $A$ is symmetric, then its Hessenberg form is tridiagonal, so each iteration is $O(n)$ instead, and the entire algorithm is $O(n^{2})$.
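A sketch of this Householder reduction to Hessenberg form (a standard construction; the function name is illustrative), checking that the eigenvalues are preserved and that one swap step $R_{k}Q_{k}$ keeps the Hessenberg zero pattern:

```python
import numpy as np

def hessenberg(A):
    """Reduce A to upper Hessenberg form H_A = H_{n-2}...H_1 A H_1^T...H_{n-2}^T,
    where H_i is a Householder reflector acting on rows/columns i+1..n."""
    H = A.astype(float).copy()
    n = H.shape[0]
    for i in range(n - 2):
        x = H[i + 1:, i]
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
        norm_v = np.linalg.norm(v)
        if norm_v == 0:                     # column already reduced
            continue
        v /= norm_v
        # left-apply the reflector to rows i+1..n ...
        H[i + 1:, :] -= 2.0 * np.outer(v, v @ H[i + 1:, :])
        # ... then right-apply it to columns i+1..n (preserves the zeros made so far)
        H[:, i + 1:] -= 2.0 * np.outer(H[:, i + 1:] @ v, v)
        H[i + 2:, i] = 0.0                  # clean out roundoff below the subdiagonal
    return H

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
H_A = hessenberg(A)

# H_A is similar to A: same eigenvalues (up to ordering and roundoff)
print(np.allclose(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(H_A))))

# one QR step H -> RQ preserves the Hessenberg zero pattern
Q, R = np.linalg.qr(H_A)
print(np.allclose(np.tril(R @ Q, -2), 0, atol=1e-10))
```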