> [!tldr] QR Algorithm
> The QR algorithm is an iterative algorithm for the [[Eigenvalue Problems|eigenvalue problem]] of a matrix $A$ that uses the [[QR Factorization]]. We assume that $A$ has a basis of eigenvectors $v_{i}$ (all unit-length), corresponding to eigenvalues $\lambda_{i}$, ordered by decreasing magnitude.

The vanilla QR algorithm is straightforward:

> [!algorithm] QR algorithm
> $[0]$ Initialize with $A_{1}:=A$.
> For $[k=1,\dots]$ until convergence:
> - Compute the decomposition $A_{k}=Q_{k}R_{k}$,
> - Swap the two factors and define $A_{k+1}:= R_{k}Q_{k}$.
>
> Return the eigenvalue estimates $\mathrm{diag}(A_{k})$ and eigenvector estimates $Q^{(k)}:= Q_{1}\dots Q_{k}$.

Schematically, the algorithm is
$\begin{align*} & A_{1}=Q_{1}R_{1}\xrightarrow{\text{swap}} A_{2}:= R_{1}Q_{1}\xrightarrow{\text{factorize}}A_{2}=Q_{2}R_{2}\to \cdots \xrightarrow{\text{converges}}A_{k}\approx(\urcorner). \end{align*}$

The diagonal of $A_{k}$ gives its eigenvalues $\lambda_{1},\dots,\lambda_{n}$, which are also the eigenvalues of $A$ because the two matrices are similar:
$\begin{align*} A_{k+1}&= R_{k}Q_{k}\\ &= Q_{k}^{-1}A_{k}Q_{k}\\ &= \dots\\ &= (Q_{1}\dots Q_{k})^{-1}A(Q_{1}\dots Q_{k})\\ &= (Q^{(k)})^{-1}AQ^{(k)}. \end{align*}$

## Convergence of the QR Algorithm

Why does the algorithm converge? We shall show that $A^{k}=Q^{(k)}R^{(k)}$ (where we similarly define $R^{(k)}:=R_{k}\dots R_{1}$) is the QR decomposition of $A^{k}$, and we will see that $Q^{(k)}$ converges to something.

> [!proof] Proof that $A^{k}=Q^{(k)}R^{(k)}$
> Induction; the base case $A^{1}=Q_{1}R_{1}$ is trivial.
>
> Assuming $A^{k}=Q^{(k)}R^{(k)}$, then noting that $Q^{(k)}A_{k+1}=AQ^{(k)}$ by the previous result,
> $\begin{align*} A^{k+1}&= AQ^{(k)}R^{(k)}\\ &= Q^{(k)}A_{k+1}R^{(k)}\\ &= Q^{(k)}Q_{k+1}R_{k+1}R^{(k)}\\ &= Q^{(k+1)}R^{(k+1)}. \end{align*}$

### QR as Simultaneous Iterations

Consider the factorization $A^{k}=Q^{(k)}R^{(k)}$ as a [[QR Factorization#QR with Gram-Schmidt|Gram-Schmidt process]]: denoting the columns of $A^{k}$ as $\{ A^{k}e_{i} \}$ ($e_{i}$ are the columns of $I_{n}$), the factorization $Q^{(k)}R^{(k)}=(A^{k}e_{1},\dots,A^{k}e_{n})$ turns the columns of $A^{k}I$ into $Q^{(k)}$, an orthonormal basis. Therefore, we have the conclusion that

> [!connection] The QR algorithm is the [[Power Iterations#Simultaneous Iterations|(simultaneous) power iteration]] under the hood.

We know that this simultaneous iteration $A^{k}I \mapsto Q^{(k)}$ gives vectors converging to the eigenvectors of $A$:
$Q^{(k)}=(q^{(k)}_{1},\dots,q^{(k)}_{n}) \to (v_{1},\dots,v_{n}),$
hence $A_{k+1}$ in $Q^{(k)}A_{k+1}Q^{(k)T}=A$ converges to the diagonalisation of $A$ (exactly so when $A$ is symmetric; in general $A_{k}$ only converges to an upper-triangular matrix, as in the schematic above), whose diagonal reveals the eigenvalues. In particular, the first column is $q_{1}^{(k)}:=Q^{(k)}e_{1}$, and it has convergence
$q_{1}^{(k)}\propto Q^{(k)}R^{(k)}e_{1}=A^{k}e_{1}\to v_{1},$
so *the first column of $A_{k+1}=(Q^{(k)})^{T}AQ^{(k)}$ has convergence $A_{k+1}e_{1}\to \lambda_{1}e_{1}$ with rate $| \lambda_{2} / \lambda_{1} |$.*
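To make this concrete, here is a minimal NumPy sketch of the vanilla iteration and the accumulated $Q^{(k)}$ (the function name, iteration count, and symmetric test matrix are illustrative assumptions; symmetry ensures $Q^{(k)}$ really converges to an orthonormal eigenbasis):

```python
import numpy as np

def qr_algorithm(A, iters=200):
    """Vanilla QR iteration: factorize A_k = Q_k R_k, swap to A_{k+1} = R_k Q_k.

    Returns the final iterate A_k (approximately diagonal here, triangular in
    general) and the accumulated orthogonal factor Q^(k) = Q_1 ... Q_k.
    """
    A_k = A.copy()
    Q_acc = np.eye(A.shape[0])
    for _ in range(iters):
        Q, R = np.linalg.qr(A_k)   # A_k = Q_k R_k
        A_k = R @ Q                # A_{k+1} := R_k Q_k  (similar to A_k)
        Q_acc = Q_acc @ Q          # accumulate Q^(k)
    return A_k, Q_acc

# Symmetric test matrix, so the eigenvector basis is orthonormal.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2

A_k, Q_acc = qr_algorithm(A)
print(np.sort(np.diag(A_k)))            # eigenvalue estimates diag(A_k)
print(np.sort(np.linalg.eigvalsh(A)))   # reference eigenvalues
print(np.abs(Q_acc.T @ A @ Q_acc - A_k).max())  # similarity: A_k = Q^(k)T A Q^(k)
```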
### QR as Inverse Power Iterations

If we take the inverse transpose of $A^{k}=Q^{(k)}R^{(k)}$, we obtain the factorization
$(A^{-k})^{T}=Q^{(k)}(R^{(k)})^{-T},$
so implicitly we are also applying an [[Power Iterations#Shifted Inverse Power Iteration|inverse power iteration]] with no shifts (i.e. applying $A^{-T}$ repeatedly to the vectors $e_{1},\dots,e_{n}$). Since $(R^{(k)})^{-T} \in (\llcorner)$, we have
$q^{(k)}_{n}\propto Q^{(k)}(R^{(k)})^{-T}e_{n}=(A^{T})^{-k}e_{n}\to u_{n},$
where $u_{n}$ is the least dominant right eigenvector of $A^{T}$, i.e. the left eigenvector of $A$ corresponding to the eigenvalue $\lambda_{n}$. In other words,
$(q^{(k)}_{n})^{T}Aq^{(k)}_{n}\to u_{n}^{T}Au_{n}=\lambda_{n},$
so *$A_{k+1}=Q^{(k)T}AQ^{(k)}$ has its last row $e_{n}^{T}A_{k+1}=(q_{n}^{(k)})^{T}AQ^{(k)}\to \lambda_{n}e_{n}^{T}$ with linear rate $| \lambda_{n} / \lambda_{n-1}|$*.

> [!connection] The QR algorithm is also an inverse power iteration under the hood.

## Accelerating the QR

### Shifts in QR Algorithm

> [!bigidea]
> Shifts can increase the ratios between eigenvalues, hence giving faster convergence and fewer iterations.

Given this connection, we can apply shifts as in inverse iterations to speed up the convergence. *If we choose a good shift on the $k$th iteration, we can speed up the convergence of the last row of $A_{k+1}$,* getting
$A_{k+1}\approx\begin{bmatrix} {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {\ast}&{\ast}&{\ast}&{\ast}&{\ast}\\ {}&{}&{}&{}&{\lambda_{n}}\end{bmatrix}.$

Remember that this convergence is a result of the (unshifted) inverse iteration
$\underbrace{A^{-T}\dots A^{-T}}_{k \text{ times}}=Q_{1}\dots Q_{k}\,R_{k}^{-T}\dots R_{1}^{-T}.$
To emulate the shifted inverse iteration
$\prod_{j\leq k}(A^{T}-s_{j}I)^{-1}=Q_{1}\dots Q_{k}\,R_{k}^{-T}\dots R_{1}^{-T},$
we need some other definition of $Q_{k},R_{k}$:

> [!algorithm] Shifted QR algorithm
> $[0]$ Initialize with $A_{1}:=A$.
> For $[k=1,\dots]$:
> - Choose the shift $s_{k}$ (e.g. the Rayleigh quotient $(A_{k})_{nn}$).
> - Compute the decomposition $A_{k}-s_{k}I=Q_{k}R_{k}$,
> - Swap the two factors and define $A_{k+1}:= R_{k}Q_{k}+s_{k}I$.

We can verify the analogous result that $(A-s_{k+1}I)Q^{(k)}=Q^{(k)}(A_{k+1}-s_{k+1}I)$, so by the same induction as before,
$(A-s_{k}I)\dots(A-s_{1}I)=Q^{(k)}R^{(k)}, \quad\text{i.e.}\quad \prod_{j\leq k}(A^{T}-s_{j}I)^{-1}=Q^{(k)}(R^{(k)})^{-T},$
and the algorithm indeed carries out the shifted inverse iteration.

With the shifted algorithm, $A_{k+1}$ and $A_{k}$ are still similar and have the same eigenvalues: since $Q_{k}=(A_{k}-s_{k}I)R_{k}^{-1}$,
$A_{k+1}=R_{k}Q_{k}+s_{k}I=R_{k}(A_{k}-s_{k}I)R_{k}^{-1}+s_{k}I=R_{k}A_{k}R_{k}^{-1}.$

In fact, the iterant $\prod_{j\leq k}(A-s_{j}I)^{-1}$ has eigenvalues $\left\{ \prod_{j\leq k}(\lambda_{i}-s_{j})^{-1} ~|~i=1,\dots,n\right\}$, so split over the $k$ iterations, the shifted QR implicitly runs a power iteration on a matrix with eigenvalues
$\mu_{i}:= \left( \prod_{j\leq k} \frac{1}{\lambda_{i}-s_{j}}\right)^{1 / k},$
and good choices of $s_{j}$ make the targeted $\mu_{i}$ (and hence the relevant eigenvalue ratios) huge.

> [!idea] The QR algorithm implicitly runs power iterations of a matrix, and good shifts make its eigenvalues huge, speeding up convergence.

In practice, this technique allows the algorithm to terminate in $O(n)$ iterations.
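As a minimal sketch of the shifted iteration (assuming the plain Rayleigh-quotient shift $s_{k}=(A_{k})_{nn}$, a symmetric test matrix, and no deflation, so only the last row is driven to convergence; the helper name is illustrative):

```python
import numpy as np

def shifted_qr_step(A_k):
    """One shifted QR step with the Rayleigh-quotient shift s_k = (A_k)_{nn}:
    factorize A_k - s_k I = Q_k R_k, then form A_{k+1} = R_k Q_k + s_k I."""
    n = A_k.shape[0]
    s = A_k[-1, -1]                        # Rayleigh-quotient shift
    Q, R = np.linalg.qr(A_k - s * np.eye(n))
    return R @ Q + s * np.eye(n)           # similar to A_k

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A_k = (M + M.T) / 2

for k in range(1, 9):
    A_k = shifted_qr_step(A_k)
    # the off-diagonal part of the last row decays very quickly
    print(k, np.abs(A_k[-1, :-1]).max())
# one eigenvalue of A, the one targeted by the shift
print("converged eigenvalue:", A_k[-1, -1])
```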
### Preprocessing into Hessenberg Form

> [!bigidea]
> Preprocessing $A$ into its Hessenberg form makes each iteration fast.

Since [[QR Factorization#QR on Hessenberg Matrices|QR factorization of Hessenberg matrices]] is cheap, we can save time by preprocessing the matrix $A$ into its Hessenberg form $H_{A}$:
$\underset{A}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\end{bmatrix}} \mapsto \underset{H_{1}AH_{1}^{T}}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\end{bmatrix}} \mapsto\dots \mapsto \underset{H_{n-2}\dots H_{1}AH_{1}^{T}\dots H_{n-2}^{T}}{\begin{bmatrix}\ast&\ast&\ast&\ast&\ast\\ \ast&\ast&\ast&\ast&\ast\\ &\ast&\ast&\ast&\ast\\ &&\ast&\ast&\ast\\ &&&\ast&\ast\end{bmatrix}}=:H_{A}$
where each layer $H_{i}(\dots)H_{i}^{T}$ removes all of column $i$'s entries under the first subdiagonal. *$H_{A}$ is similar to $A$*, hence applying the QR algorithm to $H_{A}$ gives the same eigenvalues.

- Note that since $H_{i}$ does not modify the first $i$ rows when left-applied, $H^{T}_{i}$ preserves the first $i$ columns when right-applied, hence preserving the zeroes.
- In contrast, if we greedily eliminate the first subdiagonal too (say with $H_{1}$), then we must have modified the first row of $A$ too (in particular $a_{11}$ becomes $\| a_{1} \|$), so right-applying $H_{1}^{T}$ destroys the first column of zeros in $H_{1}A$.

The Hessenberg structure is preserved by the swapping: if $H_{k}=Q_{k}R_{k}$ is Hessenberg, then so is $H_{k+1}:= R_{k}Q_{k}$. Hence *if we do the preprocessing $A_{1}=H_{A}$ at the start, every iteration of QR will be efficient.*

> [!proof]
> Since $Q_{k}=H_{k}R_{k}^{-1}$, we have $H_{k+1}=R_{k}H_{k}R_{k}^{-1}$. Now we only need to show that multiplying by an upper-triangular matrix (on either side) preserves the Hessenberg structure.
>
> Consider $RH$ where $R$ is upper triangular and $H$ is Hessenberg. An $(i,j)$th entry under the first subdiagonal (i.e. with $i>j+1$) is $\sum_{k}r_{ik}h_{kj}.$
> Due to their structures, $r_{ik}=0$ for $k<i$, and $h_{kj}=0$ for $k>j+1$. Since $i>j+1$, every index $k$ satisfies $k<i$ or $k>j+1$, so in fact all of the terms are $0$. So $(RH)_{ij}=0$ under the first subdiagonal, and the product is still Hessenberg.
>
> That $HR$ is Hessenberg follows from a similar argument, so $RHR^{-1}$ is Hessenberg (note $R_{k}^{-1}$ is also upper triangular).

In conclusion, a Hessenberg matrix reduces the cost of each iteration from $O(n^{3})$ to $O(n^{2})$. Combined with shifts, we can run the whole QR algorithm with $O(n^{3})$ complexity.

- If $A$ is symmetric, then its Hessenberg form is tridiagonal, so each iteration is $O(n)$ instead, and the entire algorithm is $O(n^{2})$.
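A sketch of this Householder reduction to Hessenberg form (a standard construction; the function name is illustrative), checking that the eigenvalues are preserved and that one swap step $R_{k}Q_{k}$ keeps the Hessenberg zero pattern:

```python
import numpy as np

def hessenberg(A):
    """Reduce A to upper Hessenberg form H_A = H_{n-2}...H_1 A H_1^T...H_{n-2}^T,
    where H_i is a Householder reflector acting on rows/columns i+1..n."""
    H = A.astype(float).copy()
    n = H.shape[0]
    for i in range(n - 2):
        x = H[i + 1:, i]
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
        norm_v = np.linalg.norm(v)
        if norm_v == 0:                     # column already reduced
            continue
        v /= norm_v
        # left-apply the reflector to rows i+1..n ...
        H[i + 1:, :] -= 2.0 * np.outer(v, v @ H[i + 1:, :])
        # ... then right-apply it to columns i+1..n (preserves the zeros made so far)
        H[:, i + 1:] -= 2.0 * np.outer(H[:, i + 1:] @ v, v)
        H[i + 2:, i] = 0.0                  # clean out roundoff below the subdiagonal
    return H

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
H_A = hessenberg(A)

# H_A is similar to A: same eigenvalues (up to ordering and roundoff)
print(np.allclose(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(H_A))))

# one QR step H -> RQ preserves the Hessenberg zero pattern
Q, R = np.linalg.qr(H_A)
print(np.allclose(np.tril(R @ Q, -2), 0, atol=1e-10))
```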