> [!tldr] Krylov Subspace
> The order-$k$ **Krylov subspace** of a matrix $A \in \mathbb{R}^{n\times n}$ and vector $b \in \mathbb{R}^{n}$ is $\mathcal{K}_{k}(A,b):=\mathrm{span}\{ b,Ab, \dots, A^{k-1}b \}.$
>
> Equivalently, it is $\mathcal{K}_{k}(A,b)=\{ p(A)b ~|~p \in \Pi_{k-1} \},$where $\Pi_{k-1}$ is the set of polynomials of degree (up to) $k-1$.
>
> It is widely used in iterative methods that approximate $\min_{x}f(x)$ by solving the restricted problem $\min_{x\in \mathcal{K}_{k}}f(x)$, for some objective $f$, e.g. $\| Ax-b \|$.
*Krylov subspaces can be defined using polynomials of $A$, allowing us to study their properties with polynomials:*
$v=w_{0}b+\dots+w_{k-1}A^{k-1}b\in \mathcal{K}_{k}\Longrightarrow v=p(A)b,$
where $p\in \Pi_{k-1}$ (the space of polynomials of degree at most $k-1$) is given by $p(x)=w_{0}+w_{1}x+\dots+w_{k-1}x^{k-1}$. Therefore, writing $b=Ax^{\ast}$ where $x^{\ast}$ solves the linear system $Ax=b$ (assuming $A$ is invertible), $\begin{align*}
\mathcal{K}_{k}(A,b)&= \{ p(A)b ~|~ p \in \Pi_{k-1} \}\\
&= \{ p(A)A x^{\ast} ~|~ p\in \Pi_{k-1} \}\\
&= \{ p(A)x^{\ast} ~|~ p\in \Pi_{k}, p(0)=0 \}.
\end{align*}$ ^d6a17e
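As a concrete illustration, here is a minimal numpy sketch (names and sizes are purely illustrative) checking that a Krylov-subspace vector $w_{0}b+\dots+w_{k-1}A^{k-1}b$ coincides with $p(A)b$ for the corresponding polynomial $p$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Krylov matrix X_k = [b, Ab, ..., A^{k-1} b]
X = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k)])

# An element of K_k(A, b): v = w_0 b + w_1 Ab + ... + w_{k-1} A^{k-1} b
w = rng.standard_normal(k)
v = X @ w

# The same vector written as p(A) b, with p(x) = w_0 + w_1 x + ... + w_{k-1} x^{k-1}
pA = sum(w[j] * np.linalg.matrix_power(A, j) for j in range(k))
assert np.allclose(v, pA @ b)
```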
## Orthogonal Basis of Krylov Subspaces
Suppose we wish to find an orthogonal basis for $\mathcal{K}_{k}(A,b)$.
The naive idea of forming the matrix $X_{k}:=[b, Ab, \dots, A^{k-1}b]$ and running [[QR Factorisation]] (generate, then orthogonalize) does not work, because $X_{k}$ is very ill-conditioned: its columns are the iterates of a [[Power Iterations|power iteration]], which converge to the dominant eigenvector of $A$ and hence become nearly parallel.
> [!exposition]- The condition number of $X_{k}$
>
> Since $\sigma_{1}(X_{k})\ge\| X_{k}e_{k} \| =\| A^{k-1}b \|=\Theta(| \lambda_{1} |^{k})$, $\sigma_{1}(X_{k})$ grows at least as fast as $| \lambda_{1} |^{k}$.
>
> On the other hand, using the convergence of $A^{k}b$ to the dominant eigenvector, we have
> $\| X_{k}(0,\dots,0,-\lambda_{1},1)^{T} \|=\| A^{k-1}b-\lambda_{1} A^{k-2}b \| =O(| \lambda_{2} |^{k}),$and since this test vector has constant norm $\sqrt{ 1+\lambda_{1}^{2} }$, $\sigma_{k}(X_{k})$ is bounded above by $O(| \lambda_{2} |^{k})$.
>
> As a result, the [[Numerical Stability#Conditioning|condition number]] is $\kappa_{2}(X_{k})= \sigma_{1}(X_{k}) / \sigma_{k}(X_{k}) =O(| \lambda_{1} / \lambda_{2} |^{k}) \to \infty$, and this naive basis becomes extremely ill-conditioned.
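A quick numerical sketch of this blow-up (the matrix, its spectrum, and the sizes below are illustrative assumptions): the condition number of the Krylov matrix $X_{k}$ grows roughly geometrically in $k$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Symmetric A with a dominant eigenvalue: lambda_1 = 4, the rest in [1, 2]
eigvals = np.concatenate(([4.0], np.linspace(1.0, 2.0, n - 1)))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigvals) @ Q.T
b = rng.standard_normal(n)

# kappa_2(X_k) for X_k = [b, Ab, ..., A^{k-1} b] blows up geometrically
x, cols = b.copy(), []
for k in range(1, 21):
    cols.append(x)
    print(k, np.linalg.cond(np.column_stack(cols)))
    x = A @ x
```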
### Arnoldi Iterations
However, a similar idea of *orthogonalize while generating* gives the **Arnoldi iterations**:
> [!algorithm] Arnoldi Iterations
> Initialize a matrix $\tilde{H}_{k}=(h_{ij}) \in \mathbb{R}^{(k+1)\times k}$ for storing the Gram–Schmidt (G-S) coefficients.
> Let $q_{1}:= b / \| b \|$.
>
> For $[j=1,\dots, k]$:
> - Compute $v:=Aq_{j}$, the next vector to be orthogonalized.
> - Orthogonalize it against the previously computed $q_{i}$, for $i=1,\dots,j$:
> - Record $h_{ij}:= v^{T}q_{i}$ as the component in the direction of $q_{i}$.
> - Orthogonalize wrt. $q_{i}$ with $v=v-h_{ij}q_{i}$.
> - Let $h_{(j+1), j}:=\| v \|$, and normalize $q_{j+1}:= v / \| v \|$.
>
> Return the basis $\{ q_{1},\dots,q_{k+1} \}$ and the extended Hessenberg matrix $\tilde{H}_{k}:= (h_{ij})$.
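Below is a minimal numpy sketch of the Arnoldi iteration described above (the function name `arnoldi` and all variable names are illustrative; it is not a production implementation, e.g. there is no reorthogonalization):

```python
import numpy as np

def arnoldi(A, b, k):
    """Arnoldi iteration: returns Q = [q_1, ..., q_{k+1}] (n x (k+1)) and the
    extended Hessenberg matrix H ((k+1) x k) satisfying A @ Q[:, :k] = Q @ H."""
    n = len(b)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]                      # next vector to orthogonalize
        for i in range(j + 1):               # Gram-Schmidt against q_1, ..., q_{j+1}
            H[i, j] = v @ Q[:, i]
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        if H[j + 1, j] == 0:                 # breakdown: the Krylov subspace is A-invariant
            return Q[:, : j + 1], H[: j + 2, : j + 1]
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H

# Quick check on a random matrix: the q_i form an orthonormal basis
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 30)), rng.standard_normal(30)
Q, H = arnoldi(A, b, 8)
assert np.allclose(Q.T @ Q, np.eye(9), atol=1e-10)
```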
In each iteration, we computed $q_{j+1}:= \frac{1}{h_{(j+1), j}}(Aq_{j}-h_{1j}q_{1}-\dots-h_{jj}q_{j}).~~~~~(\dagger)$
> [!exposition] Matrix Form of the Arnoldi Iterations
> Rearranging $(\dagger)$ gives $Aq_{j}=h_{1j}q_{1}+\dots+h_{(j+1),j}q_{j+1}=Q_{j+1}\begin{pmatrix}
> h_{1j} \\ \vdots \\ h_{(j+1), j}
> \end{pmatrix}.$
>
> So collecting these columns over $j=1,\dots,k$ into matrix form, $\bbox[teal, 4px]{AQ_{k}=Q_{k+1}\tilde{H}_{k}},~~\tilde{H}_{k}= \begin{bmatrix}
> H_{k} \\ h_{(k+1),k}\,e_{k}^{T}
> \end{bmatrix}:=\begin{bmatrix}
> h_{11} & h_{12} & \dots & h_{1k} \\
> h_{21} & h_{22} & \dots & h_{2k} \\
> & \ddots & & \vdots \\
> & & h_{k,(k-1)} & h_{kk} \\
> & & & h_{(k+1),k}
> \end{bmatrix}.$
> In this case we can also write $AQ_{k}=Q_{k}H_{k}+h_{(k+1),k}\,q_{k+1}e_{k}^{T}$, so left-multiplying by $Q_{k}^{T}$ (using $Q_{k}^{T}Q_{k}=I$ and $Q_{k}^{T}q_{k+1}=0$) gives $H_{k}=Q_{k}^{T}AQ_{k}.$
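As a numerical sanity check of these identities, here is a small sketch (all names are illustrative); on such a tiny example, a plain QR of the Krylov matrix is still accurate enough to serve as the orthonormal basis $Q_{k+1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Orthonormal basis of K_{k+1}(A, b); QR of the Krylov matrix is fine for tiny k
X = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k + 1)])
Q, _ = np.linalg.qr(X)                       # columns q_1, ..., q_{k+1}

Htilde = Q.T @ A @ Q[:, :k]                  # (k+1) x k
assert np.allclose(A @ Q[:, :k], Q @ Htilde, atol=1e-6)   # A Q_k = Q_{k+1} H~_k
assert np.allclose(np.tril(Htilde, -2), 0, atol=1e-6)     # upper Hessenberg structure

Hk = Htilde[:k]                              # H_k = Q_k^T A Q_k
rank1 = np.outer(Q[:, k], Htilde[k])         # q_{k+1} (0, ..., 0, h_{k+1,k})
assert np.allclose(A @ Q[:, :k], Q[:, :k] @ Hk + rank1, atol=1e-6)
```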
From $(\dagger)$ we can prove by induction that each $q_{j+1}$ equals $p_{j}(A)b$ for some polynomial $p_{j}$ of degree exactly $j$, and hence $\mathrm{span}\{ q_{1},\dots,q_{j+1} \}=\mathrm{span}\{ b, Ab, \dots, A^{j}b \}=\mathcal{K}_{j+1}(A,b)$.
The Arnoldi iteration implicitly assumes $h_{j+1,j}\ne 0$. If $h_{j+1,j}=0$, it is because $Aq_{j} \in \mathrm{span}\{ q_{1},\dots,q_{j} \}$, i.e. $A^{j}b \in \mathrm{span}\{ b, \dots, A^{j-1}b \}$. *For common applications of Krylov subspaces, this is actually good news*; see [[#Finding the Exact Solution]]. ^ae9bca
### Lanczos Iterations (Symmetric $A$)
If $A$ is symmetric, the Arnoldi iteration specializes to the **Lanczos iteration**, which produces
$AQ_{k}=Q_{k+1}\tilde{T}_{k},~~\tilde{T}_{k}= \begin{bmatrix}
h_{11} & h_{12} \\
h_{21} & h_{22} & h_{23} & \\
& \ddots & \ddots & \ddots \\
& & h_{k,(k-1)} & h_{kk} \\
& & & h_{(k+1),k}
\end{bmatrix},$
as the Hessenberg matrix ${H}_{k}=Q_{k}^{T}A Q_{k}$ is now itself symmetric, and a symmetric Hessenberg matrix is tridiagonal; this tridiagonal $T_{k}$ makes up all but the last row of $\tilde{T}_{k}$.
Notably, $(\dagger)$ reduces to $Aq_{j}=h_{(j-1),j}q_{j-1}+h_{jj}q_{j}+h_{(j+1),j}q_{j+1},$ a three-term recurrence (rather than the $O(k)$-term recurrence of Arnoldi).
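A minimal numpy sketch of this three-term recurrence (names are illustrative; breakdown, i.e. a zero off-diagonal entry, is not handled):

```python
import numpy as np

def lanczos(A, b, k):
    """Lanczos three-term recurrence for symmetric A: returns Q (n x (k+1))
    and the extended tridiagonal T ((k+1) x k) with A @ Q[:, :k] = Q @ T."""
    n = len(b)
    Q = np.zeros((n, k + 1))
    alpha = np.zeros(k)                      # diagonal entries  h_jj
    beta = np.zeros(k)                       # off-diagonals     h_{(j+1),j} = h_{j,(j+1)}
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        if j > 0:
            v -= beta[j - 1] * Q[:, j - 1]   # orthogonalize against q_{j-1}
        alpha[j] = Q[:, j] @ v
        v -= alpha[j] * Q[:, j]              # orthogonalize against q_j
        beta[j] = np.linalg.norm(v)          # breakdown if this is 0 (not handled)
        Q[:, j + 1] = v / beta[j]
    T = np.zeros((k + 1, k))
    T[np.arange(k), np.arange(k)] = alpha
    T[np.arange(1, k + 1), np.arange(k)] = beta
    T[np.arange(k - 1), np.arange(1, k)] = beta[:-1]
    return Q, T

# Check A Q_k = Q_{k+1} T~_k on a random symmetric matrix
rng = np.random.default_rng(0)
M = rng.standard_normal((40, 40))
A = M + M.T
b = rng.standard_normal(40)
Q, T = lanczos(A, b, 10)
assert np.allclose(A @ Q[:, :10], Q @ T, atol=1e-8)
```

In floating point, the three-term recurrence gradually loses orthogonality among the $q_{i}$ (the well-known drawback of Lanczos), so practical implementations may reorthogonalize.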
## Finding the Exact Solution
Krylov-subspace methods in general terminate in at most $n$ steps. This is because for $K\ge n$, the Krylov vectors $\{b,Ab, \dots, A^{K}b\} \subset\mathbb{R}^{n}$ typically span $\mathbb{R}^{n}$ (when $n$ of them are linearly independent), in which case $x^{\ast}$ is a linear combination of them.
Otherwise, we arrive at a linearly dependent set $\{ b, \dots, A^{K}b \}$ (guaranteed once $K\ge n$, since these are $K+1>n$ vectors in $\mathbb{R}^{n}$; but it can also happen for $K<n$, namely when Arnoldi breaks down with $h_{k+1,k}=0$). As stated earlier, this is good news when we apply Krylov subspaces to solving linear systems.
In this case, there exist coefficients $w\ne 0$ such that $\sum_{k=0}^{K} w_{k}A^{k}b=0.$
We can explicitly solve for $x^{\ast}$: if $w_{0}\ne 0$, we can solve for $b$ as $b=-\frac{1}{w_{0}}\sum_{k>0}w_{k}A^{k}b=A\underbrace{\left( -\frac{1}{w_{0}}\sum_{k>0}w_{k}A^{k-1}b \right)}_{=:x^{\ast}}.$
What if $w_{0}=0$? Then we have
$0=\sum_{k=1}^{K}w_{k}A^{k}b=A\sum_{k=0}^{K-1}w_{k+1}A^{k}b.$
As $A \succ 0$ (in particular, nonsingular), this forces $\sum_{k=0}^{K-1}w_{k+1}A^{k}b=0$, i.e. a linear dependency already at the previous index $K-1$, and the Arnoldi/Lanczos iterations would have broken down earlier. Therefore, we can WLOG assume that $K$ is the smallest index at which linear dependence appears, so that $\mathcal{K}_{K+1}= \mathcal{K}_{K}$ and $w_{0}\ne {0}$, reducing to the case above.
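A small numpy sketch of this recovery (the SPD construction below is an illustrative assumption): take the $n+1$ Krylov vectors $b,\dots,A^{n}b$ in $\mathbb{R}^{n}$, compute a dependency $w$ numerically, and read off $x^{\ast}$ from it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)               # SPD, hence nonsingular (so w_0 != 0 generically)
x_star = rng.standard_normal(n)
b = A @ x_star

# The n + 1 Krylov vectors b, Ab, ..., A^n b must be linearly dependent in R^n
X = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(n + 1)])
_, _, Vt = np.linalg.svd(X)
w = Vt[-1]                                # null-space coefficients: X @ w ~ 0

# x* = -(1/w_0) * sum_{k >= 1} w_k A^{k-1} b
x_rec = -(X[:, :n] @ w[1:]) / w[0]
print(np.linalg.norm(A @ x_rec - b))      # ~ 0 up to roundoff
```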