> [!tldr] Krylov Subspace
> The order-$k$ **Krylov subspace** of a matrix $A \in \mathbb{R}^{n\times n}$ and vector $b \in \mathbb{R}^{n}$ is $\mathcal{K}_{k}(A,b):=\mathrm{span}\{ b,Ab, \dots, A^{k-1}b \}.$
>
> Equivalently, it is $\mathcal{K}_{k}(A,b)=\{ p(A)b ~|~p \in \Pi_{k-1} \},$where $\Pi_{k-1}$ is the set of polynomials of degree (up to) $k-1$.
>
> It is widely used in iterative methods that approximate $\min_{x}f(x)$ by solving the restricted problem $\min_{x\in \mathcal{K}_{k}}f(x)$, for some objective $f$, e.g. $\| Ax-b \|$.
*Krylov subspaces can be defined using polynomials of $A$, allowing us to study their properties with polynomials:*
$v=w_{0}b+\dots+w_{k-1}A^{k-1}b\in \mathcal{K}_{k}\Longrightarrow v=p(A)b,$
where $p\in \Pi_{k-1}$ (the space of polynomials of degree at most $k-1$) is given by $p(x)=w_{0}+w_{1}x+\dots+w_{k-1}x^{k-1}$. Therefore, writing $b=Ax^{\ast}$ where $x^{\ast}$ solves the linear system $Ax=b$ (assuming $A$ is invertible), $\begin{align*}
\mathcal{K}_{k}(A,b)&= \{ p(A)b ~|~ p \in \Pi_{k-1} \}\\
&= \{ p(A)A x^{\ast} ~|~ p\in \Pi_{k-1} \}\\
&= \{ p(A)x^{\ast} ~|~ p\in \Pi_{k}, p(0)=0 \}.
\end{align*}$ ^d6a17e
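As a concrete illustration, here is a minimal numpy sketch (names and sizes are purely illustrative) checking that a Krylov-subspace vector $w_{0}b+\dots+w_{k-1}A^{k-1}b$ coincides with $p(A)b$ for the corresponding polynomial $p$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Krylov matrix X_k = [b, Ab, ..., A^{k-1} b]
X = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k)])

# An element of K_k(A, b): v = w_0 b + w_1 Ab + ... + w_{k-1} A^{k-1} b
w = rng.standard_normal(k)
v = X @ w

# The same vector written as p(A) b, with p(x) = w_0 + w_1 x + ... + w_{k-1} x^{k-1}
pA = sum(w[j] * np.linalg.matrix_power(A, j) for j in range(k))
assert np.allclose(v, pA @ b)
```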
## Orthogonal Basis of Krylov Subspaces
Suppose we wish to find an orthogonal basis for $\mathcal{K}_{k}(A,b)$.
The naive idea of forming the matrix $X_{k}:=[b, Ab, \dots, A^{k-1}b]$ and running [[QR Factorisation]] (generate, then orthogonalize) does not work, because $X_{k}$ is very ill-conditioned: its columns are the iterates of a [[Power Iterations|power iteration]], which converge to the dominant eigenvector of $A$ and hence become nearly parallel.
> [!exposition]- The condition number of $X_{k}$
>
> Since $\sigma_{1}(X_{k})\ge\| X_{k}e_{k} \| =\| A^{k-1}b \|=\Theta(| \lambda_{1} |^{k})$, $\sigma_{1}(X_{k})$ grows at least as fast as $| \lambda_{1} |^{k}$.
>
> On the other hand, using the convergence of $A^{k}b$ to the dominant eigenvector, we have
> $\| X_{k}(0,\dots,0,-\lambda_{1},1)^{T} \|=\| A^{k-1}b-\lambda_{1} A^{k-2}b \| =O(| \lambda_{2} |^{k}),$and since this test vector has constant norm $\sqrt{ 1+\lambda_{1}^{2} }$, $\sigma_{k}(X_{k})$ is bounded above by $O(| \lambda_{2} |^{k})$.
>
> As a result, the [[Numerical Stability#Conditioning|condition number]] is $\kappa_{2}(X_{k})= \sigma_{1}(X_{k}) / \sigma_{k}(X_{k}) =O(| \lambda_{1} / \lambda_{2} |^{k}) \to \infty$, and this naive basis becomes extremely ill-conditioned.
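A quick numerical sketch of this blow-up (the matrix, its spectrum, and the sizes below are illustrative assumptions): the condition number of the Krylov matrix $X_{k}$ grows roughly geometrically in $k$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Symmetric A with a dominant eigenvalue: lambda_1 = 4, the rest in [1, 2]
eigvals = np.concatenate(([4.0], np.linspace(1.0, 2.0, n - 1)))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigvals) @ Q.T
b = rng.standard_normal(n)

# kappa_2(X_k) for X_k = [b, Ab, ..., A^{k-1} b] blows up geometrically
x, cols = b.copy(), []
for k in range(1, 21):
    cols.append(x)
    print(k, np.linalg.cond(np.column_stack(cols)))
    x = A @ x
```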
### Arnoldi Iterations
However, a similar idea of *orthogonalize while generating* gives the **Arnoldi iterations**:
> [!algorithm] Arnoldi Iterations
> Initialize a matrix $\tilde{H}_{k}=(h_{ij}) \in \mathbb{R}^{(k+1)\times k}$ for storing the Gram–Schmidt (G-S) coefficients.
> Let $q_{1}:= b / \| b \|$.
>
> For $[j=1,\dots, k]$:
> - Compute $v:=Aq_{j}$, the next vector to be orthogonalized.
> - Orthogonalize it against the previously computed $q_{i}$, for $i=1,\dots,j$:
> - Record $h_{ij}:= v^{T}q_{i}$ as the component in the direction of $q_{i}$.
> - Orthogonalize wrt. $q_{i}$ with $v=v-h_{ij}q_{i}$.
> - Let $h_{(j+1), j}:=\| v \|$, and normalize $q_{j+1}:= v / \| v \|$.
>
> Return the basis $\{ q_{1},\dots,q_{k+1} \}$ and the extended Hessenberg matrix $\tilde{H}_{k}:= (h_{ij})$.
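Below is a minimal numpy sketch of the Arnoldi iteration described above (the function name `arnoldi` and all variable names are illustrative; it is not a production implementation, e.g. there is no reorthogonalization):

```python
import numpy as np

def arnoldi(A, b, k):
    """Arnoldi iteration: returns Q = [q_1, ..., q_{k+1}] (n x (k+1)) and the
    extended Hessenberg matrix H ((k+1) x k) satisfying A @ Q[:, :k] = Q @ H."""
    n = len(b)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]                      # next vector to orthogonalize
        for i in range(j + 1):               # Gram-Schmidt against q_1, ..., q_{j+1}
            H[i, j] = v @ Q[:, i]
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        if H[j + 1, j] == 0:                 # breakdown: the Krylov subspace is A-invariant
            return Q[:, : j + 1], H[: j + 2, : j + 1]
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H

# Quick check on a random matrix: the q_i form an orthonormal basis
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 30)), rng.standard_normal(30)
Q, H = arnoldi(A, b, 8)
assert np.allclose(Q.T @ Q, np.eye(9), atol=1e-10)
```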
In each iteration, we computed $q_{j+1}:= \frac{1}{h_{(j+1), j}}(Aq_{j}-h_{1j}q_{1}-\dots-h_{jj}q_{j}).~~~~~(\dagger)$
> [!exposition] Matrix Form of the Arnoldi Iterations
> Rearranging $(\dagger)$ gives $Aq_{j}=h_{1j}q_{1}+\dots+h_{(j+1),j}q_{j+1}=Q_{j+1}\begin{pmatrix}
> h_{1j} \\ \vdots \\ h_{(j+1), j}
> \end{pmatrix}.$
>
> So collecting these columns over $j=1,\dots,k$ into matrix form, $\bbox[teal, 4px]{AQ_{k}=Q_{k+1}\tilde{H}_{k}},~~\tilde{H}_{k}= \begin{bmatrix}
> H_{k} \\ h_{(k+1),k}\,e_{k}^{T}
> \end{bmatrix}:=\begin{bmatrix}
> h_{11} & h_{12} & \dots & h_{1k} \\
> h_{21} & h_{22} & \dots & h_{2k} \\
> & \ddots & & \vdots \\
> & & h_{k,(k-1)} & h_{kk} \\
> & & & h_{(k+1),k}
> \end{bmatrix}.$
> In this case we can also write $AQ_{k}=Q_{k}H_{k}+h_{(k+1),k}\,q_{k+1}e_{k}^{T}$, so left-multiplying by $Q_{k}^{T}$ (using $Q_{k}^{T}Q_{k}=I$ and $Q_{k}^{T}q_{k+1}=0$) gives $H_{k}=Q_{k}^{T}AQ_{k}.$
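As a numerical sanity check of these identities, here is a small sketch (all names are illustrative); on such a tiny example, a plain QR of the Krylov matrix is still accurate enough to serve as the orthonormal basis $Q_{k+1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Orthonormal basis of K_{k+1}(A, b); QR of the Krylov matrix is fine for tiny k
X = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k + 1)])
Q, _ = np.linalg.qr(X)                       # columns q_1, ..., q_{k+1}

Htilde = Q.T @ A @ Q[:, :k]                  # (k+1) x k
assert np.allclose(A @ Q[:, :k], Q @ Htilde, atol=1e-6)   # A Q_k = Q_{k+1} H~_k
assert np.allclose(np.tril(Htilde, -2), 0, atol=1e-6)     # upper Hessenberg structure

Hk = Htilde[:k]                              # H_k = Q_k^T A Q_k
rank1 = np.outer(Q[:, k], Htilde[k])         # q_{k+1} (0, ..., 0, h_{k+1,k})
assert np.allclose(A @ Q[:, :k], Q[:, :k] @ Hk + rank1, atol=1e-6)
```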
From $(\dagger)$ we can prove by induction that each $q_{j+1}$ equals $p_{j}(A)b$ for some polynomial $p_{j}$ of degree exactly $j$, and hence $\mathrm{span}\{ q_{1},\dots,q_{j+1} \}=\mathrm{span}\{ b, Ab, \dots, A^{j}b \}=\mathcal{K}_{j+1}(A,b)$.
The Arnoldi iteration implicitly assumes $h_{j+1,j}\ne 0$. If $h_{j+1,j}=0$, it is because $Aq_{j} \in \mathrm{span}\{ q_{1},\dots,q_{j} \}$, i.e. $A^{j}b \in \mathrm{span}\{ b, \dots, A^{j-1}b \}$. *For common applications of Krylov subspaces, this is actually good news*; see [[#Finding the Exact Solution]]. ^ae9bca
### Lanczos Iterations (Symmetric $A$)
If $A$ is symmetric, the Arnoldi iteration specializes to the **Lanczos iteration**, which produces
$AQ_{k}=Q_{k+1}\tilde{T}_{k},~~\tilde{T}_{k}= \begin{bmatrix}
h_{11} & h_{12} \\
h_{21} & h_{22} & h_{23} & \\
& \ddots & \ddots & \ddots \\
& & h_{k,(k-1)} & h_{kk} \\
& & & h_{(k+1),k}
\end{bmatrix},$
as the Hessenberg matrix ${H}_{k}=Q_{k}^{T}A Q_{k}$ is now itself symmetric, and a symmetric Hessenberg matrix is tridiagonal; this tridiagonal $T_{k}$ makes up all but the last row of $\tilde{T}_{k}$.
Notably, $(\dagger)$ reduces to $Aq_{j}=h_{(j-1),j}q_{j-1}+h_{jj}q_{j}+h_{(j+1),j}q_{j+1},$ a three-term recurrence (rather than the $O(k)$-term recurrence of Arnoldi).
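A minimal numpy sketch of this three-term recurrence (names are illustrative; breakdown, i.e. a zero off-diagonal entry, is not handled):

```python
import numpy as np

def lanczos(A, b, k):
    """Lanczos three-term recurrence for symmetric A: returns Q (n x (k+1))
    and the extended tridiagonal T ((k+1) x k) with A @ Q[:, :k] = Q @ T."""
    n = len(b)
    Q = np.zeros((n, k + 1))
    alpha = np.zeros(k)                      # diagonal entries  h_jj
    beta = np.zeros(k)                       # off-diagonals     h_{(j+1),j} = h_{j,(j+1)}
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        if j > 0:
            v -= beta[j - 1] * Q[:, j - 1]   # orthogonalize against q_{j-1}
        alpha[j] = Q[:, j] @ v
        v -= alpha[j] * Q[:, j]              # orthogonalize against q_j
        beta[j] = np.linalg.norm(v)          # breakdown if this is 0 (not handled)
        Q[:, j + 1] = v / beta[j]
    T = np.zeros((k + 1, k))
    T[np.arange(k), np.arange(k)] = alpha
    T[np.arange(1, k + 1), np.arange(k)] = beta
    T[np.arange(k - 1), np.arange(1, k)] = beta[:-1]
    return Q, T

# Check A Q_k = Q_{k+1} T~_k on a random symmetric matrix
rng = np.random.default_rng(0)
M = rng.standard_normal((40, 40))
A = M + M.T
b = rng.standard_normal(40)
Q, T = lanczos(A, b, 10)
assert np.allclose(A @ Q[:, :10], Q @ T, atol=1e-8)
```

In floating point, the three-term recurrence gradually loses orthogonality among the $q_{i}$ (the well-known drawback of Lanczos), so practical implementations may reorthogonalize.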
## Finding the Exact Solution
Krylov-subspace methods in general terminate in at most $n$ steps. This is because for $K\ge n$, the Krylov vectors $\{b,Ab, \dots, A^{K}b\} \subset\mathbb{R}^{n}$ typically span $\mathbb{R}^{n}$ (when $n$ of them are linearly independent), in which case $x^{\ast}$ is a linear combination of them.
Otherwise, we arrive at a linearly dependent set $\{ b, \dots, A^{K}b \}$ (guaranteed once $K\ge n$, since these are $K+1>n$ vectors in $\mathbb{R}^{n}$; but it can also happen for $K<n$, namely when Arnoldi breaks down with $h_{k+1,k}=0$). As stated earlier, this is good news when we apply Krylov subspaces to solving linear systems.
In this case, there exist coefficients $w\ne 0$ such that $\sum_{k=0}^{K} w_{k}A^{k}b=0.$
We can explicitly solve for $x^{\ast}$: if $w_{0}\ne 0$, we can solve for $b$ as $b=-\frac{1}{w_{0}}\sum_{k>0}w_{k}A^{k}b=A\underbrace{\left( -\frac{1}{w_{0}}\sum_{k>0}w_{k}A^{k-1}b \right)}_{=:x^{\ast}}.$
What if $w_{0}=0$? Then we have
$0=\sum_{k=1}^{K}w_{k}A^{k}b=A\sum_{k=0}^{K-1}w_{k+1}A^{k}b.$
As $A \succ 0$ (in particular, nonsingular), this forces $\sum_{k=0}^{K-1}w_{k+1}A^{k}b=0$, i.e. a linear dependency already at the previous index $K-1$, and the Arnoldi/Lanczos iterations would have broken down earlier. Therefore, we can WLOG assume that $K$ is the smallest index at which linear dependence appears, so that $\mathcal{K}_{K+1}= \mathcal{K}_{K}$ and $w_{0}\ne {0}$, reducing to the case above.
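A small numpy sketch of this recovery (the SPD construction below is an illustrative assumption): take the $n+1$ Krylov vectors $b,\dots,A^{n}b$ in $\mathbb{R}^{n}$, compute a dependency $w$ numerically, and read off $x^{\ast}$ from it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)               # SPD, hence nonsingular (so w_0 != 0 generically)
x_star = rng.standard_normal(n)
b = A @ x_star

# The n + 1 Krylov vectors b, Ab, ..., A^n b must be linearly dependent in R^n
X = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(n + 1)])
_, _, Vt = np.linalg.svd(X)
w = Vt[-1]                                # null-space coefficients: X @ w ~ 0

# x* = -(1/w_0) * sum_{k >= 1} w_k A^{k-1} b
x_rec = -(X[:, :n] @ w[1:]) / w[0]
print(np.linalg.norm(A @ x_rec - b))      # ~ 0 up to roundoff
```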