Issue: floating-point arithmetic is only accurate up to the machine precision $\epsilon$.
Consider studying a problem $f: x \mapsto y$, to be solved numerically.
## Conditioning
Conditioning studies the sensitivity of the problem's solution $y$ to small changes in its input $x$.
> [!definition|*] Condition Number
> The **(absolute) condition number** is defined as $\kappa:=\lim_{r \to 0} \sup_{\| \delta x \| \le r }\left\{ \frac{\| \delta f(x) \|}{\| \delta x \| } \right\},$where $\delta f(x):= f(x+\delta x)-f(x)$, i.e. the problem's limit sensitivity to small perturbation in $x$, as the perturbation $\delta x$ goes to 0. Usually, we write $\kappa= \sup_{\delta x}\| \delta f \| / \| \delta x \|$ for simplicity.
>
> The **relative condition number** is $\kappa_{r}:=\lim_{r \to 0} \sup_{\| \delta x \| \le r }\left\{ \frac{\| \delta f \|}{\| f \| } \Big / \frac{\| \delta x \| }{\| x \| } \right\},$the factor by which a relative error in $x$ is blown up.
>
> A problem with a large $\kappa$ or $\kappa_{r}$ is called **ill-conditioned**.
- If we have the partial derivatives $\frac{ \partial f_{i} }{ \partial x_{j} }$, collecting them (evaluated at $x$) into the Jacobian $J(x)$ gives the first-order approximation $\delta f \approx J\delta x$, so $\kappa = \| J(x) \|$.
- Similarly, $\kappa_{r}=\| J(x) \|\cdot(\| x \| / \| f(x) \|)$.
Note that *a problem can have a large $\kappa$ and a small $\kappa_{r}$, and vice versa*:
- For example, computing $f(x):=\sqrt{ x }$ for $x \approx 0$ has $\kappa = O( x^ {-1 / 2})$ while $\kappa_{r} \approx 1/2$.
- For the converse, computing $f(x_{1},x_{2}):=x_{1}-x_{2}$ for $x_{1}\approx x_{2}$ has $\kappa = \| J \| = O(1)$ (where $J=(1,-1)$; the exact value depends on the choice of norm) while $\kappa_{r} = O(| x_{1}-x_{2} |^{-1}) \gg 1$: this is the classic *catastrophic cancellation* (see the numerical check below).
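A quick numerical check of the cancellation example (a minimal sketch in NumPy; the inputs `x1, x2` are arbitrary nearly equal values): the relative error of the computed difference is roughly $\kappa_{r}\cdot\epsilon$, far above machine precision.

```python
import numpy as np

# Two nearly equal inputs (arbitrary illustrative values); their intended difference is 1e-12.
x1, x2 = 1.0 + 1e-12, 1.0
exact = 1e-12                                 # exact difference of the intended real inputs
computed = x1 - x2                            # the subtraction itself is exact, but x1 was already rounded

rel_err = abs(computed - exact) / abs(exact)  # relative error of the computed difference
eps = np.finfo(float).eps
kappa_r = (abs(x1) + abs(x2)) / abs(exact)    # relative condition number of subtraction (1-norm)
print(rel_err, kappa_r * eps)                 # same order of magnitude: the O(eps) rounding is amplified by kappa_r
```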
### Conditioning of Matrix Multiplication
Consider the linear system $Ax=b$, but with disturbances $A(x+\delta x)=b+\delta b.$
If we treat $x$ as the data and compute $b$, the absolute condition number is $\kappa_{x \mapsto b}= \frac{\| \delta b \| }{\| \delta x \| }= \frac{\| A\delta x \| }{\| \delta x \| } \le \| A \|. $
Its relative condition number is $\kappa_{r, x \mapsto b}= \frac{\| \delta b \| }{\| b \| }\Big/ \frac{\| \delta x \| }{\| x \| }= \frac{\| A \delta x \| }{\| b\| }\Big/ \frac{\| \delta x \| }{\| A^{-1}b \| }= \frac{\| A\delta x \| }{\| \delta x \| }\cdot \frac{\| A^{-1}b \| }{\| b \| } \le \| A \|\cdot \| A^{-1} \|. $
Similarly, if we treat $b$ as the data and $x=A^{-1}b$ as the result, we have $J=A^{-1}$, so the relative condition number is
$\kappa_{r, b \mapsto x}=\| A^{-1} \|\cdot \| b \| / \| A^{-1} b\| \le \frac{\| A^{-1} \| }{\sigma_{n}(A^{-1})} =\| A^{-1} \|\cdot \| A \|, $where the last equality uses the $l_{2}$ norm, in which $1/\sigma_{n}(A^{-1})=\sigma_{1}(A)=\| A \|$.
The common right-hand side $\| A \|\cdot \| A^{-1} \|$ in both bounds motivates the definition:
> [!definition|*] Condition Number of a Matrix
>
> The **condition number** of a matrix $A$ is $\kappa(A):= \| A \|\cdot \| A^{-1} \| $ (with respect to a chosen norm).
> For non-square matrices, replace $A^{-1}$ with its pseudoinverse.
>
> Under the $l_{2}$ operator norm, this becomes $\kappa_{2}(A)= \sigma_{1}(A) / \sigma_{n}(A),$where $\sigma_{1},\sigma_{n}$ are the largest and smallest [[Singular Value Decomposition|singular values]] of $A$.
^af2303
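A small NumPy sketch of these quantities (the matrix is an arbitrary construction with prescribed singular values): $\sigma_{1}/\sigma_{n}$ agrees with `np.linalg.cond`, and a small relative perturbation of $b$ perturbs the solution $x=A^{-1}b$ by at most roughly $\kappa_{2}(A)$ times as much.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Build A = U diag(s) V^T with singular values from 1 down to 1e-6, so kappa_2(A) ~ 1e6.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -6, n)) @ V.T

s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[-1], np.linalg.cond(A, 2))      # sigma_1/sigma_n equals ||A|| ||A^{-1}|| in the 2-norm

# Conditioning of b -> x = A^{-1}b: a relative change in b is amplified by at most ~kappa_2(A).
x = rng.standard_normal(n)
b = A @ x
db = 1e-10 * rng.standard_normal(n)            # small perturbation of the data b
dx = np.linalg.solve(A, b + db) - x
amplification = (np.linalg.norm(dx) / np.linalg.norm(x)) / (np.linalg.norm(db) / np.linalg.norm(b))
print(amplification, np.linalg.cond(A, 2))     # amplification stays below (and often near) kappa_2(A)
```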
### Conditioning of Singular Values and Eigenvalues
The singular values of a matrix $A$ are well-conditioned: we have $\kappa \le 1$ for this problem, guaranteed by:
> [!theorem|*] Weyl's Theorem
> For any matrix $A$ and perturbation $E$, we have $\sigma_{i}(A+E) \in (\sigma_{i}(A) \pm \| E \|_{2}). $Hence $| \delta \sigma_{i} | / \| \delta A \|_{2} \le 1$ for every perturbation $\delta A$, so the absolute condition number of each $\sigma_{i}$, being the limit of such ratios, satisfies $\kappa \le 1$.
>
> > [!proof]
> > Using [[Singular Value Decomposition#^bed618|Courant-Fischer]], we have $\begin{align*}
> > \sigma_{i}(A+E)&= \max_{\dim \mathcal{S} = i}\min_{x \in \mathcal{S}} \frac{\| (A+E)x \| }{\| x \| },
> > \end{align*} $but the inner expression lies in $(\| Ax \| / \| x \| \pm \| Ex \| / \| x \|)$ by the triangle inequality, and replacing $\| Ex \| / \| x \|$ with $\sigma_{1}(E)=\| E \|_{2}$ gives a wider, constant interval which can be pulled outside the $\max \min$.
>
> ^6a3c0e
- Since the singular values of a symmetric matrix are the absolute values of its eigenvalues, the eigenvalues of symmetric matrices are also well-conditioned. This is not the case for non-symmetric matrices: perturbing $\begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix}$ to $\begin{pmatrix}1 & 1 \\ \epsilon & 1\end{pmatrix}$ moves the eigenvalues from $\lambda=1$ (double) to $\lambda=1 \pm \sqrt{ \epsilon}$, so a perturbation of size $\epsilon$ produces a change of size $\sqrt{ \epsilon }$, an amplification factor of $1 / \sqrt{ \epsilon } \gg 1$ (see the numerical check below).
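A numerical check of both claims (a minimal sketch; the perturbation size is an arbitrary choice): the singular values of the example matrix move by at most $\| E \|_{2}$, while its eigenvalues move by $\sqrt{ 10^{-10} }=10^{-5}$, vastly more.

```python
import numpy as np

pert = 1e-10                                  # perturbation size (illustrative)
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])                    # double eigenvalue 1
E = np.array([[0.0, 0.0],
              [pert, 0.0]])                   # ||E||_2 = pert

# Weyl: each singular value moves by at most ||E||_2.
dsv = np.linalg.svd(A + E, compute_uv=False) - np.linalg.svd(A, compute_uv=False)
print(np.max(np.abs(dsv)), pert)              # change is <= 1e-10

# Eigenvalues of the non-symmetric matrix move by sqrt(pert) = 1e-5 >> pert.
dlam = np.sort(np.linalg.eigvals(A + E)) - np.sort(np.linalg.eigvals(A))
print(np.max(np.abs(dlam)))                   # ~1e-5
```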
## Stability
In contrast, **stability** studies a particular algorithm $\hat{f}: x \mapsto \hat{y}$: the input $x$ may go through initial rounding, be fed into computations involving further rounding, and eventually a value $\hat{y}$ is returned that is hopefully close to $y=f(x)$.
> [!definition|*] Backward Stability and Errors
> Given an algorithm $\hat{f}$ and input $x$, the **backward error** of the computed output $\hat{y}=\hat{f}(x)$ is the smallest perturbation $\Delta x$ (in norm) such that $f(x+\Delta x)=\hat{f}(x)$, i.e. $\| \Delta x \| := \min\{ \| \delta x \| ~|~ f(x+\delta x)=\hat{f}(x) \},$the distance to the nearest problem for which the algorithm produced the exact solution.
>
> An algorithm is **backward stable** if it is guaranteed to produce small backward errors, i.e. $\| \Delta x \|=O(\epsilon)\| x \|$ for every input $x$.
The **forward stability** is measured by the **forward error** $\| y-\hat{y} \| =\| f(x)-f(x+\Delta x) \| \lesssim \kappa\cdot \| \Delta x \| =O(\kappa \epsilon),$
where $\Delta x$ is the backward error of $\hat{y}$ and the last step assumes a backward stable algorithm (with $\| x \|$ normalised to $O(1)$). The bound is only approximate because $\kappa$ is defined in the limit $\| \delta x \| \to 0$, while the machine precision $\epsilon>0$ is a fixed constant.
- Therefore, for an ill-conditioned problem with huge $\kappa$, even a backward stable algorithm cannot guarantee a small forward error (see the sketch below).
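A classic illustration (a sketch; `scipy.linalg.hilbert` gives the notoriously ill-conditioned Hilbert matrix, and `np.linalg.solve` uses LU with partial pivoting, which is backward stable in practice): the residual-based backward error sits at the level of $\epsilon$, yet the forward error in $x$ is enormous because $\kappa_{2}(A)\sim 10^{16}$.

```python
import numpy as np
from scipy.linalg import hilbert

n = 12
A = hilbert(n)                        # kappa_2(A) ~ 1e16
x_true = np.ones(n)
b = A @ x_true
x_hat = np.linalg.solve(A, b)         # backward-stable solve (LU with partial pivoting)

backward = np.linalg.norm(A @ x_hat - b) / (np.linalg.norm(A) * np.linalg.norm(x_hat))
forward = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(backward, np.finfo(float).eps)                      # backward error ~ machine precision
print(forward, np.linalg.cond(A) * np.finfo(float).eps)   # forward error is huge, as allowed by ~kappa * eps
```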
### Stability Results
The standard [[LU Factorisation]] algorithm satisfies $\frac{\| \hat{L}\hat{U}-A \| }{\| L \|\| U \| }=O(\epsilon),$where $\hat{L},\hat{U}$ are the computed factors and $L,U$ the exact ones; in other words, the computed factors exactly factorise a nearby matrix $A+\delta A$ with $\| \delta A \|=O(\epsilon)\| L \|\| U \|$.
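A quick check of this bound (a sketch using SciPy's LU with partial pivoting; the exact factors are unavailable, so the computed factors' norms stand in for $\| L \|\| U \|$):

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 200))
P, L, U = lu(A)                        # factorisation with partial pivoting: A = P @ L @ U

residual = np.linalg.norm(P @ L @ U - A) / (np.linalg.norm(L) * np.linalg.norm(U))
print(residual, np.finfo(float).eps)   # consistent with O(eps)
```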
Vector-vector inner products, and by extension matrix-vector products, are backward stable: $\begin{align*}
fl(x^{T}y)&= (x+\Delta x)^{T}y, &\| \Delta x \| &= O(\epsilon)\| x \| ,\\
fl(Ay)&= (A+\Delta A)y, &\| \Delta A \| &= O(\epsilon)\| A \| ,
\end{align*}$where the second line applies the first to each row $a_{i}^{T}$ of $A$, and the row-wise backward errors $\Delta a_{i}^{T}$ stack into $\Delta A$.
However, the matrix-matrix product $AB$ is not backward stable in general: each column of $B$ typically needs a different perturbation $\Delta A$ to match the corresponding column of $fl(AB)$, so no single perturbed input reproduces the whole computed product. (The inner-product claim above is checked numerically below.)
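The inner-product claim can be checked directly (a sketch; exact rational arithmetic via `fractions.Fraction` serves as the error-free reference, and the minimal perturbation satisfying $(x+\Delta x)^{T}y=fl(x^{T}y)$ is the one parallel to $y$, of norm $| fl(x^{T}y)-x^{T}y | / \| y \|$):

```python
import numpy as np
from fractions import Fraction

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)

computed = float(x @ y)                                         # floating-point inner product
exact = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))    # exact value (floats are exact rationals)
r = computed - float(exact)                                     # rounding error of the computation

# Minimal-norm dx with (x + dx)^T y = computed is dx = (r / ||y||^2) y, so ||dx|| = |r| / ||y||.
backward = abs(r) / np.linalg.norm(y)
print(backward / np.linalg.norm(x), np.finfo(float).eps)        # relative backward error is O(eps)
```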
> [!theorem|*] Matrix Multiplication Stability
> For orthogonal matrices $Q\in \mathbb{R}^{m\times m}$ and orthogonally invariant norms (like the Frobenius and 2-norms), we can bound the forward error with $\| fl(QA)-QA \|= O(\epsilon) \| A \|,$
> and the backward error with $\| \delta A \|=O(\epsilon)\| A \|$, where $fl(QA)=Q(A+\delta A)$.
>
> For general matrices $B$, we have the forward error bound $\| fl(AB)-AB \| = O(\epsilon) \| A \| \| B \|.$
>
>
> > [!proof]- Proof for orthogonal matrices
> >
> >
> > We can form each entry of $QA$ as $q_{i}^{T}a_{j}$, where $q_{i}^{T}$ are the rows of $Q$ and $a_{j}$ the columns of $A$. Their computed versions are $fl(q_{i}^{T}a_{j})=q^{T}_{i}(a_{j}+\delta a_{j})$, where $\| \delta a_{j} \|=O(\epsilon)\| a_{j} \|$. Therefore, the entry-wise forward error is $|fl(q_{i}^{T}a_{j})-q_{i}^{T}a_{j}|=|q_{i}^{T}\delta a_{j}|\le \| q_{i} \| \| \delta a_{j} \|= O(\epsilon) \| a_{j} \|, $
> > because $q_{i}$ is a unit vector.
> >
> > Therefore, summing the squared entry-wise bounds over $i,j$ (and absorbing the dimension factor into the $O$), the forward error has Frobenius norm $\| fl(QA)-QA \|_{F}=O(\epsilon) \| A \| _{F}.$
> > Lastly, the same conclusion holds for other norms that are equivalent to the Frobenius norm, like the operator 2-norm $\| \cdot \|_{2}$.
> >
> > Moreover, the algorithm is backward stable under orthogonally invariant norms like the Frobenius and 2-norms. The backward error is $\delta A := Q^{T}(fl(QA)-QA)$, so that $fl(QA)=Q(A+\delta A)$, and it has norm $\| \delta A \| =\| Q^{T}(fl(QA)-QA )\|=\| fl(QA)-QA \|=O(\epsilon) \| A \|.$
> >
>
> > [!proof]- Proof for general matrices
> > Now for general matrices $B$, we use the SVD $B=UDV^T$ to get $\begin{align*}
> > \| fl(AB)-AB \|&= \| fl(AUDV^{T})-AUDV^{T}\| \\
> > &= O(\epsilon)\| AUD \| \\
> > &\le O(\epsilon) \| A \| \| U \| \| D \| \\
> > &= O(\epsilon) \| A \| \| B \| .
> > \end{align*} $
>
So if $A$ or $B$ is well-conditioned (e.g. orthogonal), then left or right multiplication by it is forward stable in the relative sense: dividing the bound by $\| AB \|$ and using $\| AB \|\ge \sigma_{1}(A)\, \sigma_{\min}(B)$ (or, symmetrically, $\| AB \|\ge \sigma_{\min}(A)\,\sigma_{1}(B)$), the relative forward error is $\frac{\| fl(AB) -AB\|}{\| AB \| }\le O(\epsilon) \min(\kappa_{2}(A), \kappa_{2}(B))$ (verified numerically below).
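A sketch verifying both regimes (exact reference products are computed in rational arithmetic, which is slow but error-free, hence the small dimension; the matrices are arbitrary constructions): multiplying by an orthogonal matrix gives a relative forward error near $\epsilon$, while multiplying an ill-conditioned matrix by its (computed) inverse, where massive cancellation occurs, gives a relative error near $\epsilon\,\kappa_{2}$.

```python
import numpy as np
from fractions import Fraction

def rel_forward_error(A, B):
    """|| fl(AB) - AB || / || AB ||, with the exact product evaluated in rational arithmetic."""
    Af = [[Fraction(v) for v in row] for row in A]
    Bf = [[Fraction(v) for v in row] for row in B]
    exact = np.array([[float(sum(Af[i][k] * Bf[k][j] for k in range(len(Bf))))
                       for j in range(len(Bf[0]))] for i in range(len(Af))])
    return np.linalg.norm(A @ B - exact) / np.linalg.norm(exact)

rng = np.random.default_rng(3)
n = 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))       # orthogonal: kappa_2(Q) = 1
A = rng.standard_normal((n, n))
print(rel_forward_error(Q, A), np.finfo(float).eps)    # ~eps: forward stable

U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
C = U @ np.diag(np.logspace(0, -10, n)) @ V.T          # kappa_2(C) ~ 1e10
B = np.linalg.inv(C)                                   # C @ B ~ I although ||C|| ||B|| ~ 1e10: heavy cancellation
print(rel_forward_error(C, B), np.finfo(float).eps * np.linalg.cond(C))   # ~eps * kappa_2
```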