In [[Linear Regression Methods#Least Squares Linear Regression|OLS]], classic [[Inference in OLS|inference techniques]] rely on a few modeling assumptions: a linear model
$$Y=X\beta+\epsilon,$$
where two strong assumptions are made about $\epsilon$:

- Homoscedasticity: $\epsilon$ has the same variance $\sigma^{2}$ across observations;
- Normality: $\epsilon$ follows a Gaussian distribution.

If those assumptions fail, the exact finite-sample distribution results for $\hat{\beta},\hat{\mathbf{y}}$ do not necessarily hold. Instead, we need asymptotic distributions.

### The Sandwich Variance Estimator

Without assuming normality, we can still write
$$\begin{align*} \hat{\beta}-\beta&= (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}-\beta\\ &= (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\pmb{\epsilon}\\ &= B_{n}^{-1}\pmb{\xi}_{n}, \end{align*}$$
where $B_{n}:=\mathbf{X}^{T}\mathbf{X} / n$ and $\pmb{\xi}_{n}:=\mathbf{X}^{T}\pmb{\epsilon} / n$, and we wish to know the (asymptotic) distribution of $\pmb{\xi}_{n}$. Standard results give
$$\mathrm{Cov}(\pmb{\xi}_{n})=n^{-2}\mathbf{X}^{T}\mathrm{Cov}(\pmb{\epsilon})\mathbf{X}=M_{n} / n,$$
where $M_{n}:= \mathbf{X}^{T}\mathrm{Cov}(\pmb{\epsilon})\mathbf{X} / n$. Plugging this back into the first equation gives
$$\begin{align*} \mathrm{Cov}(\hat{\beta})&= B_{n}^{-1}\mathrm{Cov}(\pmb{\xi}_{n})B_{n}^{-1}\\ &= {B_{n}^{-1}M_{n}B_{n}^{-1}} / n\\ &\propto \text{bread-meat-bread}, \end{align*}$$
hence the name **sandwich**. Furthermore, if $B_{n},M_{n}$ (each being some sort of "average") have finite limits $B,M$ (i.e. they converge entrywise to finite limits), we still have $\mathrm{Cov}(\hat{\beta})\to \mathbf{0}$, so $\hat{\beta} \to \beta$ in probability.

Now for variance estimation, simply replace $B,M$ with their finite-sample estimators:
$$\begin{align*} \hat{B}&= B_{n},\\ \hat{M}&= \mathbf{X}^{T}\hat{\Omega}\mathbf{X} / n,\\ &\text{where } \hat{\Omega}= \mathrm{diag}(e_{1}^{2},\dots,e_{n}^{2}), \end{align*}$$
i.e. approximate $\mathrm{Cov}(\pmb{\epsilon})$ with $\mathrm{diag}(\pmb{\epsilon}^{2})$ (using independence) and estimate that with $\mathrm{diag}(\mathbf{e}^{2})$. Plugging in those estimators gives the **Eicker-Huber-White covariance matrix**
$$\hat{\Sigma}_{\mathrm{EHW}}=\hat{B}_{n}^{-1}\hat{M}_{n}\hat{B}_{n}^{-1} / n=(\mathbf{X}^{T}\mathbf{X})^{-1}(\mathbf{X}^{T}\hat{\Omega}\mathbf{X})(\mathbf{X}^{T}\mathbf{X})^{-1}.$$

> [!connection] This can also be derived from a [[Weighed OLS]].

Although $\mathbf{e}$ (usually) underestimates $\pmb{\epsilon}$, we still have $\hat{M}_{n} \to M$ as $n \to \infty$. For finite samples, however, there are a few adjusted residuals (commonly labeled HC1, HC2, and HC3) that give better results:
$$\hat{\epsilon}_{i}=\begin{dcases} e_{i}\sqrt{ \frac{n}{n-p} }, \\[0.4em] {e_{i}} / {\sqrt{ 1-h_{ii} }}, \\[0.4em] e_{i} / (1-h_{ii}). \end{dcases}$$

### Asymptotic Normality of $\hat{\beta}$

Furthermore, under mild assumptions that allow the Lindeberg-Feller CLT to apply (e.g. Lyapunov's condition that $d_{n}:=\sum_{i}\| x_{i}\epsilon_{i} \|^{2+\delta} / n$ is bounded for all $n$ for some $\delta>0$), we have
$$\pmb{\xi}_{n}\overset{D}{\approx} N(0, M / n),$$
and by extension
$$\hat{\beta}\overset{D}{\approx}N(\beta,B^{-1}MB^{-1} / n).$$
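
To make the formulas above concrete, here is a minimal NumPy sketch (the simulated data and all variable names are illustrative, not from this note) that computes $\hat{\beta}$, the EHW sandwich covariance, the HC1–HC3 residual adjustments, and a Wald interval based on the asymptotic normality of $\hat{\beta}$.

```python
import numpy as np

# Simulate a heteroscedastic linear model (illustrative assumption: error s.d.
# depends on the first non-constant regressor), then compute the sandwich.
rng = np.random.default_rng(0)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 0.5 + np.abs(X[:, 1])                 # heteroscedastic error scale
y = X @ beta + sigma * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)              # "bread" (up to a factor of n)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat                          # residuals e
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages h_ii = x_i^T (X^T X)^{-1} x_i

def sandwich(resid):
    """(X^T X)^{-1} X^T diag(resid^2) X (X^T X)^{-1}."""
    meat = (X * resid[:, None] ** 2).T @ X    # X^T diag(resid^2) X
    return XtX_inv @ meat @ XtX_inv

cov_hc0 = sandwich(e)                         # plain EHW estimator
cov_hc1 = sandwich(e * np.sqrt(n / (n - p)))  # degrees-of-freedom adjustment
cov_hc2 = sandwich(e / np.sqrt(1 - h))        # leverage adjustment
cov_hc3 = sandwich(e / (1 - h))               # jackknife-style adjustment

print("beta_hat:", beta_hat)
print("HC0 s.e.:", np.sqrt(np.diag(cov_hc0)))
print("HC3 s.e.:", np.sqrt(np.diag(cov_hc3)))

# Asymptotic normality in action: a 95% Wald interval for beta_1 using the HC3 s.e.
se1 = np.sqrt(cov_hc3[1, 1])
print("95% CI for beta_1:", (beta_hat[1] - 1.96 * se1, beta_hat[1] + 1.96 * se1))
```

With heteroscedastic errors like these, the sandwich standard errors typically differ noticeably from the classical $\hat{\sigma}^{2}(\mathbf{X}^{T}\mathbf{X})^{-1}$ ones, which is the point of using the robust estimator.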