> [!tldr] Weighted OLS
> **Weighted OLS** is just the regular OLS model $\mathbf{y} \sim \mathbf{X}$, except that each observation has a different weight $w_{i}$, collected into the diagonal matrix $W:=\mathrm{diag}(w_{1},\dots,w_{n})$.
> By putting different weights on different observations, it suits situations where the data are grouped (e.g. students' scores averaged within schools), and can favor observations arising from large groups.

### Solving for the Weighted OLS Coefficient

The least squares problem is
$$\underset{\beta}{\arg\min}~\sum_{i}w_{i}(y_{i}-x_{i}^{T}\beta)^{2}=\underset{\beta}{\arg\min}~\| W^{1/2}(\mathbf{y}-\mathbf{X}\beta) \|^{2}.$$
Writing $\mathbf{y}^{\ast}:=W^{1/2}\mathbf{y}$ and $\mathbf{X}^{\ast}:=W^{1/2}\mathbf{X}$, this reduces to regular OLS on $\mathbf{y}^{\ast} \sim \mathbf{X}^{\ast}$. Therefore, it yields the coefficient
$$\begin{align*} \hat{\beta}(W)&= (\mathbf{X}^{\ast T}\mathbf{X}^{\ast})^{-1}\mathbf{X}^{\ast T}\mathbf{y}^{\ast}\\ &= (\mathbf{X}^{T}W\mathbf{X})^{-1}\mathbf{X}^{T}W\mathbf{y}.\end{align*}$$

### Heteroscedasticity

A more general use is treating heteroscedasticity (grouped data is one such case, since larger groups are in general less variable). If
$$Y_{i}=X_{i}^{T}\beta+\epsilon_{i},$$
where the $\epsilon_{i}\sim [0,\sigma^{2}_{i}]$ are independent across $i$, the noises are heteroscedastic. But if we can use the weights $w_{i}:=\sigma^{-2}_{i}$, the transformed data (with asterisks) satisfy
$$Y^{\ast}_{i}=w_{i}^{1/2}Y_{i}=X^{\ast T}_{i}\beta+\epsilon_{i}^{\ast},$$
where $\epsilon_{i}^{\ast}=w^{1/2}_{i}\epsilon_{i} \sim [0,1]$ are homoscedastic noises. Of course, we don't have knowledge of the $\sigma^{2}_{i}$, so instead we estimate the weights with things like $w_{i}=e_{i}^{2}$ (where the $e_{i}$ are the OLS residuals). In these cases, we recover [[OLS with Heteroscedastic noise#The Sandwich Variance Estimator|the sandwich variance estimator]].
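As a sanity check, the reduction above can be sketched numerically: computing $\hat{\beta}(W)=(\mathbf{X}^{T}W\mathbf{X})^{-1}\mathbf{X}^{T}W\mathbf{y}$ directly should match running ordinary least squares on the rescaled data $\mathbf{y}^{\ast}=W^{1/2}\mathbf{y}$, $\mathbf{X}^{\ast}=W^{1/2}\mathbf{X}$. The data below are simulated with hypothetical heteroscedastic noise scales $\sigma_i$ (the sizes and coefficients are illustrative, not from the text), and the weights are the ideal $w_i=\sigma_i^{-2}$.

```python
import numpy as np

# Hypothetical simulated data: n observations, p predictors,
# with heteroscedastic noise scales sigma_i (illustrative values).
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
sigma = rng.uniform(0.5, 3.0, size=n)           # per-observation noise scale
y = X @ beta_true + sigma * rng.normal(size=n)

w = sigma ** -2                                  # weights w_i = sigma_i^{-2}

# Direct weighted-OLS formula: (X^T W X)^{-1} X^T W y
W = np.diag(w)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Equivalent route: regular OLS on y* = W^{1/2} y, X* = W^{1/2} X
sw = np.sqrt(w)
beta_via_ols, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)

# The two routes agree up to floating-point error.
assert np.allclose(beta_wls, beta_via_ols)
```

Note that forming the full $n \times n$ matrix $W$ is wasteful for large $n$; since $W$ is diagonal, multiplying rows by $w_i$ (as the `lstsq` route does with $\sqrt{w_i}$) is the more practical implementation.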