Saturated and Null Models

> [!tldr]
> In the context of modeling independent observations $Y_{i} \sim f(\cdot\,;\theta_{i})$, $i=1,\dots,n$, a **saturated model** estimates every $\theta_{i}$ separately, usually with the MLE
> $$\hat{\theta}_{i}^{(s)}:= \arg \max_{\theta_{i}}f(Y_{i};\theta_{i}).$$
> It assumes no relationship whatsoever between different observations: $\hat{\theta}_{i}^{(s)}$ does not depend on $Y_{j \ne i}$ at all.
>
> The **null model** only models the mean, assumed to be shared by all observations: $\hat{\mu}=\mathrm{avg}(Y_{i})_{i=1}^{n}$, and the estimate $\hat{\theta}^{(0)}_{i}$ (same for all $i$) is derived from $\hat{\mu}$.
>
> They provide baselines for comparing the log-likelihood of a model in between, which usually assumes some structure shared by different observations.

### Deviance of Null and Saturated Models

By [[Deviance#Wilk's Theorem and F-tests|Wilks' theorem]], the likelihood ratio statistic between two nested models is asymptotically $\chi^{2}$-distributed as the sample size $n \to \infty$, assuming the model complexity $\dim \Theta$ ($\Theta$ being the parameter space) is fixed. *This does not hold for the saturated model*, whose number of parameters grows with $n$. Since the saturated deviance of a model is the likelihood ratio statistic between it and the saturated model,

> [!warning]
> The saturated deviance of a model does not asymptotically converge to $\chi^{2}$ when $n \to \infty$.

On the other hand, there is no issue with the null deviance, and it is safe to test models using the $\chi^{2}_{p-1}$ distribution, where $p$ is the number of parameters in the model.

Also, if the observations are binomial $\mathrm{Binom}(m_{i},p_{i})$, the saturated deviance does converge to $\chi^{2}$ when $m_{i}\to \infty$ for all $i$.

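
The two baselines and the saturated deviance can be made concrete with a small sketch. It assumes Poisson observations, a choice made here purely for illustration since the note does not fix a distribution; the function names (`poisson_logpmf`, `saturated_fit`, `null_fit`, `saturated_deviance`) and the toy data are hypothetical.

```python
import math

def poisson_logpmf(y, mu):
    """Log of the Poisson pmf at count y with mean mu (0 * log 0 is taken as 0)."""
    if mu == 0:
        return 0.0 if y == 0 else -math.inf
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def loglik(ys, mus):
    """Log-likelihood of independent Poisson counts ys with means mus."""
    return sum(poisson_logpmf(y, mu) for y, mu in zip(ys, mus))

def saturated_fit(ys):
    """Saturated model: one mean per observation; the Poisson MLE is y_i itself."""
    return [float(y) for y in ys]

def null_fit(ys):
    """Null model: a single mean shared by all observations."""
    mu_hat = sum(ys) / len(ys)
    return [mu_hat] * len(ys)

def saturated_deviance(ys, mus):
    """Likelihood ratio statistic between fitted means mus and the saturated model."""
    return 2.0 * (loglik(ys, saturated_fit(ys)) - loglik(ys, mus))

ys = [2, 0, 5, 3, 1, 4]  # made-up counts
print("saturated log-likelihood:", loglik(ys, saturated_fit(ys)))
print("null log-likelihood:     ", loglik(ys, null_fit(ys)))
print("saturated deviance of the null model:", saturated_deviance(ys, null_fit(ys)))
```

Any model in between has a log-likelihood bounded by these two values, provided it nests the null model and is fitted by maximum likelihood, so its saturated deviance lies between `0` and the value printed for the null model.
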
### Null and Saturated Models in GLM

Suppose we are fitting a [[Generalized Linear Models|GLM]] with data $(\mathbf{X}_{i},y_{i})_{i=1}^{n}$, distribution $f(\cdot;\theta_{i})$, and link function $g$ such that
$$g(\mu_{i})=\eta_{i}:=\beta^{T}\mathbf{X}_{i},$$
where $\beta$ are the coefficients to be estimated, and the entries $\mathbf{X}_{i,0}$ are the constant $1$, so $\beta_{0}$ is the intercept.

- If $g$ is the canonical link function, then the model directly models the parameter $\eta_{i}=\theta_{i}$. In this case, the saturated model abandons the GLM structure and simply uses the MLE
  $$\hat{\theta}^{(s)}_{i}=\hat{\theta}_{i}^{\mathrm{(MLE)}}=\arg \max_{\theta_{i}}f(y_{i};\theta_{i}).$$
  The null model assumes the observations are identically distributed by abandoning the non-constant predictors $\mathbf{X}_{i, 1:p}$, with $\beta^{(0)}=(\beta_{0},0,\dots,0)^{T}$. Then
  $$\hat{\theta}_{i}^{(0)}=\hat{\eta}_{i}^{(0)}=\hat{\beta}^{(0)T}\mathbf{X}_{i}=\hat{\beta}_{0},$$
  giving the same parameter estimate for all $i=1,\dots,n$.
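
For a concrete canonical-link case, the sketch below assumes a Poisson GLM with log link, so that $\theta_{i}=\eta_{i}=\log \mu_{i}$. This distribution is an assumption made for illustration, and the helper names (`null_intercept`, `score_intercept`, `saturated_theta`) and the data are hypothetical. It uses the fact that the intercept-only MLE satisfies $\hat{\mu}=\bar{y}$, hence $\hat{\beta}_{0}=g(\bar{y})=\log \bar{y}$, and checks this through the score equation.

```python
import math

def null_intercept(ys):
    """Intercept-only Poisson GLM with log link: beta_0 = log(mean(y)).

    Assumes sum(ys) > 0 so that the logarithm is defined.
    """
    return math.log(sum(ys) / len(ys))

def score_intercept(ys, beta0):
    """Derivative of the Poisson log-likelihood in beta_0 when eta_i = beta_0 for all i."""
    return sum(y - math.exp(beta0) for y in ys)

def saturated_theta(ys):
    """Saturated natural parameters theta_i = log(y_i); -inf when y_i = 0."""
    return [math.log(y) if y > 0 else -math.inf for y in ys]

ys = [2, 0, 5, 3, 1, 4]  # made-up counts

beta0_hat = null_intercept(ys)
theta_null = [beta0_hat] * len(ys)   # same estimate for every observation
theta_sat = saturated_theta(ys)      # one estimate per observation

print("null theta (shared):", beta0_hat)
print("score at beta0_hat: ", score_intercept(ys, beta0_hat))
print("saturated thetas:   ", theta_sat)
```

The score at `beta0_hat` vanishes up to floating-point error, which is what identifies it as the null-model MLE, while each saturated $\hat{\theta}_{i}^{(s)}$ depends on its own $y_{i}$ only.
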