Density Estimation - Random Notes Go Brrrrrrr

Given a sample $\mathbf{z}=(z_{1},\dots,z_{N})$ assumed to be iid realizations of some distribution $Z \sim f$, **Lindsey's method** discretizes the data into $K$ bins and models the bin counts with a semi-parametric regression.

Let $\mathcal{Z}$ be the (empirical) support of $f$, and partition it into $K$ equal-length intervals $\mathcal{Z}_{k}$, $k=1,\dots,K$, so that $\mathcal{Z}=\cup_{k}\mathcal{Z}_{k}$. Now discretize the observations by counting the points in each bin:

$$y_{k}:= | \{ i ~|~z_{i} \in \mathcal{Z}_{k} \} |,$$

and Lindsey's method assumes that

$$\begin{align*} y_{k}&\overset{\mathrm{indep.}}{\sim}\mathrm{Po}(\lambda_{k}),\\ \lambda_{k}&\approx Nd\cdot f(\tilde{z}_{k}), \end{align*}$$

where $\tilde{z}_{k}$ is the midpoint of $\mathcal{Z}_{k}$, $N$ is the sample size, and $d$ is the common length of the intervals. Therefore, *it reduces the estimation of $f$ to the estimation of $(\lambda_{1},\dots,\lambda_{K})$.*

- Non-independence may worsen the variance of the fit, but does not cause much bias.

Now we can use standard [[Generalized Linear Models|Poisson GLM/GAMs]] to fit $\lambda_{k}$ using the dataset $(\tilde{\mathbf{z}}, \mathbf{y})$. A common choice is a degree-7 polynomial basis with a log link,

$$\log \lambda_{k}=\sum_{j=0}^{7}\beta_{j}\tilde{z}_{k}^{j},$$

where $\beta_{0}$ is the intercept, and we can also use GAMs with [[Splines|spline]] bases. The density estimate at the midpoints is then $\hat{f}(\tilde{z}_{k})=\hat{\lambda}_{k}/(Nd)$.
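
The whole procedure (bin, count, Poisson-regress on a polynomial in the midpoints, rescale to a density) can be sketched in a few lines of Python. This is a minimal illustration that assumes NumPy is available. The function name `lindsey_density`, the default `K=40`, and the hand-rolled IRLS loop are choices made for this sketch and are not part of the notes; in practice one would more likely call a Poisson GLM routine from a statistics package. The basis includes a constant column (the intercept), and the midpoints are rescaled to $[-1, 1]$ before taking powers, which spans the same degree-7 polynomial space but is better conditioned.

```python
import numpy as np


def lindsey_density(z, K=40, degree=7, n_iter=50, tol=1e-8):
    """Sketch of Lindsey's method with a polynomial Poisson regression."""
    z = np.asarray(z, dtype=float)
    N = z.size

    # Partition the empirical support into K equal-length bins and count.
    counts, edges = np.histogram(z, bins=K)
    mids = 0.5 * (edges[:-1] + edges[1:])  # bin midpoints
    d = edges[1] - edges[0]                # common bin length

    # Affine rescaling of the midpoints to (-1, 1): same polynomial space,
    # much better conditioned than raw powers of z.
    center = 0.5 * (edges[0] + edges[-1])
    u = 2.0 * (mids - center) / (edges[-1] - edges[0])
    X = np.vander(u, degree + 1, increasing=True)  # columns 1, u, ..., u^degree

    # Poisson GLM with log link, fitted by iteratively reweighted least squares.
    eta = np.log(counts + 0.5)  # starting linear predictor
    for _ in range(n_iter):
        lam = np.exp(eta)
        work = eta + (counts - lam) / lam        # working response
        XtWX = X.T @ (lam[:, None] * X)          # weights are lam
        beta = np.linalg.solve(XtWX, X.T @ (lam * work))
        eta_new = X @ beta
        converged = np.max(np.abs(eta_new - eta)) < tol
        eta = eta_new
        if converged:
            break

    lam = np.exp(eta)
    f_hat = lam / (N * d)  # invert lambda_k ~ N * d * f(mid_k)
    return mids, f_hat, lam, counts


# Usage on simulated standard-normal data (purely illustrative).
rng = np.random.default_rng(0)
z = rng.normal(size=5000)
mids, f_hat, lam, counts = lindsey_density(z)
```

Because the model has an intercept and uses the canonical log link, the fitted $\hat{\lambda}_{k}$ sum to $N$ at convergence, so $\hat{f}$ integrates to one over the bins (as a Riemann sum with step $d$) without any extra normalization.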