> [!abstract]
> The **spectral distribution function** $F$ of a stationary discrete process $(X_{t})$ with autocovariance function $\gamma(k)$ is a non-decreasing function such that $\gamma(k)=\int_{0}^{\pi} \cos(\omega k) \, dF(\omega),$ which is the **spectral representation of the autocovariance** function. Its derivative is the **spectral density function** $f$, which satisfies $\gamma(k)=\int _{0} ^ {\pi} \cos(\omega k)f(\omega) \, d\omega.$

Intuitively, *$F(\omega)$ measures the variance of $(X_{t})$ accounted for by signals of frequencies within $(0, \omega)$*.
- Normalizing with $1 / \sigma^{2}_{X}$ gives $F^{\ast}:(0, \pi) \to [0,1]$, measuring the proportion of variance, functioning similarly to a cdf.

Note that the integral only goes up to $\pi$: in a discrete process, the time interval between observations is at least $1$, so the fastest wave can only alternate like $(1,-1,1,-1,\dots)$, corresponding to a frequency of $\pi$ (two time units per cycle), also called the [[Nyquist and Fundamental Frequency|Nyquist frequency]]. As a result, $F(\pi)=\sigma^{2}_{X}$, since only signals with frequencies in $(0, \pi)$ exist in a discrete process.

The spectrum $f$ can be expressed as the [[Discrete-Time Fourier Transforms|discrete-time Fourier transform]] of the acv.f $\gamma$: $f(\omega)=\frac{1}{\pi}\sum_{k=-\infty}^{\infty}\gamma(k)e^{-i \omega k} =\frac{1}{\pi}\left[ \gamma(0) + 2\sum_{k=1}^{\infty}\gamma(k)\cos(\omega k)\right ],$where the second equality follows from $\gamma$ being an even function, so the imaginary terms $i \sin(\omega k)$ cancel between $\pm k$.
- Note that this series does not converge at $\omega=\omega_{0}$ if $(X_{t})$ has a deterministic component with frequency $\omega_{0}$ -- at that frequency $F$ is discontinuous, so its derivative $f$ does not exist.
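As a numerical sanity check, the two identities can be verified against each other. The pure-Python sketch below assumes an illustrative autocovariance $\gamma(k)=\alpha^{\lvert k \rvert}$ (so $\sigma^{2}_{X}=1$; this choice is not from the text), truncates the series, and integrates with the trapezoidal rule:

```python
import math

# Sketch: check that gamma(k) = \int_0^pi cos(wk) f(w) dw recovers the
# autocovariance when f is built as (1/pi)[gamma(0) + 2 sum gamma(k) cos(wk)].
# The autocovariance gamma(k) = alpha^|k| is an illustrative assumption.
alpha = 0.5

def gamma(k):
    return alpha ** abs(k)

def f(w, K=60):
    # truncate the absolutely convergent series at lag K
    return (gamma(0) + 2 * sum(gamma(k) * math.cos(w * k) for k in range(1, K))) / math.pi

def gamma_back(k, n=8000):
    # trapezoidal integration of cos(wk) f(w) over (0, pi)
    h = math.pi / n
    vals = [math.cos(i * h * k) * f(i * h) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

for k in range(4):
    assert abs(gamma_back(k) - gamma(k)) < 1e-4
```

By the orthogonality of $\cos(\omega k)$ on $(0, \pi)$, the recovery is exact for all lags below the truncation point, up to quadrature error.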
- Normalizing by $\sigma^{2}_{X}$ gives $f^{\ast}={ \partial F^{\ast} } / { \partial \omega }$, which equals the Fourier transform of the [[Autocorrelations|autocorrelation]] function $\rho$ of $(X_{t})$.

For continuous processes, a similar identity holds, but with the integral taken over $(0, \infty)$: $ \begin{align*} \gamma(\tau)&= \int _{0}^{\infty} \cos(\omega \tau)f(\omega) \, d\omega \\ f(\omega) &= \frac{1}{\pi}\int _{-\infty}^{\infty}\gamma(\tau)e^{-i\omega\tau}\,d\tau= \frac{2}{\pi} \int _{0}^{\infty} \gamma(\tau)\cos(\omega \tau)\, d\tau. \end{align*}$This is because the frequency is no longer limited by the discrete sampling interval.

### Spectrum of Selected Processes

If $(Z_{t})$ is a [[Purely Random Processes|PRP]], then its autocovariance is $\gamma(k)=\begin{cases} \sigma ^{2}_{Z} & \text{if }k=0, \\ 0 & \text{otherwise}, \end{cases}$hence its spectrum is constant: $f(\omega)=\gamma(0) / \pi=\sigma^{2}_{Z} / \pi$.
- For continuous processes, it is impossible to have a uniformly constant spectrum, as the integral $\gamma(\tau)=\int _{\mathbb{R}^{+}} \cos(\omega \tau)f(\omega) \, d\omega$ would not converge. In practice, it suffices to have a spectrum that is approximately constant over the frequencies of interest.

If $(X_{t})$ is a [[Moving Average Processes|MA(1)]] process satisfying $X_{t}=Z_{t}+\beta Z_{t-1}$ where $(Z_{t})$ is a PRP with unit variance, then it has autocorrelation $\rho(k)=\begin{cases} 1 & \text{if }k=0, \\ \frac{\beta}{1+\beta^{2}} & \text{if }| k |=1, \\ 0 & \text{otherwise}, \end{cases}$so the normalized spectrum is $f^{\ast}(\omega)=\begin{dcases} \frac{1}{\pi} \left( 1+\frac{2\beta}{1+\beta^{2}}\cos\omega \right)&\text{if }\omega \in (0, \pi), \\[0.4em] 0 & \text{otherwise}, \end{dcases}$and rescaling by $\mathrm{Var}(X_{t})=1+\beta^{2}$ gives the (unnormalized) spectrum.
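The MA(1) claims can be checked numerically. The sketch below (with an arbitrary illustrative $\beta = 0.7$) verifies that $f^{\ast}$ integrates to $F^{\ast}(\pi)=1$ and that integrating it against $\cos\omega$ recovers $\rho(1)$:

```python
import math

# Sketch for the MA(1) example: the normalized spectrum should carry total
# mass F*(pi) = 1, and its lag-1 "inverse transform" should recover rho(1).
# beta = 0.7 is an arbitrary illustrative value.
beta = 0.7
rho1 = beta / (1 + beta ** 2)

def f_star(w):
    return (1 + 2 * rho1 * math.cos(w)) / math.pi

def integrate(g, n=10000):
    # trapezoidal rule on (0, pi)
    h = math.pi / n
    vals = [g(i * h) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

total_mass = integrate(f_star)                           # F*(pi), approx. 1
lag_one = integrate(lambda w: math.cos(w) * f_star(w))   # approx. rho(1)
assert abs(total_mass - 1) < 1e-6
assert abs(lag_one - rho1) < 1e-6
```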
If $(X_{t})$ is an [[Autoregressive Processes|AR(1)]] process satisfying $X_{t}=\alpha X_{t-1}+Z_{t}$, then its autocovariance is $\gamma(k)=\sigma^{2}_{X}\cdot\alpha^{| k |}$, giving the spectrum $\begin{align*} f(\omega)&= \frac{1}{\pi}\sum_{k=-\infty}^\infty \gamma(k)\cdot e^{-i\omega k}\\ &= \frac{\sigma^{2}_{X}}{\pi}\left[ 1 + \sum_{k=1}^{\infty}\alpha^{ k }(e^{i\omega k}+e^{-i\omega k}) \right]\\ &= \frac{\sigma^{2}_{X}}{\pi} \left[ 1+\frac{\alpha e^{i\omega}}{1-\alpha e^{i\omega}} + \frac{\alpha e^{-i\omega}}{1-\alpha e^{-i\omega}} \right]\\ &= \frac{\sigma^{2}_{X}}{\pi}\left[ 1 + \frac{2\alpha(\cos \omega-\alpha)}{1-2\alpha \cos \omega + \alpha^{2}} \right]\\ &= \frac{\sigma^{2}_{X}}{\pi}\cdot \frac{1-\alpha^{2}}{1-2\alpha \cos \omega + \alpha^{2}}\\ &= \frac{\sigma^{2}_{Z}}{\pi (1-2\alpha \cos \omega + \alpha^{2})}, \end{align*}$where the last equality comes from $\sigma^{2}_{Z}=(1-\alpha^{2})\sigma^{2}_{X}$.

## ANOVA with Fourier Representations

For a time series $(X_{t})$, its total sum of squares is $\mathrm{TSS}:= \sum_{t=1}^{N}(x_{t}-\bar{x})^{2}.$Suppose the time series has Fourier representation $x_{t}=a_{0}+\left( \sum_{p=1}^{N / 2 - 1}[a_{p}\cos(pt\omega_{1})+b_{p}\sin(pt\omega_{1})] \right)+a_{N / 2}\cos\left( \pi t \right), $where $\omega_{1}=2\pi / N$ is the [[Nyquist and Fundamental Frequency|fundamental frequency]]; the $p= N / 2$ term has frequency $N\omega_{1} / 2=\pi$, at which $\sin(\pi t)=0$ for integer $t$, so the $\sin$ term vanishes. The TSS can be orthogonally decomposed in the frequency domain into $\mathrm{TSS}=\frac{N}{2}\sum_{p=1}^{N / 2 - 1}(a_{p}^{2}+b_{p}^{2})+Na_{N / 2}^{2}=\left( \frac{N}{2} \sum_{p=1}^{N / 2 - 1}R^{2}_{p} \right)+Na_{N / 2}^{2},$where $R_{p}=\sqrt{ a_{p}^{2}+ b_{p}^{2} }$ is the **magnitude** of the $p$th harmonic.
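The decomposition can be verified on simulated data. The sketch below assumes the standard DFT coefficient formulas $a_{p}=\frac{2}{N}\sum_{t}x_{t}\cos(pt\omega_{1})$, $b_{p}=\frac{2}{N}\sum_{t}x_{t}\sin(pt\omega_{1})$, and $a_{N/2}=\frac{1}{N}\sum_{t}x_{t}\cos(\pi t)$ (conventions assumed, not stated in the text), and checks that the harmonic contributions sum exactly to the TSS:

```python
import math
import random

# Sketch: fit the full Fourier representation to simulated data (even N) and
# check TSS = (N/2) sum_{p<N/2} R_p^2 + N a_{N/2}^2 exactly.
random.seed(0)
N = 16                       # must be even
x = [random.gauss(0, 1) for _ in range(N)]
xbar = sum(x) / N
w1 = 2 * math.pi / N
tss = sum((v - xbar) ** 2 for v in x)

decomposed = 0.0
for p in range(1, N // 2):
    a_p = 2 / N * sum(x[t - 1] * math.cos(p * t * w1) for t in range(1, N + 1))
    b_p = 2 / N * sum(x[t - 1] * math.sin(p * t * w1) for t in range(1, N + 1))
    decomposed += N / 2 * (a_p ** 2 + b_p ** 2)   # (N/2) R_p^2
a_half = 1 / N * sum(x[t - 1] * math.cos(math.pi * t) for t in range(1, N + 1))
decomposed += N * a_half ** 2

assert abs(decomposed - tss) < 1e-9
```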
Dividing through by $N$ gives:

> [!theorem|*] Parseval's Theorem
> For a time series $(X_{t})$ with the above Fourier coefficients $a_{p}, b_{p}$, its mean sum of squares is $\mathrm{MSS}=a_{N / 2}^{2} +\frac{1}{2}\sum_{p=1}^{N / 2 - 1}R^{2}_{p}.$

## Estimation of the Spectra

### Periodograms

The above decomposition suggests plotting $p$ against the contributions of different frequencies $p\omega_{1}$ toward the total variance in the time series. Therefore, a **periodogram** is a histogram where the $p$th bar has width $\omega_{1}$ and height $I(\omega_{p})=\frac{R^2_{p}}{2\omega_{1}}=\frac{NR^{2}_{p}}{4\pi},$so that the bar's area equals the frequency's contribution towards the total variance $\mathrm{TSS} / N$.
- It can be thought of as collecting the contributions from frequencies $\omega \in \left[ p\omega_{1} \pm \frac{\omega_{1}}{2} \right]$.
- For the contribution from $a_{N / 2}$, there is no contribution beyond $N\omega_{1} / 2$ (the Nyquist frequency), so instead its bar only has width $\omega_{1} / 2$ (frequencies $\omega \in \left[ \frac{N-1}{2}\omega_{1}, \frac{N}{2}\omega_{1} \right]$) and height $I\left( \frac{N}{2}\omega_{1} \right)=I(\pi)=\frac{a_{N / 2}^{2}}{\omega_{1} / 2}=\frac{Na^{2}_{N / 2}}{\pi}.$

Rewriting the coefficients $a_{p}, b_{p}$ in terms of the data, the periodogram heights can be computed directly from the data as $I(p\omega_{1})=\left[ \left( \sum\nolimits_{t}x_{t}\cos(pt\omega_{1}) \right)^{2} + \left( \sum\nolimits_{t}x_{t}\sin(pt\omega_{1}) \right)^{2} \right] \,\,\Big/\,\, N\pi, $which holds for $p=1,\dots, \frac{N}{2}$.

The periodogram is the discrete finite Fourier transform of the sample autocovariances $c_{k}:= \frac{1}{N}\sum_{t=1}^{N-k}(x_{t}-\bar{x})(x_{t+k}-\bar{x}),$i.e.
$I(p\omega_{1})=\frac{1}{\pi}\sum_{k=-N +1}^{N -1}c_{k}e^{-ip\omega_{1} k}=\frac{1}{\pi}\left[ c_{0}+2\sum_{k=1}^{N-1}c_{k}\cos(p\omega_{1}k) \right].$

In this form, $\mathbb{E}[I(\omega)] \to f(\omega)$ as $N \to \infty$, where $f$ is the spectrum of $(X_{t})$, but $I$ is not a consistent estimator of $f$ because its variance does not converge to $0$ (the number of Fourier coefficients grows linearly with $N$).

### Truncated and Reweighted Periodograms

Remedies for the inconsistency of $I$ include:
- Truncating the number of coefficients $c_{k}$, so that it does not grow linearly in $N$.
- Giving smaller weights to coefficients corresponding to large lags (i.e. large $k$ in $c_{k}$).

This motivates the family of estimators $\hat{f}(\omega)=\frac{1}{\pi}\left[ \lambda_{0}c_{0} + 2\sum_{k=1}^{M}\lambda_{k}c_{k}\cos(\omega k) \right],$where the weights $\{ \lambda_{k} \}$ are the **lag windows**, and $M$ is the **truncation point**.
- Common choices of lag windows include the **Tukey window** and the **Parzen window**.

The choice of $M$ is a matter of bias-variance tradeoff -- larger $M$ reduces bias and preserves finer spectral features, while smaller $M$ reduces variance by discarding the high-lag $c_{k}$, which are estimated from few observations.
- A common choice is $M= \sqrt{ N }$ or $2\sqrt{ N }$, so that the amount of detail grows to infinity with the sample size, while $M / N \to 0$.

Alternatively, applying the Fourier transform to the lag windows $\{ \lambda_{k} \}$ gives the **spectral window/kernel** $\begin{align*} K(\omega)&:= \frac{1}{2\pi} \sum_{k=-\infty}^{\infty}\lambda_{k}\exp(-ik\omega),\\[0.4em] \lambda_{k}&= \int _{-\pi}^{\pi}K(\omega)\cdot e^{i\omega k} \, d\omega.
\end{align*}$Substituting this into the estimator gives $\begin{align*} \hat{f}(\omega)&= \frac{1}{\pi}\sum_{k=-N+1}^{N-1}\lambda_{k}c_{k}e^{-i\omega k}\\ &= \frac{1}{\pi}\sum_{k=-N+1}^{N-1}\left[ \int _{-\pi}^{\pi}K(v)e^{ivk} \, dv \right]c_{k}e^{-i\omega k}\\ &= \frac{1}{\pi}\int _{-\pi}^{\pi} K(v)\left[ \sum_{k=-N+1}^{N-1}c_{k}e^{i(v-\omega) k} \right]\,dv\\ &= \int _{-\pi}^{\pi}K(v)I(v-\omega) \, dv, \end{align*}$which is just a weighted mean of the (shifted) periodogram $I(v-\omega)$ with weight $K$. This motivates the following discrete estimation.
- If $\lambda_{0}:=1$ (which is common practice), then by definition $\int _{-\pi}^{\pi} K(\omega)\, d\omega=1$, in line with its interpretation as a weight.

### Moving Average Smoothing

Another alternative is to estimate $f$ not with $I$ itself, but with its moving average with window size $m$: $\hat{f}(\omega)=\frac{1}{m}\sum_{j}I(j\omega_{1}),$where $j$ ranges over the $m$ integers whose frequencies $\{ j\omega_{1} \}$ lie evenly around $\omega$. Since the periodogram ordinates $I(j \omega_{1})$ are asymptotically independent for different $j$, this smoothing reduces the variance by a factor of $1 / m$. Although this introduces bias (unless $f$ is locally linear), the bias is usually small if $f$ is smooth enough (i.e. $f''$ is small). Similar to the choice of $M$ for truncation, choosing the window size $m$ also involves a bias-variance tradeoff in an obvious way.
- A common choice is again $m = 2 \sqrt{ N }$, so that the pointwise variance is reduced by a factor of $O(\sqrt{ N })$.

Naive computation of the smoothed periodogram is very costly ($O(N^{2})$), but the [[Fast Fourier Transform|fast Fourier transform]] reduces this to $O\left( N\left( s+\frac{r}{2} \right) \right)$, where $N=sr$ is some factorization of $N$.
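As a sketch on simulated white noise (everything below follows the formulas already given; the simulation itself is illustrative): the periodogram heights computed directly from the data agree exactly with the Fourier transform of the sample autocovariances at the Fourier frequencies, and the smoothed estimate is then a plain average of neighbouring ordinates:

```python
import math
import random

# Sketch: periodogram heights two ways (data formula vs. transform of the
# sample autocovariances c_k), then a moving-average smoother of size m = 3.
random.seed(1)
N = 64
x = [random.gauss(0, 1) for _ in range(N)]
xbar = sum(x) / N
y = [v - xbar for v in x]
w1 = 2 * math.pi / N
c = [sum(y[t] * y[t + k] for t in range(N - k)) / N for k in range(N)]

def I_data(p):
    cs = sum(x[t] * math.cos(p * w1 * (t + 1)) for t in range(N))
    sn = sum(x[t] * math.sin(p * w1 * (t + 1)) for t in range(N))
    return (cs ** 2 + sn ** 2) / (N * math.pi)

def I_acvf(p):
    return (c[0] + 2 * sum(c[k] * math.cos(p * w1 * k) for k in range(1, N))) / math.pi

def f_hat(p, m=3):
    # moving average of the m periodogram ordinates centred at p * w1
    return sum(I_acvf(p + d) for d in range(-(m // 2), m // 2 + 1)) / m

for p in (1, 7, N // 2):
    assert abs(I_data(p) - I_acvf(p)) < 1e-9
assert f_hat(10) >= 0
```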
### Distribution of Spectra Estimates

For a weighted average estimator of the form $\hat{f}(\omega)=\frac{1}{\pi}\sum_{k=-M}^{M}\lambda_{k}c_{k}\cos(k\omega),$where $\lambda_{k},c_{k}$ are assumed to be even in $k$, the estimate approximately satisfies $\frac{\nu\hat{f}}{f}\overset{D}{\approx} \chi^{2}_{\nu},\,\,\,\nu=\frac{2N}{\sum_{k}\lambda^{2}_{k}}.$

This approximation holds well even for moderate datasets, and allows estimation of confidence intervals for $\hat{f}$. For example, the degrees of freedom of common estimators are:
- The Tukey window has $\nu = 2.67 N / M$.
- The Parzen window has $\nu = 3.71N / M$.
- Unweighted moving averages (of window size $m$) are just averages of $\chi^{2}_{2}$ variables (each ordinate has two degrees of freedom, from the squares of the two asymptotically independent coefficients $a_{p}$ and $b_{p}$), giving a total degree of freedom of $2m$.
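The quoted values can be recovered from $\nu=2N / \sum_{k}\lambda_{k}^{2}$. The sketch below uses the standard Tukey and Parzen lag-window formulas (an assumption -- the text does not state them):

```python
import math

# Sketch: effective degrees of freedom nu = 2N / sum_k lambda_k^2 for the
# standard Tukey and Parzen lag windows (formulas assumed, not from the text),
# compared against the quoted approximations 2.67 N/M and 3.71 N/M.
def tukey(k, M):
    return 0.5 * (1 + math.cos(math.pi * k / M))

def parzen(k, M):
    u = abs(k) / M
    return 1 - 6 * u ** 2 + 6 * u ** 3 if u <= 0.5 else 2 * (1 - u) ** 3

def dof(window, N, M):
    return 2 * N / sum(window(k, M) ** 2 for k in range(-M, M + 1))

N, M = 10_000, 100
assert abs(dof(tukey, N, M) * M / N - 2.67) < 0.05
assert abs(dof(parzen, N, M) * M / N - 3.71) < 0.05
```

For the Tukey window the sum works out to exactly $3M/4$, giving $\nu = 8N / 3M \approx 2.67\,N / M$.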