The **probability generating function** (pgf) of a random variable $X \in \mathbb{N}$ is $G_{X}(t)=\mathbb{E}[t^{X}].$ It is restricted to discrete variables taking non-negative integer values, so we want something more general.
## Moment Generating Functions
> [!definition|*] Moment Generating Function
> The **moment generating function** (mgf) of a random variable $X$ is $M_X(t)=\mathbb E[e^{tX}].$
*The Taylor series of the mgf "generates" the moments of the variable*: with $m_k=\mathbb E[X^k]$ being the $k$th **moment** of $X$ (about the point $0$), $M_X(t)=\sum_{k=0}^{\infty} \frac{t^km_k}{k!},$
- Hence $M_X^{(k)}(0)=m_k=\mathbb E[X^k]$.
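As a quick sanity check (not from the notes), here is a small `sympy` sketch that recovers the moments of the $\mathrm{Exp}(1)$ distribution, whose mgf is $1/(1-t)$ for $t<1$, by differentiating at $0$; the $k$th moment should come out as $k!$.

```python
import sympy as sp

t = sp.symbols("t")

# Mgf of Exp(1), valid for t < 1.
M = 1 / (1 - t)

# k-th moment = k-th derivative of the mgf evaluated at t = 0; for Exp(1) this is k!.
moments = [sp.diff(M, t, k).subs(t, 0) for k in range(5)]
print(moments)  # [1, 1, 2, 6, 24]
```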
*For the mgf to converge (on a neighbourhood of $0$), $X$ needs an* **exponentially decaying tail**: the tail of its distribution must decay at least as fast as an exponential.
- The normal and the Poisson distributions have **super-exponential tails** (i.e. tails decaying faster than any exponential).
- The exponential, geometric, and gamma distributions have exponential tails.
- Distributions with polynomial tails do not decay fast enough: their mgfs fail to be finite on any neighbourhood of $0$ (e.g. the Pareto mgf is infinite for every $t>0$, and the Cauchy mgf is infinite for every $t\neq 0$).
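A numerical illustration of the tail condition (a sketch, not part of the notes, with $t=0.1$ chosen for illustration): using `scipy`, the mgf integral of the standard normal converges, while for a Pareto density $3x^{-4}$ the truncated integral $\int_1^B e^{tx}\,3x^{-4}\,dx$ keeps growing with the cutoff $B$, so the mgf is infinite for every $t>0$.

```python
import numpy as np
from scipy import integrate, stats

t = 0.1

# Super-exponential tail: the normal mgf integral converges (to exp(t^2/2)).
normal_mgf, _ = integrate.quad(lambda x: np.exp(t * x) * stats.norm.pdf(x), -np.inf, np.inf)
print(normal_mgf, np.exp(t**2 / 2))

# Polynomial tail: truncating the Pareto(3) mgf integral at B shows it grows without bound.
for B in (50, 100, 200, 400):
    trunc, _ = integrate.quad(lambda x: np.exp(t * x) * stats.pareto.pdf(x, b=3), 1, B, limit=200)
    print(B, trunc)
```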
### Using Moment Generating Functions
> [!tldr]
> Mgfs determine the distribution, so combining variables is equivalent to combining their mgfs
*Uniqueness*: if $X,Y$ have the same mgf, finite on some interval around $0$, then they have the same distribution.
*Continuity*: if $M_{Y_n}\to M_Y$ pointwise, with all the mgfs finite on some common interval around $0$, then $Y_n\xrightarrow{d}Y$.
*Mgf of sums of random variables*: given independent $X_{1},\dots,X_{n}$, their sum $Y$ has mgf $M_Y(t)=\prod_{k=1}^{n} M_{X_k}(t).$ If we can recognize $M_{Y}$ as the mgf of some distribution $D$, uniqueness of the mgf then gives $Y \sim D$.
> [!examples] Example: sum of two Gaussian RVs
> Independent variables $X_{1}\sim N(\mu_{1},\sigma_{1}^{2})$ and $X_{2} \sim N(\mu_{2},\sigma_{2}^{2})$ have the sum $X_{1}+X_{2}=Y \sim N(\mu_{1}+\mu_{2},\sigma_{1}^{2}+\sigma_{2}^{2}).$
>
> > [!proof]-
> > Note that the standard normal has mgf $e^{t^{2}/ 2}$.
> > The mgf of $X_{i}$ is then $M_{i}=\exp(\mu_{i} t+(\sigma_{i} t)^{2} / 2)$.
> > Their sum $Y$ has mgf $\begin{align*}
> > M_{Y}&= M_{1}(t)M_{2}(t)\\
> > &= \exp\left( (\mu_{1}+\mu_{2})t+\frac{(\sigma_{1}^{2}+\sigma_{2}^{2})t^{2}}{2} \right),
> > \end{align*}$ which is the mgf of $N(\mu_{1}+\mu_{2},\sigma_{1}^{2}+\sigma_{2}^{2})$. By the uniqueness property, $Y \sim N(\mu_{1}+\mu_{2},\sigma_{1}^{2}+\sigma_{2}^{2})$.
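A quick Monte Carlo sanity check of the example above (the parameters $\mu_1=1,\sigma_1=2,\mu_2=-3,\sigma_2=1.5$ are my own illustrative choices): the simulated sum should have mean $\mu_1+\mu_2$, variance $\sigma_1^2+\sigma_2^2$, and quantiles matching $N(\mu_1+\mu_2,\sigma_1^2+\sigma_2^2)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu1, s1, mu2, s2 = 1.0, 2.0, -3.0, 1.5

# Simulate the sum of two independent normals.
y = rng.normal(mu1, s1, 100_000) + rng.normal(mu2, s2, 100_000)

print(y.mean(), mu1 + mu2)       # both ~ -2.0
print(y.var(), s1**2 + s2**2)    # both ~ 6.25

# Compare a quantile against the claimed N(mu1+mu2, s1^2+s2^2) distribution.
print(np.quantile(y, 0.9), stats.norm.ppf(0.9, loc=mu1 + mu2, scale=np.hypot(s1, s2)))
```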
> [!proof] Proof of the weak law of large numbers with mgf
> For an iid sequence of RVs $(X_{i})_{i=1}^{\infty}$ with mean $\mu$ and mgf $M_X$ finite near $0$, the $n$-term average has mgf $M_{\sum X/n}(t)=\big[M_X(t/n)\big]^n=\big(1+\mu t / n+O((t / n)^{2})\big)^n\to e^{t\mu}.$ This is the mgf of the constant $\mu$, so by continuity $\sum X/n\xrightarrow{d}\mu$; convergence in distribution to a constant is equivalent to convergence in probability, which gives the weak law.
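A numerical illustration of the limit above (my own sketch, taking $X\sim\mathrm{Exp}(1)$ so that $\mu=1$ and $M_X(t)=1/(1-t)$): evaluating $\big[M_X(t/n)\big]^n$ for growing $n$ approaches $e^{t\mu}$.

```python
import numpy as np

# X ~ Exp(1): mu = 1 and M_X(t) = 1/(1 - t) for t < 1.
M_X = lambda t: 1 / (1 - t)

t = 0.5
for n in (10, 100, 10_000, 1_000_000):
    print(n, M_X(t / n) ** n)   # approaches exp(mu * t)
print(np.exp(t))                # 1.6487...
```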
> [!proof] Proof of the CLT with mgf
> Define $Y_k=X_k-\mu$, with mgf $M_Y(t)=1+\frac{\sigma^2 t^2}{2} +O(t^3)$ near $0$.
> Then the variable $(\sum X-n\mu)/(\sigma\sqrt{n})$ has mgf $M_n=\Big(M_Y\big(\frac{t}{\sigma\sqrt n}\big)\Big)^n=\big(1+\frac{1}{2n}t^2+O(t^3n^{-3/2})\big)^n\to\exp(t^2/2).$ This is the mgf of $N(0,1)$. Again by continuity, $(\sum X-n\mu)/(\sigma\sqrt{n})\xrightarrow{d} N(0,1).$
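The same kind of numerical check for the CLT argument (again my own sketch, with $X\sim\mathrm{Exp}(1)$, so $\mu=\sigma=1$ and the centred variable $Y=X-1$ has mgf $M_Y(t)=e^{-t}/(1-t)$): $\big(M_Y(t/\sqrt n)\big)^n$ approaches $e^{t^2/2}$.

```python
import numpy as np

# Centred exponential Y = X - 1 with X ~ Exp(1): M_Y(t) = exp(-t)/(1 - t), sigma = 1.
M_Y = lambda t: np.exp(-t) / (1 - t)

t = 0.7
for n in (10, 100, 10_000, 1_000_000):
    print(n, M_Y(t / np.sqrt(n)) ** n)   # approaches exp(t^2/2)
print(np.exp(t**2 / 2))                  # 1.2776...
```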
*Sum of a random number of RVs*: given iid random variables $X_{1},X_{2},\dots$ and an independent non-negative integer-valued random variable $N$ with pgf $G_{N}$, the mgf of $Y=X_{1}+\dots+X_{N}$ is $M_{Y}(t)=G_{N}\Big(M_{X_{1}}(t)\Big).$ ^247049
- This can be proved in the same way as the analogous result for pgfs from Prelims; for reference, it is Theorem 4.8 on page 41 of the 2021 notes for Prelims probability.
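A Monte Carlo sanity check of this identity (a sketch with my own illustrative choices $N\sim\mathrm{Poisson}(3)$ and $X_i\sim\mathrm{Exp}(1)$, so $G_N(s)=e^{\lambda(s-1)}$ and $M_X(t)=1/(1-t)$): the empirical mgf of $Y$ should be close to $G_N(M_X(t))$.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t = 3.0, 0.3

# Y = X_1 + ... + X_N with N ~ Poisson(lam) independent of the X_i ~ Exp(1).
N = rng.poisson(lam, 100_000)
Y = np.array([rng.exponential(1.0, n).sum() for n in N])

# Empirical mgf of Y vs. G_N(M_X(t)) = exp(lam * (1/(1 - t) - 1)).
print(np.exp(t * Y).mean())
print(np.exp(lam * (1 / (1 - t) - 1)))
```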
## Characteristic Function
> [!definition|*] Characteristic Function
> The **characteristic function** of a random variable $X$ is $\phi_X(t)=\mathbb E[e^{iXt}].$
The characteristic function can be obtained by replacing $t$ with $it$ in mgfs (where they converge).
- For example, the standard normal has mgf $\exp\left( \frac{t^2}{2} \right)$, hence a characteristic function of $\exp\left( -\frac{t^2}{2} \right)$.
Characteristic functions always exist: since $|e^{iXt}|=1$, the expectation is finite for every $t$, so no exponentially decaying tail is required (unlike the mgf).
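For instance (my own sketch): the Cauchy distribution has no finite mgf at any $t\neq 0$, yet its characteristic function $e^{-|t|}$ exists and is easy to check empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(500_000)

# The empirical average of e^{itX} should match the Cauchy characteristic function exp(-|t|),
# even though E[e^{tX}] is infinite for every t != 0.
for t in (0.5, 1.0, 2.0):
    print(t, np.exp(1j * t * x).mean().real, np.exp(-abs(t)))
```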