### Topics
![[NotesWithDomain.base]]
### Snippets
![[NotesWithDomain.base]]
## Measure Spaces
> [!definition|*] Algebras
>
> On a set $\Omega$, $\mathcal{F} \subseteq \mathcal{P}(\Omega)$ is an **algebra** if:
> - $\Omega \in\mathcal{F}$,
> - $\mathcal{F}$ is stable under complements and finite unions.
>
> It is a **σ-algebra** if it is also stable under countable unions.
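For a concrete gap between the two notions: on $\Omega = (0,1]$, finite unions of half-open intervals $(a,b] \subseteq (0,1]$ form an algebra but not a $\sigma$-algebra, since e.g. the countable union $\bigcup_{n}\left( 0, 1-\frac{1}{n} \right]=(0,1)$ is not a finite union of half-open intervals.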
A measurable space $(\Omega , \mathcal{F})$ (where $\mathcal{F}$ is a σ-algebra on $\Omega$) becomes a **measure space** when equipped with a [[Set Functions|σ-additive set function]] $\mu: \mathcal{F} \to [0,\infty]$; in this context $\mu$ is a **[[Measures|measure]]**.
### Measure Extension from Algebras and $\pi$-Systems
$\sigma$-algebras are usually hard to characterise directly, so [[π-Systems|π-systems]] and algebras provide workable interfaces when proving results for $\sigma$-algebras.
To do so, we need to justify that:
- Uniqueness: that agreement on a generating $\pi$-system is enough to uniquely determine the extension, if one exists.
- Existence: that such an extension actually exists.
> [!theorem|*] Uniqueness of extension from $\pi$-systems
> If *finite* measures $\mu_{0},\mu_{1}$ on $(\Omega,\mathcal{F})$ have the same total mass $\mu_{0}(\Omega)=\mu_{1}(\Omega) < \infty$ and agree on a generating $\pi$-system $\mathcal{D} \subseteq \mathcal{F}$ (so $\sigma(\mathcal{D})=\mathcal{F}$), then $\mu_{0},\mu_{1}$ must agree on $\mathcal{F}$.
> - Therefore, if a $\sigma$-additive set function $\mu_{0}$ on an algebra $\mathcal{A}$ has a finite extension to $\sigma(\mathcal{A})$, this extension must be unique (an algebra is in particular a $\pi$-system).
>
> > [!warning] Counterexample for infinite measures
> > The Lebesgue measure and the counting measure agree (both are infinite) on the $\pi$-system $\{ (-\infty,a) ~|~ a \in \mathbb{R} \}$, but they disagree on the Borel sets of $\mathbb{R}$ (e.g. on $\{ 0 \}$).
>
> > [!proof]-
> > Consider the collection $\mathcal{E}:=\{ A \in \mathcal{F} ~|~\mu_{0}(A)=\mu_{1}(A) \}$ on which the measures agree. Clearly $\mathcal{D} \subseteq \mathcal{E}$, so if we can show $\mathcal{E}$ is a $\lambda$-system, the $\pi$-$\lambda$ lemma gives $\mathcal{E} \supseteq \sigma(\mathcal{D})=\mathcal{F}$, hence $\mathcal{E}=\mathcal{F}$.
> >
> > Now check the definition of a $\lambda$-system:
> > - By assumption $\Omega \in \mathcal{E}$.
> > - Set differences $B-A$ (for $A,B \in \mathcal{E}$ with $A \subseteq B$) are also in $\mathcal{E}$: by additivity and finiteness, $\mu_{i}(B-A)=\mu_{i}(B)-\mu_{i}(A)$, and the right-hand sides agree for $i=0,1$.
> > - Countable rising unions $\cup_{i}A_{i}$ where $A_{i} \subseteq A_{i+1}$: set $A_{0}:=\emptyset$ and consider the disjoint differences $B_{i}:= A_{i+1}-A_{i} \in \mathcal{E}$ (by the previous point). $\mu_{0},\mu_{1}$ agree on those, so countable additivity gives $\mu_{0}(\cup_{i}B_{i})=\sum_{i}\mu_{0}(B_{i})=\sum_{i}\mu_{1}(B_{i})=\mu_{1}(\cup_{i}B_{i}),$ and $\cup_{i}B_{i}=\cup_{i}A_{i}$.
Counterexample without the finiteness assumption: the counting measure $\mu_{1}$ and $\mu_{2}=\mathrm{Leb}$ agree on the $\pi$-system $\{ (-\infty, a) ~|~a \in \mathbb{R} \}$ (both give each such set infinite mass), but they don't agree on the Borel sets $\mathcal{B}(\mathbb{R})$ generated by that system.
^33cbad
> [!theorem|*] Carathéodory's Theorem: existence of extensions from algebras
>
> On a set $\Omega$, a $\sigma$-additive set function $\mu_{0}$ on an *algebra* $\mathcal{F}_{0}$ can be extended to a measure $\mu$ on $\sigma(\mathcal{F}_{0})$.
> - As a corollary, if $\mu_{0}(\Omega) < \infty$, then the extension $\mu$ is unique: an algebra is in particular a $\pi$-system, so the uniqueness theorem above applies.
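The canonical application is Lebesgue measure: on $\Omega=(0,1]$, finite disjoint unions of half-open intervals $(a,b]$ form an algebra $\mathcal{F}_{0}$ with $\sigma(\mathcal{F}_{0})=\mathcal{B}((0,1])$, and $\mu_{0}\left( \bigsqcup_{i=1}^{n}(a_{i},b_{i}] \right):=\sum_{i=1}^{n}(b_{i}-a_{i})$ is $\sigma$-additive on $\mathcal{F}_{0}$ (checking $\sigma$-additivity is the nontrivial step), so Carathéodory extends it to $\mathrm{Leb}$ on $\mathcal{B}((0,1])$, uniquely since $\mu_{0}(\Omega)=1<\infty$.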
## Measurability
A function $f:(\Omega, \mathcal{F}) \to (E,\mathcal{E})$ is $\mathcal{F}$-**measurable** if $\forall B \in \mathcal{E},~ f^{-1}(B) \in \mathcal{F}.$ That is, every set in $\mathcal{E}$ has its preimage in $\mathcal{F}$.
The $\sigma$**-algebra generated by a family of functions** $\{ f_{i}:\Omega \to E~|~ i \in I \}$ is denoted $\sigma(f_{i} ~|~ i \in I)$: the smallest $\sigma$-algebra on $\Omega$ making all of the $f_{i}$ measurable. In particular, $\sigma(f_{i} ~|~ i \in I)=\sigma(\{ f_{i}^{-1}(B)~|~ i \in I,B \in \mathcal{E} \}),$ the $\sigma$-algebra generated by preimages from $\mathcal{E}$.
### Criteria for Measurability
A function $f: (\Omega,\mathcal{F}) \to (E,\mathcal{E})$ is $\mathcal{F}$-measurable if some generating collection $\mathcal{C} \subseteq \mathcal{E}$ (i.e. with $\sigma(\mathcal{C})=\mathcal{E}$) has all its preimages in $\mathcal{F}$: $\forall B \in \mathcal{C},~ f^{-1}(B) \in \mathcal{F}$
> [!proof]-
> The set $\mathcal{D}:=\{ B \in \mathcal{E}~|~ f^{-1}(B) \in \mathcal{F} \}$ is a $\sigma$-algebra, as preimages preserve set operations. Then since $\mathcal{C} \subseteq \mathcal{D}$, we have $\mathcal{E} = \sigma(\mathcal{C}) \subseteq \mathcal{D}$; the reverse inclusion $\mathcal{D} \subseteq \mathcal{E}$ holds by definition of $\mathcal{D}$.
- As a corollary, a function $f:(\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is measurable if $\forall x \in \mathbb{R},~ f^{-1}((-\infty, x)) \in \mathcal{F}$, since the rays $(-\infty,x)$ generate $\mathcal{B}(\mathbb{R})$.
If the functions of a sequence $(f_{n})$ are all measurable, then the following are measurable as well (as functions into $\bar{\mathbb{R}}$): $\sup_{n \in \mathbb{N}}f_{n},~\inf_{n \in \mathbb{N}} f_{n},~ \limsup_{n \to \infty} f_{n},~ \liminf_{n \to \infty} f_{n}$
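This follows from the criterion above: for each $x \in \mathbb{R}$, $\left\{ \sup_{n}f_{n} \le x \right\}=\bigcap_{n}\{ f_{n}\le x \} \in \mathcal{F} \quad\text{and}\quad \left\{ \inf_{n}f_{n} \ge x \right\}=\bigcap_{n}\{ f_{n}\ge x \} \in \mathcal{F},$ and then $\limsup_{n}f_{n}=\inf_{n}\sup_{m \ge n}f_{m}$ and $\liminf_{n}f_{n}=\sup_{n}\inf_{m \ge n}f_{m}$ combine the two.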
### Random Variables
In a probabilistic context, measurable functions $f:(\Omega, \mathcal{F}) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))$ are called **random variables**.
- For example, coin-tossing can be modeled with $\Omega=\{ H,T \}^\mathbb{N}$, and $\mathcal{F}=\sigma(\{ \{ \omega:\omega_{n}=C \}~|~ C \in \{ H,T \},~ n \in \mathbb{N} \}),$where $\{ \omega:\omega_{n}=C \}$ is the set of all sequences whose $n$th toss is $C$. Taking the first toss, $X: \omega \mapsto \omega_{1}$ (with $H,T$ coded as $1,0$, say), gives a random variable.
If $X$ is a random variable $(\Omega, \mathcal{F}, \mathbb{P}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, its **law** or **image measure** is $\Lambda_{X}: \mathcal{B}(\mathbb{R}) \to [0,1],~ B \mapsto \mathbb{P}[X^{-1}(B)]= \mathbb{P}[ X \in B],$ which is a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
The law $\Lambda_{X}$ defines the **distribution function** $F_{X}$ of $X$: $F_{X}: \mathbb{R} \to [0,1], ~ c \mapsto \Lambda_{X}((-\infty, c])=\mathbb{P}[X \le c].$
- Distribution functions have the usual expected properties: non-decreasing, tending to $0$ at $-\infty$ and to $1$ at $+\infty$, and right-continuous (which follows from continuity of measures, in this case $\mathbb{P}$, from above).
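- For instance, a Bernoulli($p$) variable has $F_{X}=0$ on $(-\infty,0)$, $F_{X}=1-p$ on $[0,1)$, and $F_{X}=1$ on $[1,\infty)$: right-continuous at the jumps $c=0,1$, though not left-continuous there.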
Conversely, given a function $F$ with the three properties of a distribution function, it's possible to define variables $X_{\pm}:((0,1), \mathcal{B}((0,1)),\mathrm{Leb}) \to \mathbb{R}$ that have this distribution: $\begin{align*}
X_{+}(w):&= \sup \{ x \in \mathbb{R} ~|~ F(x) \le w \}\\
X_{-}(w):&= \sup \{ x \in \mathbb{R} ~|~ F(x) \lneq w \}
\end{align*}$
- Note that $X_{+}$ can also be written as $X_{+}(w)=\inf \{ x ~|~ F(x) > w \}=:F^{-1}(w)$, the **quantile function** of $F$.
> [!info]- Deriving the distribution of $X_{-}$
> $X_{-}$ has distribution $F$ since $\begin{align*}
w &\in X_{-}^{-1}(-\infty, a]\\
&\iff X_{-}(w) \le a\\&\iff w\le F(a)
\end{align*}.$The last equivalence follows from:
> - $w \le F(a) \Rightarrow X_{-}(w) \le a$ since $F$ is non-decreasing,
> - For the other direction, note that $z > X_{-}(w)$ implies $F(z) \ge w$ by definition of $X_{-}$, and letting $z \downarrow X_{-}(w)$ gives $F(X_{-}(w)) \ge w$ by right continuity. Now $F(a) \ge F(X_{-}(w)) \ge w$ gives the other direction.
>
> Hence $X_{-}^{-1}(-\infty,a]=(0,F(a)]$ (as a subset of $(0,1)$), and $\mathbb{P}[X_{-} \le a]=\mathrm{Leb}\,(0,F(a)]=F(a)$.
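This quantile construction is exactly inverse transform sampling. Below is a minimal sketch, assuming for illustration the exponential CDF $F(x)=1-e^{-\lambda x}$ (continuous and strictly increasing on $[0,\infty)$, so $X_{+}=X_{-}=F^{-1}$); the name `quantile_exp` is just for this example.

```python
import math
import random

def quantile_exp(w: float, rate: float = 1.0) -> float:
    """F^{-1}(w) = inf{x : F(x) > w} for F(x) = 1 - exp(-rate * x)."""
    return -math.log(1.0 - w) / rate

# Drawing w ~ Uniform(0,1) is exactly using Leb on (0,1) as the underlying
# probability space; X = F^{-1}(w) then has distribution F.
samples = [quantile_exp(random.random()) for _ in range(100_000)]
print(sum(samples) / len(samples))  # sample mean ≈ 1/rate = 1.0
```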
> [!math|{"type":"lemma","number":"","setAsNoteMathLink":false,"title":"$\\sigma$-algebra of Random Variables","label":"sigma-algebra-of-random-variables"}] Lemma ($\sigma$-algebra of Random Variables).
> If $X_{1},\dots, X_{n}$ are random variables $(\Omega, \mathcal{F},\mathbb{P}) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))$, then $\sigma(X_{1},\dots,X_{n})$ is generated by the $\pi$-system $\begin{align*}
\mathcal{A}&:= \{ \{ X_{1}\le x_{1},\dots,X_{n} \le x_{n} \}\,|\, x_{1},\dots,x_{n} \in \bar{\mathbb{R}} \}\\
&= \left\{ \bigcap_{i=1}^{n}X_{i}^{-1}((-\infty, x_{i}]) \,|\, x_{1},\dots,x_{n} \in \bar{\mathbb{R}} \right\}
\end{align*}$
## Independence
> [!definition|*] Independence of $\sigma$-algebras
> $\sigma$-algebras $\mathcal{G}_{1},\mathcal{G}_{2},\dots \subseteq \mathcal{F}$ are **independent** if for any finite number of events $E_{i_{1}},\dots,E_{i_{n}}$ drawn from distinct $\mathcal{G}_{i_{1}},\dots,\mathcal{G}_{i_{n}}$, we have $\mathbb{P}[E_{i_{1}} \cap \dots \cap E_{i_{n}}]= \prod_{j=1}^{n}\mathbb{P}[E_{i_{j}}].$
- Hence an infinite collection $(\mathcal{G}_{i})$ is independent if and only if all of its finite sub-collections are independent.
Random variables $X_{1},\dots$ are independent if $\sigma(X_{1}),\dots$ are independent. Events $E_{1},\dots$ are independent if $\mathcal{E_{1}},\dots$ are independent, where $\mathcal{E}_{i}:=\{ \emptyset, E_{i}, E_{i}^{c}, \Omega \}$.
For simplicity, for $\mathcal{S}_{1},\dots \subseteq \mathcal{F}$, write $\mathrm{Indep}\{ \mathcal{S}_{1},\dots \}$ if the collections satisfy the independence requirement (even if they are not $\sigma$-algebras), that is $\forall S_{i_{1}} \in \mathcal{S}_{i_{1}},\dots, S_{i_{n}} \in \mathcal{S_{i_{n}}},\,\, \mathbb{P}\left[ \bigcap_{j=1}^{n}S_{i_{j}} \right]=\prod_{j=1}^{n}\mathbb{P}[S_{i_{j}}].$ *This is not standard notation.*
### Deducing Independence
> [!theorem|*] Independence from Generating $\pi$-systems
> Independence of $\sigma$-algebras can be extended from independence between generating $\pi$-systems:
> - If $\mathcal{G},\mathcal{H} \subseteq \mathcal{F}$ are $\sigma$-algebras, and $\mathcal{I},\mathcal{J}$ are $\pi$-systems generating them, then $\mathrm{Indep}\{ \mathcal{G}, \mathcal{H} \} \iff \mathrm{Indep}\{ \mathcal{I}, \mathcal{J} \}.$
>
> - Similarly, if $\mathcal{G}_{1},\dots,\mathcal{G_{n}}$ are generated by $\pi$-systems $\mathcal{A}_{1},\dots,\mathcal{A}_{n}$, then $\mathrm{Indep}\{ \mathcal{G}_{1}, \dots,\mathcal{G}_{n} \} \iff \mathrm{Indep}\{ \mathcal{A}_{1},\dots, \mathcal{A}_{n} \}.$
>
> > [!proof]-
> > $(\Rightarrow)$ is trivial; $(\Leftarrow)$ is as follows.
> > Fix $A_{2}\in \mathcal{A}_{2},\dots,A_{n} \in \mathcal{A}_{n}$. Then the two measures on $\mathcal{G}_{1}$ $\begin{align*}
G &\mapsto \mathbb{P}[G \cap A_{2} \cap\dots \cap A_{n}] ~\text{ and }\\
G &\mapsto \mathbb{P}[G] \cdot \mathbb{P}[A_{2}]\cdots \mathbb{P}[A_{n}]
\end{align*}$agree on $\mathcal{A}_{1}$ (by $\mathrm{Indep}\{ \mathcal{A}_{1},\dots,\mathcal{A}_{n} \}$) and have the same total mass $\mathbb{P}[A_{2} \cap \dots \cap A_{n}]=\prod_{j=2}^{n}\mathbb{P}[A_{j}] < \infty$, so by the uniqueness theorem they agree on all of $\mathcal{G}_{1}=\sigma(\mathcal{A}_{1})$, and we have $\mathrm{Indep}\{ \mathcal{G}_{1}, \mathcal{A}_{2},\dots,\mathcal{A}_{n} \}$.
> > - Alternatively, consider the collection $\mathcal{M}_{1} \subseteq \mathcal{G}_{1}$ on which $\mathbb{P}[\ast \cap A_{2} \cap\dots \cap A_{n}] = \mathbb{P}[\ast] \cdot \mathbb{P}[A_{2}]\cdots \mathbb{P}[A_{n}]$ holds, show it is a $\lambda$-system containing $\mathcal{A}_{1}$, and conclude $\mathcal{M}_{1}=\mathcal{G}_{1}$ by the $\pi$-$\lambda$ lemma.
> >
> > Now do the same for $\mathcal{A}_{2}$ by fixing $G_{1} \in \mathcal{G}_{1},A_{3} \in \mathcal{A}_{3},\dots, A_{n} \in \mathcal{A}_{n}$. It gives $\mathrm{Indep}\{ \mathcal{G}_{1},\mathcal{G}_{2},\mathcal{A}_{3},\dots,\mathcal{A}_{n} \}$.
> >
> > Continuing the induction will give $\mathrm{Indep}\{ \mathcal{G}_{1}, \dots,\mathcal{G}_{n} \}$ as desired.
An important corollary is that *random variables $X_{1},\dots,X_{n}:(\Omega, \mathcal{F}, \mathbb{P}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ are independent if their joint distribution function factors into the individual ones*, i.e. for all $x_{1},\dots, x_{n} \in \mathbb{R}$, $\mathbb{P}[X_{1}\le x_{1};\dots; X_{n} \le x_{n}]=\prod_{i=1}^{n}\mathbb{P}[X_{i} \le x_{i}].$
> [!proof]-
> Essentially, it is because the events $\{ X_{i} \le x \}$ form the $\pi$-systems $\mathcal{A}_{i}:= \{ X_{i}^{-1}((-\infty, x]) ~|~ x \in \mathbb{R} \}$, and $\{ (-\infty, x] ~|~ x \in \mathbb{R}\}$ is a $\pi$-system generating $\mathcal{B}(\mathbb{R})$.
>
> Now $\sigma(X_{i})=\sigma(\mathcal{A}_{i})$, so assuming independence between the latter (as in the theorem) is sufficient for independence between the former.
Furthermore, functions $Y_{i}=f_{i}(X_{i})$ of independent RVs $X_{i}$ are also independent (assuming the $f_{i}$ are all measurable): since $Y_{i}^{-1}(B)=X_{i}^{-1}(f_{i}^{-1}(B))$, we have $\sigma(Y_{i})\subseteq \sigma(X_{i}),$ so independence between the latter implies that of the former.
### Tail Events
Suppose we want to study the long-term behavior of a sequence of variables $(X_{n})_{n \ge 1}$. A $\sigma$-algebra containing information about such behavior should make the tail measurable, while the "head" (i.e. the first finitely many variables) should not matter:
> [!definition|*] Tail Algebra
>
> Define $\mathcal{T}_{n}:=\sigma(X_{n+1},X_{n+2},\dots).$
> The **tail σ-algebra** of a sequence $(X_{n})$ is then $\mathcal{T}=\cap_{n}\mathcal{T}_{n}.$ Events in this algebra are called **tail events**.
> [!examples] Examples (and counterexamples) of tail events
> As a rule of thumb, tail events should not care about what happens in the first $k$ terms, where $k <\infty$ can be arbitrarily large. Suppose $(X_{n})$ is a sequence of random variables; then
> - The event $\left[(X_{n}) \text{ converges} \right]$ is a tail event.
> - So is the event $\left[ \sum_{n} X_{n} \text{ converges} \right]$.
>
> But if the event "remembers" what happens in the head, it is not a tail event: if $S_{n}:=\sum_{i=1}^{n}X_{i}$,
> - $[\limsup_{n}S_{n}>0]$ is not a tail event, since the value $X_{1}$ will always be "remembered" in $S_{n}$.
> - However, $\left[ \frac{S_{n}}{n} \to 0\right]$ is a tail event, as the impact of earlier terms becomes negligible when $n \to \infty$.
> [!theorem|*] Kolmogorov's 0-1 Law
>
> If $(X_{n})_{n \ge 1}$ are independent, and they have tail $\sigma$-algebra $\mathcal{T}$, then:
> - Any event $E \in \mathcal{T}$ has $\mathbb{P}[E]\in \{ 0,1 \}$.
> - Any $\mathcal{T}$-measurable function/RV $f$ is $\mathrm{a.s.}$ constant.
>
> Hence if we can show $E \in \mathcal{T}$ has $\mathbb{P}[E] >0$, then $E$ must happen $\mathrm{a.s.}$
> > [!proof]-
> > The key to the proof is to show that $\mathcal{T}$ is independent of itself: then any $E \in \mathcal{T}$ has $\mathbb{P}[E]=\mathbb{P}[E \cap E]=\mathbb{P}[E]^{2}$, giving the result.
> >
> > Define the "head" $\sigma$-algebra $\mathcal{H}_{k}:=\sigma(X_{1},\dots,X_{k})$.
> > Then $\forall k \ge 0,~\mathrm{Indep}\{ \mathcal{T}_{k}, \mathcal{H}_{k} \}$ since they are generated by disjoint (hence independent) sets of $(X_{n})$.
> > Now $\mathcal{T} \subseteq \mathcal{T}_{k}$, so $\begin{align*}
\forall k \ge 0,~&\mathrm{Indep}\{ \mathcal{T}, \mathcal{H}_{k} \}\\[0.2em]
\Longrightarrow ~ ~&\mathrm{Indep}\left\{ \mathcal{T}, \bigcup_{k}\mathcal{H_{k}} \right\}
\end{align*}$But $\cup_{k} \mathcal{H}_{k}$ is a $\pi$-system, and $\mathcal{T} \subset \sigma(X_{1},X_{2},\dots) = \sigma(\cup_{k}\mathcal{H}_{k})$, so $\mathcal{T}$ is independent of $\sigma(\cup_{k}\mathcal{H}_{k})$, and hence itself.
>> ---
> > For any function $f$ that is $\mathcal{T}$-measurable, consider its distribution $\mathbb{P}[f \le z]$.
> > Since $\{ f \le z \} \in \mathcal{T}$ by assumption, it happens with probability $0$ or $1$. Therefore consider $f^{\ast}:= \inf \{ z ~|~ \mathbb{P}[f \le z]=1 \},$ and it must be the $\mathrm{a.s.}$ value of $f$ because $\begin{cases}
y < f^{\ast} &\Rightarrow &\mathbb{P}[f \le y] = 0; \\[0.4em]
y > f^{\ast} &\Rightarrow &\mathbb{P}[f \ge y] \le \mathbb{P}[f \gneq f^{\ast}] = 0.
\end{cases}$
- Therefore, $\limsup_{n \to \infty}S_{n} / n$ from the examples is $\mathcal{T}$-measurable, so it must be $\mathrm{a.s.}$ constant (and so is the limit, when it exists).
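A quick empirical illustration (a sketch only; a finite simulation can suggest, not prove, $\mathrm{a.s.}$ behavior): for iid fair $\pm 1$ flips, independent runs of $S_{n}/n$ at large $n$ all cluster near the same constant, $0$.

```python
import random

def s_over_n(n=10**6, seed=None):
    """One sample of S_n / n for n iid uniform ±1 flips."""
    rng = random.Random(seed)
    s = sum(1 if rng.random() < 0.5 else -1 for _ in range(n))
    return s / n

# Independent sample paths: every run lands near 0, consistent with
# the tail-measurable limit being a.s. constant.
print([round(s_over_n(seed=k), 4) for k in range(5)])
```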
### Borel-Cantelli Lemmas
^64b057
> [!tldr]
> Borel-Cantelli lemmas decide whether events will keep occurring, or will die out eventually.
> [!definition|*] Infinitely Often, Eventually
> Let $(A_{n})_{n \ge 1}$ be a sequence of events.
> $\begin{align*} [A_{n}~\mathrm{i.o.}] &:= \overset{(\text{for all }n,\text{ there is }m)}{\bigcap_{n=1}^{\infty}\bigcup_{m \ge n} A_{m}} =: \limsup_{n} A_{n} \\[0.2em] [A_{n} ~\mathrm{eventually}] &:= \underset{(\text{for some }n,\text{ all }m \ge n)}{\bigcup_{n=1}^{\infty}\bigcap_{m \ge n} A_{m}} =: \liminf_{n} A_{n}.
\end{align*}$
> Note that $[A_{n}\,\mathrm{i.o.}]=[A_{n}^{c}\,\text{eventually}]^{c}$.
> [!theorem|*] First Borel-Cantelli Lemma
> If $(A_{n})_{n \ge 1}$ have $\sum_{n}\mathbb{P}[A_{n}] < \infty,$ then they almost never occur infinitely often. That is, $\mathbb{P}[A_{n}~\mathrm{i.o.}]=0.$ This makes no assumption about the independence of the $A_{n}$.
> > [!proof]-
> > Define $f_{n}:= \mathbf{1}_{A_{n}}$, then by the MCT, $\mathbb{E}\left[ \sum_{n}f_{n} \right]=\sum_{n}\mathbb{E}[f_{n}]=\sum_{n}\mathbb{P}[A_{n}] < \infty.$
> > This forces the function/variable $\sum_{n}f_{n}$ to be $\mathrm{a.s.}$ finite; since $\left\{ \sum_{n}f_{n}=\infty \right\}=[A_{n}~\mathrm{i.o.}]$, that event is null.
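A throwaway illustration of the summable case (a sketch; the probabilities $1/n^{2}$ are an assumed example):

```python
import random

def count_occurrences(n_max=10**6, seed=0):
    """Count how many of the independent events A_n, with P[A_n] = 1/n^2
    (a summable series), occur along one sample path."""
    rng = random.Random(seed)
    return sum(1 for n in range(1, n_max + 1) if rng.random() < 1.0 / n**2)

# Expected total is sum 1/n^2 = pi^2/6 ≈ 1.64, so a run sees only a few
# occurrences before the events die out, as the lemma predicts.
print(count_occurrences())
```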
> [!theorem|*] Second Borel-Cantelli Lemma
> If $(A_{n})_{n \ge 1}$ are independent and have $\sum_{n}\mathbb{P}[A_{n}] =\infty$, then they will occur infinitely often almost surely. That is, $\mathbb{P}[A_{n}\,\mathrm{i.o.}]=1.$
> > [!warning]- Independence is necessary
> > For example, if the $A_{n}$ are all the same event $[\text{it rains today}]$ (assuming it occurs with probability $p \in (0,1)$), then $\sum_{n}\mathbb{P}[A_{n}]=\sum_{n}p = \infty$, but obviously $\mathbb{P}[A_{n}\,\mathrm{i.o.}]=p < 1$.
>
>
> > [!proof]-
> > Denote $p_{n}:= \mathbb{P}[A_{n}],$ then $\begin{align*}
\mathbb{P}[A_{n}^{c}\, \text{eventually}]&= \mathbb{P}\left[ \bigcup_{n} \bigcap_{m \ge n}A_{m}^{c} \right]\\
&\le \sum_{n}\mathbb{P}\left[ \bigcap_{m \ge n}A_{m}^{c} \right],
\end{align*}$but each term in the sum is $\begin{align*}
\mathbb{P}\left[ \bigcap_{m \ge n}A_{m}^{c} \right]&= \prod_{m \ge n}(1-p_{m}) &\text{[independence]}\\
&\le \exp\left( -\sum_{m \ge n}p_{m} \right) &[1-x \le e^{-x}]\\
&= 0 & \left[ \text{since } \sum_{m \ge n}p_{m} = \infty \right]
\end{align*}$
\end{align*}$So $[A_{n}^{c}\,\text{eventually}]=[A_{n}\,\mathrm{i.o.}]^{c}$ is almost impossible, and so $[A_{n}\,\mathrm{i.o.}]$ is $\mathrm{a.s.}$
>
> > [!examples]- Monke!
> > If a monkey types one of the $26$ letters each second, independently and uniformly at random, then the events $A_{k}:= \{ \text{``BRUH" is typed between times } 4k \text{ and } 4k+3 \}$ (inclusive, indexing from $0$) happen independently, each with probability $26^{-4}$, so $\sum_{k}\mathbb{P}[A_{k}] = \infty \Longrightarrow [A_{k} ~\mathrm{i.o.}] \text{ holds } \mathrm{a.s.}$
^db0190
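A simulation sketch of the monkey, scanning the disjoint 4-letter blocks that define the events $A_{k}$ above (`first_bruh_block` is an illustrative name):

```python
import random
import string

def first_bruh_block(max_blocks=2_000_000, seed=0):
    """Scan disjoint 4-letter blocks until one spells 'BRUH'.

    Each block matches independently with probability 26**-4, so the
    waiting time is geometric with mean 26**4 ≈ 457,000 blocks.
    """
    rng = random.Random(seed)
    for k in range(max_blocks):
        if "".join(rng.choices(string.ascii_uppercase, k=4)) == "BRUH":
            return k
    return None  # unlucky run; raise max_blocks

print(first_bruh_block())
```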