## Introductory example

Consider RVs $X,Z \in m \mathcal{F}$ over the probability triple $(\Omega, \mathcal{F}, \mathbb{P})$, taking discrete values $\{ x_{1},\dots,x_{m} \}$ and $\{ z_{1},\dots,z_{n} \}$ respectively, with each value having a $>0$ probability of being taken. The classic definitions of conditional probability and expectation give: $\begin{align*} \mathbb{P}[X=x_{i} ~|~ Z=z_{j}]&:= \frac{\mathbb{P}[X=x_{i}, Z=z_{j}]}{\mathbb{P}[Z=z_{j}]}\\ \mathbb{E}[X ~|~ Z=z_{j}] &:= \sum_{i=1}^{m} x_{i}\mathbb{P}[X=x_{i} ~|~ Z=z_{j}] \end{align*}.$

Alternatively, we can view **conditioning** on $Z$ as partitioning the outcomes $\Omega$ into "$Z$-cells," i.e. subsets over which $Z$ is constant.

![[ConditionalsAsPartition.png#invert|w70|center]]

Then we can define a new RV $Y$ by $\begin{align*} &Y:\omega \mapsto \mathbb{E}[X ~|~ Z=Z(\omega)],\\[0.4em] &Y(\{ \omega: Z(\omega) =z\})= \{ \mathbb{E}[X ~|~ Z=z] \}, \end{align*}$i.e. $Y$ is constant in each $Z$-cell, taking the value of $\mathbb{E}[X ~|~ Z=z]$ in that cell. This RV is $\mathcal{G}$-measurable, where $\mathcal{G}$ is the $\sigma$-algebra generated by the $Z$-cells. Moreover, over any $G \in \mathcal{G}$, $Y$ has the same expectation as $X$ -- this is the *conditional expectation as a random variable*. This kind of partition is discussed in [[Partitions and Information Chunks]].

## Formal Definition

> [!definition|*] Conditional Expectation
> More generally, in a probability space $(\Omega,\mathcal{F},\mathbb{P})$ with sub-$\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$, the **conditional expectation** $Y= \mathbb{E}[X ~|~\mathcal{G}]$ of a random variable $X$ is measurable on $\mathcal{G}$ and satisfies the **defining relation** $\forall G \in \mathcal{G}:~\mathbb{E}[X \mathbf{1}_{G}]=\mathbb{E}[Y\mathbf{1}_{G}].$
>
> Conditioning on another variable $Z$ is just $\mathbb{E}[X ~|~ Z]:= \mathbb{E}[X ~|~ \sigma(Z)]$.
> - Note that $\mathcal{G}:= \sigma(Z)\supseteq\sigma (\{ Z^{-1}(z) ~|~z \in \mathbb{R}\})$ is slightly larger than the $\sigma$-algebra of the $Z$-cells discussed above.
>
> The **conditional probability** of an event $A \in \mathcal{F}$ is then $\mathbb{P}[A ~|~ \mathcal{G}]:= \mathbb{E}[\mathbf{1}_{A} ~|~ \mathcal{G}]$.

- That is, the defining relation forces $X$ and $Y$ to agree in average over every event $G \in \mathcal{G}$.
- Conditional expectations always exist, and are unique up to $\mathrm{a.e.}$ equality. This means that we can talk about $\mathrm{a.e.}$ equal **versions** of the conditional expectation.

The definition above agrees with the intuitive constructions:

- The $Z$-cell construction in the introduction is just the special case where $Z$ is discrete, so $\mathcal{G}=\sigma(Z)$ equals the $\sigma$-algebra generated by the $Z$-cells.
- For $Z$ taking continuous values, $\{ Z=z \}$ is (in general) a null set, so the defining relation reduces to $0=0$; the meat of the definition lies in the non-singleton preimages $Z^{-1}(A),~ A \in \mathcal{B}(\mathbb{R})$ with $\mathbb{P}[Z \in A]> 0$.
- For a countable partition $(B_{n})$ of $\Omega$ into events of $>0$ probability, the random variable $\mathbb{E}\left[X ~|~ \mathcal{G}\right](\omega)=\sum_{n\geqslant1}{\frac{\mathbb{E}\left[X\mathbf{1}_{B_{n}}\right]}{\mathbb{P}(B_{n})}}\mathbf{1}_{B_{n}}(\omega)$ is (a version of) the conditional expectation of $X$ given $\mathcal{G} := \sigma(B_{n},~ n \ge 1)$.

## Properties of Conditional Expectations

To confirm that a variable $Y$ is (a version of) the conditional expectation $\mathbb{E}[X ~|~ \mathcal{G}]$, it is sufficient to confirm that:

- $Y$ is $\mathcal{G}$-measurable, $X,Y$ are integrable, and $\mathbb{E}[X]=\mathbb{E}[Y]$;
- The defining relation holds on a generating $\pi$-system $\mathcal{H}$, i.e. one with $\sigma(\mathcal{H})=\mathcal{G}$.
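This criterion can be sanity-checked numerically. Below is a minimal sketch on a hypothetical finite probability space (uniform $\mathbb{P}$ on four outcomes; all values are invented for illustration): a $\mathcal{G}$-measurable candidate $Y$ built to match $X$ in total expectation and on a one-element $\pi$-system $\mathcal{H}=\{\{0,1\}\}$ then satisfies the defining relation on every member of $\sigma(\mathcal{H})$.

```python
import numpy as np

# Hypothetical finite example: Omega = {0,1,2,3} with uniform P.
# The pi-system H = {{0,1}} generates G = {emptyset, {0,1}, {2,3}, Omega}.
p = np.full(4, 1 / 4)
X = np.array([2.0, 0.0, 4.0, 6.0])
A = np.array([True, True, False, False])      # the single generator {0,1}

# Candidate Y: constant on {0,1} and on {2,3} (hence G-measurable), built so
# that E[X] = E[Y] and the defining relation holds on the generator A.
Y = np.where(A, X[A].mean(), X[~A].mean())    # -> [1, 1, 5, 5]

# The sufficiency criterion then promises the defining relation on ALL of G:
members_of_G = [np.zeros(4, bool), A, ~A, np.ones(4, bool)]
ok = all(np.isclose(np.sum(X * G * p), np.sum(Y * G * p)) for G in members_of_G)
print(ok)  # True
```

Here the cell averages play the role of the $\mathbb{E}[X\mathbf{1}_{B_{n}}]/\mathbb{P}(B_{n})$ formula above.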
> [!proof]-
> Imitate the proof of [[Measure Theory#^33cbad|uniqueness of extensions of measures from $\pi$ systems]]: first show that the collection $\mathcal{G}^{\ast}$ of those $G \in \mathcal{G}$ satisfying the defining relation forms a $\lambda$-system; since $\mathcal{G}^{\ast}$ contains the generating $\pi$-system $\mathcal{H}$, Dynkin's lemma gives $\mathcal{G}=\sigma(\mathcal{H}) \subseteq \mathcal{G}^{\ast}$, and the double inclusion (with $\mathcal{G}^{\ast} \subseteq \mathcal{G}$) gives $\mathcal{G}=\mathcal{G}^{\ast}$.
> - *We cannot directly apply that theorem because $X,Y$ can be negative, so $G\mapsto\mathbb{E}[X \mathbf{1}_{G}]$ is not in general a measure*.
>
> Check the definition of a $\lambda$-system for $\mathcal{G}^{\ast}$:
> - $\Omega \in \mathcal{G}^{\ast}$ because $\mathbb{E}[X]=\mathbb{E}[Y]$;
> - If $A,B \in \mathcal{G}^{\ast}$ with $A \subseteq B$, then $B-A \in \mathcal{G}^{\ast}$ by linearity of expectations;
> - To show that a rising union $\cup_{n}A_{n} \overset{?}{\in}\mathcal{G}^{\ast}$, where $(A_{n}) \subseteq \mathcal{G}^{\ast}$ and $\forall n:A_{n}\subseteq A_{n+1}$, we can use the DCT to exchange $\mathbb{E}$ and the limit: $\begin{align*} \mathbb{E}[X\mathbf{1}_{\cup_{n}A_{n}}]&= \mathbb{E}\left[\lim_{N\to \infty}X\mathbf{1}_{A_{N}}\right] \overset{\mathrm{DCT}}{=}\lim_{N \to \infty}\mathbb{E}\left[ X\mathbf{1}_{A_{N}} \right]\\[0.8em] &= \lim_{N \to \infty} \mathbb{E}[Y \mathbf{1}_{A_{N}}] \overset{\mathrm{DCT}}{=} \mathbb{E}\left[ \lim_{N \to \infty} Y\mathbf{1}_{A_{N}} \right]= \mathbb{E}[Y\mathbf{1}_{\cup_{n}A_{n}}]. \end{align*}$ This is justified by domination from $| X |, | Y |$, which have finite expectation since $X,Y$ are integrable.

> [!note] Properties of conditional expectations
> For any $X,Z \in \mathcal{L}^{1}(\Omega, \mathcal{F}, \mathbb{P})$ and $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$, conditional expectations have the expected properties:
>
> - If $Y$ is any version of $\mathbb{E}[X ~|~ \mathcal{G}]$, then $\mathbb{E}[X]=\mathbb{E}[Y]$.
> - Repeated information is irrelevant: if $X$ is $\mathcal{G}$-measurable, then $\mathbb{E}[X ~|~ \mathcal{G}]=X$ (more precisely, $X$ is a version of $\mathbb{E}[X ~|~ \mathcal{G}]$).
> - **Linearity**: $\mathbb{E}[aX+bZ ~|~ \mathcal{G}]=a\mathbb{E}[X ~|~ \mathcal{G}]+b\mathbb{E}[Z ~|~ \mathcal{G}]$.
> - **Role of independence**: if $\mathcal{H}$ is independent of $\sigma(\sigma(X), \mathcal{G})$, then $\mathbb{E}[X ~|~ \sigma(\mathcal{G},\mathcal{H})]=\mathbb{E}[X ~|~ \mathcal{G}].$ In particular, if $X$ and $\mathcal{H}$ are independent, $\mathbb{E}[X ~|~ \mathcal{H}]=\mathbb{E}[X].$
>
> ---
>
> **Taking out what is known**: if $X,Y,XY$ are all integrable, and $Y$ is measurable on $\mathcal{G}$, then $\mathbb{E}[XY ~|~ \mathcal{G}]=Y \mathbb{E}[X ~|~ \mathcal{G}].$ That is, if $Y$ is known under $\mathcal{G}$, multiplying by $Y$ commutes with conditioning on $\mathcal{G}$.
>
> ---
>
> The **tower property**: if $\mathcal{H} \subseteq \mathcal{G}$ are both $\sigma$-algebras, then $\mathbb{E}[X ~|~ \mathcal{G} ~|~ \mathcal{H}]:= \mathbb{E}\Big[\mathbb{E}[X ~|~ \mathcal{G}] ~|~ \mathcal{H}\Big]=\mathbb{E}[X ~|~ \mathcal{H}]~\mathrm{a.s.}$ That is, when conditionings are nested, only the smallest $\sigma$-algebra (the coarsest source of information) matters.
>
> > [!proof]- Proof (Tower Property)
> > It suffices to verify the defining relation: denote $\mathrm{LHS}=: Y$; then for any $H \in \mathcal{H}$, $\begin{align*}
> > \mathbb{E}[Y \mathbf{1}_{H}]&= \mathbb{E}[\mathbb{E}[X ~|~ \mathcal{G}]\mathbf{1}_{H}] &&\text{defining relation of }Y \text{ on }\mathcal{H}\\[0.2em]
> > &= \mathbb{E}[X \mathbf{1}_{H}] && \text{defining relation of } \mathbb{E}[X ~|~ \mathcal{G}], \text{ since }H \in \mathcal{H} \subseteq \mathcal{G},
> > \end{align*}$so the defining relation of $\mathbb{E}[X ~|~ \mathcal{H}]$ is satisfied by $\mathrm{LHS}$.
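The tower property can be illustrated on a hypothetical finite example (values and the `cond_exp` helper are invented for illustration): $\mathcal{G}$ is generated by a four-cell partition, $\mathcal{H}$ by a coarser two-cell partition, and conditioning implements the $Z$-cell averaging from the introduction.

```python
import numpy as np

# Hypothetical finite sketch: uniform P on 8 outcomes; H-cells merge G-cells,
# so sigma(H_cells) = H is a sub-sigma-algebra of G = sigma(G_cells).
p = np.full(8, 1 / 8)
X = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
G_cells = np.array([0, 0, 1, 1, 2, 2, 3, 3])   # generates G
H_cells = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # coarser: generates H

def cond_exp(W, cells):
    """A version of E[W | sigma(cells)]: average W over each cell."""
    out = np.empty_like(W)
    for c in np.unique(cells):
        m = cells == c
        out[m] = np.sum(W[m] * p[m]) / np.sum(p[m])
    return out

lhs = cond_exp(cond_exp(X, G_cells), H_cells)   # E[ E[X | G] | H ]
rhs = cond_exp(X, H_cells)                      # E[X | H]
print(np.allclose(lhs, rhs))  # True
```

Averaging first over the fine cells and then over the coarse ones lands on the same RV as averaging over the coarse cells directly.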
Furthermore, for $X \in \mathcal{L}^{2}$, the conditional expectation $Y=\mathbb{E}[X ~|~ \mathcal{G}]$ is the best least-squares approximation of $X$ among all $\mathcal{G}$-measurable functions: $Y = \underset{Z \in m \mathcal{G}}{\arg\min}~ \mathbb{E}[(Z-X)^{2}].$

- That is, with the information available in $\mathcal{G}$, $Y$ is the best approximation of $X$.
- This interpretation also allows the completeness of $\mathcal{L}^{2}$ spaces to guarantee the existence of such a minimiser when $X \in \mathcal{L}^{2}$.

> [!theorem|*] Conditional Expectations are Contractions in $\mathcal{L}^{1}$
> *Taking conditional expectations is a contraction in $\mathcal{L}^{1}$* in the sense that any integrable RV $X:(\Omega,\mathcal{F},\mathbb{P}) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))$ has $\mathbb{E}[| X |] \ge \mathbb{E}[| Y |],$ where $Y$ is (a version of) the conditional expectation $\mathbb{E}[X~|~\mathcal{G}]$ for any sub-$\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$.
>
> > [!proof]-
> > Consider the set $A:= \{ Y >0 \} \in \mathcal{G}$. On $A$, we have $\mathbb{E}[| Y | \mathbf{1}_{A}]=\mathbb{E}[Y\mathbf{1}_{A}]=\mathbb{E}[X\mathbf{1}_{A}]\le \mathbb{E}[| X |\mathbf{1}_{A}].$
> > Similarly on $A^{c}:= \Omega-A$, we have $\mathbb{E}[| Y |\mathbf{1}_{A^{c}}]=\mathbb{E}[-Y\mathbf{1}_{A^{c}}]=\mathbb{E}[-X\mathbf{1}_{A^{c}}]\le\mathbb{E}[| X |\mathbf{1}_{A^{c}}].$ Summing the two inequalities gives the contraction.

^8d034f

## Convergence Theorems

By applying them to the expectations $\mathbb{E}[X\mathbf{1}_{G}]$ in the defining relation, the main convergence theorems also hold for conditional expectations: if $(X_{n})_{n=1}^{\infty}$ are RVs on $(\Omega, \mathcal{F},\mathbb{P})$,

- Conditional **MCT**: if $X_{n} \ge 0$ and $X_{n} \uparrow X$, then $\mathbb{E}[X_{n} ~|~ \mathcal{G}] \uparrow \mathbb{E}[X ~|~ \mathcal{G}]~\mathrm{a.e.}$
- Conditional **DCT**: if $\forall n,~|X_{n}| \le Z$ for some integrable $Z$, and $X_{n} \to X~\mathrm{a.e.}$, then $\mathbb{E}[X_{n} ~|~ \mathcal{G}] \to \mathbb{E}[X ~|~ \mathcal{G}]~\mathrm{a.e.}$
- As a result, the [[The Standard Machine|standard machine]] also works for conditional expectations.

**Conditional Jensen's Inequality**: if the RV $X$ takes values in an open interval $I \subseteq \mathbb{R}$, then any convex function $f:I \to \mathbb{R}$ with $f(X)$ integrable has $\mathbb{E}[f(X) ~|~\mathcal{G}]\ge f(\mathbb{E}[X ~|~ \mathcal{G}])~\mathrm{a.s.}$
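Conditional Jensen can be seen concretely in the discrete setting: on a hypothetical finite space (values invented for illustration), with $f(x)=x^{2}$ convex, averaging $f(X)$ over each cell dominates $f$ of the cell average, cell by cell.

```python
import numpy as np

# Hypothetical finite check: uniform P on 6 outcomes, G generated by a
# three-cell partition, and the convex function f(x) = x**2.
p = np.full(6, 1 / 6)
X = np.array([1.0, 4.0, 2.0, 8.0, 3.0, 6.0])
cells = np.array([0, 0, 1, 1, 2, 2])

def cond_exp(W):
    """A version of E[W | G]: average W over each cell of the partition."""
    out = np.empty_like(W)
    for c in np.unique(cells):
        m = cells == c
        out[m] = np.sum(W[m] * p[m]) / np.sum(p[m])
    return out

lhs = cond_exp(X ** 2)          # E[f(X) | G]
rhs = cond_exp(X) ** 2          # f(E[X | G])
print(np.all(lhs >= rhs))  # True
```

The gap $\mathrm{lhs}-\mathrm{rhs}$ is exactly the conditional variance of $X$ in each cell, which is why the inequality holds pointwise here.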