> [!warning]- Regularity Conditions
> Most results in this note assume a few regularity conditions on the family of distributions $\{ f(\cdot;\,\theta) \,|\, \theta \in \Theta\}$:
> - $[\mathrm{R}1]$ They have a common support.
> - $[\mathrm{R}2]$ The parameter space $\Theta$ is open.
> - $[\mathrm{R}3]$ The derivative $\frac{ \partial f }{ \partial \theta }$ exists and is dominated by some integrable function.
>
> They will be omitted for the rest of this note.
Suppose again we have a sample $\mathbf{X}=(X_{1},\dots,X_{n})$ from a distribution $f(\cdot\,;\theta)$, and we want to estimate $\gamma:= g(\theta)$ with the point estimator $T(\mathbf{X})$.
Obviously with a finite sample, $T(\mathbf{X})$ cannot be arbitrarily precise, so *there is a lower bound on its MSE*.
- This bound serves as a baseline for comparing different estimators -- those that attain it are the best possible ones.
> [!idea] Limiting the scope to unbiased estimators
> If we wish to compare the [[Point Estimators#^e44b87|MSE]] of two arbitrary estimators $T_{1},T_{2}$, it seems natural to see if one is **uniformly better** than the other, i.e. if $T_{1}$ is uniformly better than $T_{2}$, then $\forall \theta \in \Theta,~\mathrm{MSE}_{\theta}(T_{1}) \le \mathrm{MSE}_{\theta}(T_{2}).$
> However, nothing is uniformly better than the trivial estimator $T_{0}\equiv \theta_{0} \in \Theta$, as $\mathrm{MSE}_{\theta_{0}}(T_{0})=0$, so this comparison is not useful.
>
> The issue here is that [[Point Estimators#^e44b87|biased]] estimators like $T_{0}$ can be pointlessly pathological, making it impossible to find a uniformly better estimator. Hence we restrict the search to unbiased estimators, and look for the **minimum variance unbiased estimator (MVUE)**.
## Fisher and Observed Information
Ideally, the more information a sample contains, the smaller the MSE can be made -- this necessitates measures for the amount of information in a sample.
Given a sample $\mathbf{x}$, the **score function** $S(\theta,\mathbf{x}):\Theta \to \mathbb{R}^{k}$ is $S(\theta,\mathbf{x})=\frac{ \partial }{ \partial \theta } l(\mathbf{x};\theta)$where $l$ is the log-likelihood. Hence the MLE $\hat{\theta}$ satisfies $S(\hat{\theta},\mathbf{x})=0$.
- $\mathbb{E}_{\theta}[S(\theta,\mathbf{X})]=0$, which is verified by exchanging the derivatives with the integral.
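As a quick numerical sanity check, here is a minimal Monte Carlo sketch (the Bernoulli model, sample size, and variable names are illustrative choices, not part of the note) verifying that the score has mean zero:

```python
# Sketch: check E_theta[S(theta, X)] = 0 for an iid Bernoulli(theta) sample.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 50, 200_000

x = rng.binomial(1, theta, size=(reps, n))   # `reps` simulated samples of size n
k = x.sum(axis=1)                            # number of successes in each sample
scores = k / theta - (n - k) / (1 - theta)   # S(theta, x) = d/dtheta log-likelihood
print(scores.mean())                         # should be close to 0
```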
> [!definition|*] Fisher Information
> The **Fisher information** of a parameter $\theta$ is the variance of the score function, taken over all possible samples $\mathbf{X}$:
$I_{\mathbf{X}}(\theta)= \mathrm{Var}_{\theta}(S(\theta, \mathbf{X}))=\mathbb{E}_{\theta}[S^{2}],$where the second equality uses $\mathbb{E}_{\theta}[S]=0$.
- A large $\mathrm{Var}(S)$ means that the value of $S$ is sensitive to $\mathbf{X}$, i.e. the likelihood decays sharply around the MLE $\hat{\theta}^\mathrm{MLE}(\mathbf{X})$, and there is a lot of information in $\mathbf{X}$, enough to narrow down on $\theta$.
> [!definition|*] Observed Information
> Assuming second-order differentiability of the log-likelihood $l$, the **observed information** is $J(\theta,\mathbf{x})=-l''(\mathbf{x};\theta).$At $\theta=\hat{\theta}^\mathrm{MLE}$, it measures how sharply the log-likelihood falls off around the MLE.
>
> If there are multiple sources of information (e.g. two independent samples $X, Y$), then use subscripts to differentiate (e.g. $J_{X},J_{Y}$).
^7df282
- The observed information appears in the second-order approximation of the log-likelihood $l(\theta)$ near $\hat{\theta}$:
$l(\theta) \approx l(\hat{ \theta}) + \cancel{(\theta-\hat{\theta})\left.\frac{\partial l}{\partial \theta} \right|_{\hat{\theta}}}+\frac{1}{2}(\theta-\hat{\theta})^{2} \underbrace{\left.\frac{\partial^{2} l}{\partial \theta^{2}}\right|_{\hat{\theta}}}_{-J(\hat{\theta})}.$
The larger $J(\hat{ \theta})$ is, the faster log-likelihood decays when $\theta$ moves away from $\hat{\theta}$. In other words, *large information = precise MLE*.
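As a numerical illustration, the sketch below (the Bernoulli model and all names are illustrative) compares a finite-difference second derivative of the log-likelihood at the MLE with the closed-form observed information $J(\hat{\theta})=n/(\hat{\theta}(1-\hat{\theta}))$:

```python
# Sketch: observed information of a Bernoulli(theta) sample at the MLE,
# via a finite-difference second derivative of the log-likelihood.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.binomial(1, 0.4, size=n)
k = x.sum()

def loglik(theta):
    return k * np.log(theta) + (n - k) * np.log(1 - theta)

theta_hat = k / n  # MLE
h = 1e-5
J_fd = -(loglik(theta_hat + h) - 2 * loglik(theta_hat) + loglik(theta_hat - h)) / h**2
print(J_fd, n / (theta_hat * (1 - theta_hat)))  # the two values should agree closely
```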
> [!info]+ In Higher Dimensions
> - The score function is $S=\nabla_{\theta}\,l=\left( \frac{ \partial l }{ \partial \theta_{1} },\dots,\frac{ \partial l }{ \partial \theta_{k} } \right)$.
> - The Fisher information matrix is $I_{\mathbf{X}}(\theta)=\mathrm{Cov}(S)$.
> - The observed information matrix is $J(\theta,\mathbf{x})$ given by $J_{ij}=-\frac{ \partial^{2} l }{ \partial \theta_{i} \partial \theta_{j}} $
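For a concrete two-parameter instance, the following sketch (the $N(\mu,\sigma^{2})$ model and all names are illustrative) estimates the Fisher information matrix as the Monte Carlo covariance of the score vector and compares it with the closed form $\mathrm{diag}(n/\sigma^{2},\,n/(2\sigma^{4}))$:

```python
# Sketch: Fisher information matrix of an iid N(mu, sigma^2) sample
# as the covariance of the score vector (dl/dmu, dl/dsigma^2).
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2, n, reps = 1.0, 2.0, 30, 100_000

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
d_mu = (x - mu).sum(axis=1) / sigma2
d_s2 = -n / (2 * sigma2) + ((x - mu) ** 2).sum(axis=1) / (2 * sigma2**2)
S = np.column_stack([d_mu, d_s2])                  # score vectors, one row per sample
print(np.cov(S, rowvar=False))                     # Monte Carlo estimate of I(theta)
print(np.diag([n / sigma2, n / (2 * sigma2**2)]))  # closed form
```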
### Properties of the Fisher Information
> [!theorem|*] Fisher and Observed Information
> *Fisher information is the expected observed information*: $I_{\mathbf{X}}(\theta)=\mathbb{E}_{\mathbf{X} ~|~\theta}[J(\theta,\mathbf{X})],$and in higher dimensions, the same holds entrywise for the matrices.
The score function, and by extension the informations, are *additive for independent samples*:
- For independent variables $X,Y$ whose respective distributions share the parameter $\theta$, $\begin{align*}
I_{(X,Y)}(\theta)&= I_{X}(\theta)+I_{Y}(\theta),\\
J_{X,Y}(\theta,\mathbf{x}, \mathbf{y})&= J_{X}(\theta,\mathbf{x})+J_{Y}(\theta,\mathbf{y}).
\end{align*}$
- In particular, if $X_{1},\dots,X_{n}\overset{\mathrm{iid.}}{\sim} f(\cdot\,;\theta)$, then the whole sample $\mathbf{X}$ has information $I_{\mathbf{X}}=n \cdot i_{X},$where $i_{X}:= I_{X_{1}}$ is the *Fisher information in one sample*. This usually makes computations easier.
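The following sketch (the Poisson model and all names are illustrative) checks the additivity claim above: the variance of the total score of an iid $\mathrm{Poisson}(\lambda)$ sample is close to $n\cdot i_{X}(\lambda)=n/\lambda$.

```python
# Sketch: additivity of Fisher information for an iid Poisson(lam) sample.
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 2.5, 40, 200_000

x = rng.poisson(lam, size=(reps, n))
scores = x.sum(axis=1) / lam - n   # total score: d/dlam of the log-likelihood
print(scores.var(), n / lam)       # Var(S) approx. n * i_X(lam) = n / lam
```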
> [!warning]
> *Fisher information changes under reparametrization*: if $\phi \mapsto \theta(\phi)$ is a differentiable reparametrization, then $I^{*}_{\mathbf{X}}(\phi)=I_{\mathbf{X}}(\theta(\phi))\cdot \theta'(\phi)^{2},$where $I^{*}$ and $I$ are the Fisher informations for $\phi$ and $\theta$ respectively; in general, they are distinct functions.
> > [!proof]- Sketch proof
> > Use the variance definition of Fisher information, and note that the score function of $\phi$ is $S_{\phi}=\frac{ \partial }{ \partial \phi }l(\mathbf{x};\theta(\phi))=S_{\theta}\theta'(\phi)$. When taking the variance, the scaling term $\theta'(\phi)$ is squared.
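- For instance, a single observation $X \sim \mathrm{Bernoulli}(\theta)$ has $I_{X}(\theta)=\frac{1}{\theta(1-\theta)}$. Under the log-odds reparametrization $\phi=\log\frac{\theta}{1-\theta}$, i.e. $\theta(\phi)=\frac{e^{\phi}}{1+e^{\phi}}$, we have $\theta'(\phi)=\theta(\phi)(1-\theta(\phi))$, so $I^{*}_{X}(\phi)=\frac{1}{\theta(1-\theta)}\cdot \left[ \theta(1-\theta) \right]^{2}=\theta(\phi)(1-\theta(\phi)),$which is indeed a different function of the parameter.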
### Information and Statistics
Refer to [[Minimality and Sufficiency#Sufficiency|sufficiency]].
> [!theorem|*] Sufficient statistics do not lose information
> Under some regularity conditions, the *sufficient statistic yields the same observed (and hence Fisher) information as the original sample*. That is, if $\mathbf{X} \sim f(\cdot\,;\theta)$ and $T(\mathbf{X})$ is sufficient, then $\begin{align*}
J_{\mathbf{X}}(\theta,\mathbf{x})&= J_{T}(\theta,T(\mathbf{x})),\\
I_{\mathbf{X}}(\theta)&= I_{T(\mathbf{X})}(\theta).
\end{align*}$For a general (not necessarily sufficient) statistic $T$, only the inequality $I_{\mathbf{X}}(\theta) \ge I_{T(\mathbf{X})}(\theta)$ holds.
>
> > [!proof]-
> > By sufficiency (and the factorization criterion), we may pick $g(T;\theta)$ to be the density of $T$, and $h$ the density of $\mathbf{X} ~|~ T$ such that $f(\mathbf{x};\theta)=g(T;\theta)h(\mathbf{x}).$Then computing the observed information $-\frac{ \partial^{2} }{ \partial \theta^{2} }\text{loglik}$ from the original sample $\mathbf{X}=\mathbf{x}$ gives $J_{\mathbf{X}}(\theta,\mathbf{x})\overset{(1)}{=}-\frac{ \partial ^{2} }{ \partial \theta^{2} }\log g(T;\theta)\overset{(2)}{=}J_{T}(\theta,T(\mathbf{x})), $where $(1)$ follows from $h(\mathbf{x})$ being constant in $\theta$, and $(2)$ is by choice of $g$ and definition of observed information.
> >
> > Since the Fisher information is just the expectation of observed information, they are equal as well.
^3cd3db
This property makes computing information of certain distributions easier: if $S\sim f(\cdot\,; \theta)$ has the same distribution as a sufficient statistic of $X_{1},\dots,X_{n} \overset{\mathrm{iid.}}{\sim} g(\cdot\,; \theta)$, then $I_{S}(\theta)=ni_{X}(\theta)$. The latter is usually easier to compute.
> [!examples] Deriving Fisher information of a binomial variable
> Suppose $S \sim \mathrm{Binom}(n, \theta)$ where $n$ is known. Then with some hairy algebra we can find that the Fisher information is $I_{S}(\theta)=\frac{n}{\theta(1-\theta)}$.
>
> But alternatively, recall that $S:=\sum_{i}X_{i}\sim \mathrm{Binom}(n,\theta)$ is a sufficient statistic of the sample $X_{1},\dots,X_{n} \overset{\mathrm{iid.}}{\sim}\mathrm{Bernoulli}(\theta)$, where each observation has Fisher information $I_{X}(\theta)=\frac{1}{\theta(1-\theta)}$; we can find this with minimal fuss.
>
> Since information is additive, the whole sample has information $\frac{n}{\theta(1-\theta)}$, and that must equal the information in $S=\sum_{i}X_{i}$.
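As a numerical check of the example above, the sketch below (parameter values and names are illustrative) estimates the variance of the binomial score by Monte Carlo and compares it with $\frac{n}{\theta(1-\theta)}$:

```python
# Sketch: Fisher information of S ~ Binom(n, theta) as the variance of its score.
import numpy as np

rng = np.random.default_rng(4)
n, theta, reps = 25, 0.3, 200_000

s = rng.binomial(n, theta, size=reps)
score = s / theta - (n - s) / (1 - theta)      # d/dtheta log f(s; theta)
print(score.var(), n / (theta * (1 - theta)))  # the two values should agree closely
```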
## Cramer-Rao Lower Bounds
> [!tldr]
> The Fisher information determines the **CRLB**, a lower bound of the variance of unbiased, regular estimators.
A statistic $T$ is **regular** if it allows the exchange of differentiation and integration in $\begin{align*}
\int T(x)\frac{ \partial }{ \partial \theta } L(x;\theta) \, dx &= \frac{ \partial }{ \partial \theta } \int T(x)L(x;\theta) \, dx \\
&= \frac{ \partial }{ \partial \theta } \mathbb{E}_{\theta}[T(X)].
\end{align*}$
> [!theorem|*] Cramer-Rao Lower Bound
>
> In estimating $\gamma=g(\theta)$, if the information $I_{X}<\infty$, $g'\ne 0$, then for any regular, unbiased estimator $T$ of $\gamma$, the **Cramer-Rao lower bound (CRLB)** holds: $\mathrm{MSE}_{\theta}(T)=\mathrm{Var}_{\theta}(T) \ge \frac{g'(\theta)^{2}}{I_{X}(\theta)}.$The equality is attained if and only if for (almost) all $x,\theta$, $T(x)=g(\theta)+\frac{g'(\theta)S(\theta,x)}{I_{X}(\theta)},$and of course the expression must simplify to be independent of $\theta$.
>
> > [!info]- In higher dimensions
> > Variances of vector-valued estimators are compared with the Loewner order: $A \preceq B$ if $B-A$ is positive semi-definite.
> > CRLB is given by $\mathrm{Var}_{\theta}(T) \succeq J_{g}I_{X}^{-1}J_{g}^{T}$where $J_{g}(\theta)_{ij}=\frac{ \partial g_{i} }{ \partial \theta_{j} }$ is the Jacobian.
>
> > [!proof]- Sketch proof for scalar estimators
> > First prove that $\mathrm{Cov}_{\theta}(T, S(\theta,\mathbf{X}))=g'(\theta)$ using regularity. Then consider the random variable $T-c(\theta)S(\theta,\mathbf{X})$ where $c(\theta):= g'(\theta) / I_{X}(\theta)$. Its variance is non-negative and can be shown to equal $\mathrm{Var}_{\theta}(T)-\mathrm{CRLB}$.
> >
> > If the equality holds (so the latter is $0$), then $T-c(\theta)S(\theta,\mathbf{X})$ must be ($\mathrm{a.s.}$) constant, so being unbiased it equals $g(\theta)$, and $T=g(\theta)+c(\theta)S(\theta,\mathbf{X})$.
In particular, if $g$ is the identity function, and we are directly estimating the scalar-valued parameter $\theta$, the CRLB simplifies to $\mathrm{MSE}_{\theta}(\hat{\theta})=\mathrm{Var}_{\theta}(\hat{\theta}) \ge I_{X}(\theta)^{-1}.$Recalling that $\hat{\theta}_{\mathrm{MLE}}$ is asymptotically $N(\theta, I_{X_{1:n}}^{-1}(\theta))$ as the sample size $n \to \infty$, we see that its variance approaches the CRLB. Moreover, $I_{X_{1:n}}(\theta) =O(n)$, so the variance decreases at rate $1/n$ as the sample size grows.
If $T^{*}$ is a biased estimator of $\theta$ with bias $b(\theta):=\mathbb{E}_{\theta}[T^{*}]-\theta$, then it is an unbiased estimator of $\theta+b(\theta)$, so applying the CRLB gives $\mathrm{Var}_{\theta}(T^{*})\ge \frac{[1+b'(\theta)]^{2}}{I_{X}(\theta)}.$
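As an illustration of the scalar bound above, the following sketch (Bernoulli model, parameter values, and names are illustrative) checks that the sample mean of an iid $\mathrm{Bernoulli}(\theta)$ sample is unbiased for $\theta$ and that its variance matches the CRLB $I_{X}(\theta)^{-1}=\theta(1-\theta)/n$:

```python
# Sketch: the Bernoulli sample mean attains the CRLB for estimating theta.
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 0.3, 50, 200_000

xbar = rng.binomial(n, theta, size=reps) / n  # sample means of `reps` simulated samples
print(xbar.mean(), theta)                     # unbiasedness check
print(xbar.var(), theta * (1 - theta) / n)    # variance vs. CRLB
```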
### Attaining the CRLB
> [!definition|*] Efficiency
> If $T$ is an unbiased estimator of $\gamma=g(\theta)$, its **efficiency** is the ratio between the CRLB and its variance: $\mathrm{eff}_{\theta}(T,\gamma)=\frac{\mathrm{CRLB}(T)}{\mathrm{Var}_{\theta}(T)}=\frac{g'(\theta)^{2}}{I_{X}(\theta)\mathrm{Var}_{\theta}(T)}$
> By definition, $\mathrm{eff}_{\theta} \in [0,1]$ for regular unbiased estimators, and $T$ is called **efficient** if it attains the CRLB ($\mathrm{eff}_{\theta}=1~\forall \theta$).
However, regular, unbiased estimators do not attain CRLB in general: if $T$ is efficient, then the distribution must be from an [[Exponential Families|exponential family]].
> [!theorem|*] Efficiency in 1-parameter exponential families
>
> If $\mathcal{P}$ is a strictly $1$-parameter exponential family with canonical statistic $T$, then *$T$ is an efficient estimator of $\mathbb{E}_{\theta}[T]$*.
> - This does not necessarily hold in higher dimensions.
>
> > [!proof]-
> > First note that in (1-parameter) exponential families, $\begin{align*}
> S(\eta,T)&= \frac{ \partial }{ \partial \eta }(T\cdot\eta-B(\eta)) =T-B',\\
> J(\eta,T)&= -\frac{ \partial }{ \partial \eta }S(\eta,T)=B ''\\
> I_{\eta}(T)&= \mathbb{E}_{\eta}[J(\eta,T)]=B '' ~(\mathrm{const.})
> \end{align*}$Therefore, under the natural parametrization, the CRLB is $\left( \frac{ \partial }{ \partial \eta }\mathbb{E}_{\eta}[T] \right)^{2} / I_{\eta}(T)=(B'')^{2}/B''=B''=\mathrm{Var}(T)$, which $T$ attains.
> > ---
> > For a non-natural parameterization $\theta \mapsto \eta(\theta)$, $\begin{align*}
> S(\theta,T)&= \frac{ \partial }{ \partial \theta }(T\cdot\eta(\theta)-B(\eta(\theta))) =(T-B')\,\eta',\\
> J(\theta,T)&= -\frac{ \partial }{ \partial \theta }S(\theta,T)=-(T-B')\,\eta'' + B ''\cdot(\eta')^{2},\\
> I_{\theta}(T)&= \mathbb{E}_{\theta}[J(\theta,T)]=B '' \cdot(\eta')^{2},
> \end{align*}$where the first term of $J$ vanishes in expectation because $\mathbb{E}_{\theta}[T]=B'$. The same result follows: $\mathrm{CRLB}=\frac{\left( \frac{ \partial \mathbb{E}_{\theta}[T] }{ \partial \theta } \right)^{2}}{I_{\theta}(T)}=\frac{\left( \frac{ \partial \mathbb{E}_{\eta(\theta)}[T] }{ \partial \eta } \eta' \right)^{2}}{B '' \cdot (\eta')^{2}}=B '' = \mathrm{Var}(T). $
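- For instance, if $X_{1},\dots,X_{n}\overset{\mathrm{iid.}}{\sim}\mathrm{Poisson}(\lambda)$, the canonical statistic is $T=\sum_{i}X_{i}$ with $\mathbb{E}_{\lambda}[T]=n\lambda$, so $\bar{X}=T/n$ is an efficient estimator of $\lambda$: indeed $\mathrm{Var}_{\lambda}(\bar{X})=\lambda/n=I_{\mathbf{X}}(\lambda)^{-1}$.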
> [!theorem|*] MLE and the CRLB
> If the CRLB is attained by some unbiased estimator, it must be the MLE:
> - More precisely, if $\theta$ has the MLE $\hat{\theta}_{\mathrm{MLE}}$, and there is an unbiased $\tilde{\theta}$ that attains CRLB, then $\hat{\theta}_{\mathrm{MLE}}=\tilde{\theta}\,\,\mathrm{a.s.}$.
>
> > [!proof]-
> > Suppose $\tilde{\theta}$ is efficient and unbiased, then $\tilde{\theta}-\theta=S(\theta,x) / I_{X}(\theta)~\mathrm{a.s.}$
> > Then plugging in $\theta=\hat{\theta}_{\mathrm{MLE}}$ gives $\tilde{\theta}=\hat{\theta}_{\mathrm{MLE}}~\mathrm{a.s.}$ since $S(\hat{\theta}_{\mathrm{MLE}})=0$.
## Minimum Variance Unbiased Estimators
> [!tldr]
> - **Rao-Blackwell** gives a way to improve unbiased estimators using sufficient statistics.
> - For sufficient statistics that are complete, **Lehmann-Scheffé** guarantees that the improvement is optimal.
In general, the CRLB is not achievable. However, it's still valuable to find the **minimum variance unbiased estimator (MVUE)**.
> [!definition|*] MVUE
> An unbiased estimator $T$ of $g(\theta)$ is the **MVUE** if it is uniformly better than any other unbiased estimator $T'$. That is, $\forall \theta \in \Theta,~\underbrace{\mathrm{Var}_{\theta}(T)}_{=\mathrm{MSE}_{\theta}(T)} \le \underbrace{\mathrm{Var}_{\theta}(T')}_{=\mathrm{MSE}_{\theta}(T')}.$Here MSE and variance are equal since they are unbiased.
> [!theorem|*] Rao-Blackwell
>
> Suppose $X \sim f(\cdot\,; \theta)$, $T(X)$ is sufficient, and $\hat{\gamma}$ is an unbiased estimator of $\gamma=g(\theta)$.
>
> Then *incorporating the information of $T$ gives a better unbiased estimator* $\hat{\gamma}_{T}= \mathbb{E}_{\theta}[\hat{\gamma}\,|\,T]$:
> - $\hat{\gamma}_{T}$ does not depend on $\theta$, so it is a genuine estimator.
> - $\mathbb{E}_{\theta}[\hat{\gamma}_{T}]=\gamma$, so $\hat{\gamma}_{T}$ is unbiased.
> - $\mathrm{Var}(\hat{\gamma}_{T}) \le \mathrm{Var}(\hat{\gamma})$, so $\hat{\gamma}_{T}$ is a better estimator.
> - Equality holds if and only if $\hat{\gamma}_{T}=\hat{\gamma}\,\,\mathrm{a.s.}$
>
> > [!info]- In higher dimensions
> > If $\gamma \in \mathbb{R}^{k}$ is vector-valued, then the variance inequality becomes $\mathrm{Cov}(\hat{\gamma}_{T}) \preceq \mathrm{Cov}(\hat{\gamma})$.
>
> > [!proof]-
> > *No dependence on $\theta$*: by sufficiency, conditioning on $T$ removes the dependence on $\theta$: $\mathbb{E}_{\theta}[\hat{\gamma}(X)\,|\,T]=\int _{\mathcal{X}}\hat{\gamma}(x) \underbrace{f(x\,|\,\theta,T)}_{\text{indep. of }\theta} \, dx $
> > *Unbiasedness*: $\mathbb{E}[\hat{\gamma}_{T}]=\mathbb{E}_{T}[\mathbb{E}_{X}[\hat{\gamma}(X)\,|\,T]]=\mathbb{E}[\hat{\gamma}]=\gamma$where subscripts after $\mathbb{E}$ indicate the variable over which the expectation is taken.
> >
> > *Less variance*: since both $\hat{\gamma},\hat{\gamma}_{T}$ are unbiased, it's enough to show that $\mathbb{E}[\hat{\gamma}^{2}] \ge \mathbb{E}[\hat{\gamma}_{T}^{2}]$: $\mathbb{E}_{}[\hat{\gamma}_{T}^{2}]=\mathbb{E}_{T}[\mathbb{E}_{X}[\hat{\gamma} \,|\, T]^{2}] \le \mathbb{E}_{T}[\mathbb{E}_{X}[\hat{\gamma}^{2} \,|\, T]]=\mathbb{E}[\hat{\gamma}^{2}]$where the inequality is just the conditional version of $\mathbb{E}[W]^{2} \le \mathbb{E}[W^{2}]$ with $W=\hat{\gamma}$, conditioned on $T$.
> >
> > For that to be an equality, the conditional variance $\mathrm{Var}(\hat{\gamma}\,|\,T)$ must be $0\,\,\mathrm{a.s.}$, i.e. given $T$, $\hat{\gamma}=\mathrm{const.}=\hat{\gamma}_{T}\mathrm{\,\,a.s.}$
One consequence of Rao-Blackwell is that an unbiased estimator $\hat{\gamma}$ can always be improved, unless it is (solely) a function of some sufficient statistic $T$.
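As a concrete illustration (the Poisson model and all names are illustrative), the sketch below Rao-Blackwellizes the crude unbiased estimator $\hat{\gamma}=\mathbb{1}\{X_{1}=0\}$ of $\gamma=P(X=0)=e^{-\lambda}$ using the sufficient statistic $T=\sum_{i}X_{i}$; since $X_{1}\,|\,T \sim \mathrm{Binom}(T,1/n)$, the improved estimator is $\hat{\gamma}_{T}=\left( \frac{n-1}{n} \right)^{T}$.

```python
# Sketch: Rao-Blackwellizing an unbiased estimator of P(X = 0) = exp(-lam)
# for an iid Poisson(lam) sample, using the sufficient statistic T = sum(X).
import numpy as np

rng = np.random.default_rng(6)
lam, n, reps = 1.5, 20, 200_000

x = rng.poisson(lam, size=(reps, n))
crude = (x[:, 0] == 0).astype(float)          # gamma_hat = 1{X_1 = 0}
rb = ((n - 1) / n) ** x.sum(axis=1)           # gamma_hat_T = E[gamma_hat | T]
print(np.exp(-lam), crude.mean(), rb.mean())  # both estimators are unbiased
print(crude.var(), rb.var())                  # the Rao-Blackwellized one has smaller variance
```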
### MVUE from Complete Estimators
> [!definition|*] Completeness
> A family of distributions $\mathcal{P}=\{ P_{\theta}\,|\, \theta \in \Theta \}, X \sim P_{\theta}$ is **complete** if there are no non-trivial unbiased estimators of $0$: for every $h(X)$, $\big( \mathbb{E}_{\theta}[h(X)]=0\,\, \forall \theta \in \Theta \big) \Longrightarrow \big(h(X)=0 \,\,\mathrm{a.s.}\,\, \forall \theta \in \Theta\big).$Equivalently, any two unbiased estimators of the same quantity $\gamma$ are $\mathrm{a.e.}$ equal.
>
> A statistic $T(X)$ is **complete** if the family of distributions of $T$ $\mathcal{P}_{T}=\{ P^{(T)}_{\theta}\,|\, \theta \in \Theta \}, ~T \sim P^{(T)}_{\theta}$ is complete, i.e. $\big( \mathbb{E}_{\theta}[h(T)]=0\,\, \forall \theta \in \Theta \big) \Longrightarrow \big(h(T)=0 \,\,\mathrm{a.s.}\,\, \forall \theta \in \Theta\big).$
MVUEs from complete, sufficient statistics exist in general, without needing to attain the CRLB, which is only possible for exponential families.
> [!math|{"type":"theorem","number":"","setAsNoteMathLink":false,"title":"Lehmann-Scheffé","label":"lehmann-scheff"}] Theorem (Lehmann-Scheffé).
> If $\hat{\gamma}$ is an unbiased estimator of $\gamma$, $T$ is a complete and sufficient statistic, then $\hat{\gamma}_{T}:= \mathbb{E}[\hat{\gamma}\,|\, T]$ is a MVUE of $\gamma$.
> > [!proof]-
> > For any other unbiased estimator $\tilde{\gamma}$, consider $\tilde{\gamma}_{T}=\mathbb{E}[\tilde{\gamma}\,|\,T]$. By sufficiency, both $\hat{\gamma}_{T},\tilde{\gamma}_{T}$ are statistics that depend on the data only through $T$, and by the tower property they are both unbiased estimators of $\gamma$.
> >
> > By completeness of $T$, they are $\mathrm{a.e.}$ equal, so $\mathrm{Var}(\hat{\gamma}_{T}) = \mathrm{Var}(\tilde{\gamma}_{T})\le \mathrm{Var}(\tilde{\gamma})$for any unbiased estimator $\tilde{\gamma}$.
One important corollary is that *if an unbiased estimator $\hat{\gamma}$ is a function of a complete statistic, it must be a MVUE*.
- For example, in a sample $X_{1},\dots,X_{n} \overset{\mathrm{iid.}}{\sim}N(\mu,\sigma^{2})$, $\mathbf{T}=\left( \sum_{i}X_{i},\sum_{i}X_{i}^{2}\right)$ is complete (see next section), so $(\bar{X}, S^{2})$ are MVUEs of $(\mu,\sigma^{2})$.
- For $X_{1},\dots,X_{n} \sim U[0,\theta]$, $X_{\mathrm{max}}$ is complete, so the estimator $\hat{\theta}= \frac{n+1}{n}X_{\mathrm{max}}$ is the MVUE of $\theta$; it does not achieve the CRLB because the distribution is not regular enough (its support depends on $\theta$).
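A quick simulation of the second example (parameter values and names are illustrative) compares the MVUE $\frac{n+1}{n}X_{\mathrm{max}}$ with the moment estimator $2\bar{X}$, which is also unbiased for $\theta$ but has larger variance:

```python
# Sketch: for X_1,...,X_n iid U[0, theta], compare two unbiased estimators of theta.
import numpy as np

rng = np.random.default_rng(7)
theta, n, reps = 3.0, 10, 200_000

x = rng.uniform(0, theta, size=(reps, n))
mvue = (n + 1) / n * x.max(axis=1)        # Lehmann-Scheffe estimator
moment = 2 * x.mean(axis=1)               # method-of-moments estimator
print(mvue.mean(), moment.mean(), theta)  # both are (approximately) unbiased
print(mvue.var(), moment.var())           # theta^2/(n(n+2)) vs. theta^2/(3n)
```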
### Finding Complete Statistics
> [!theorem|*] Complete Statistics of Exponential Families
> If $\mathcal{P}$ is a full-rank, strictly $k$-parameter exponential family, its canonical statistic $T$ is a complete, sufficient statistic.
A general necessary and sufficient condition for an unbiased, finite-variance estimator $\hat{\gamma}$ to be the MVUE is that it is uncorrelated with every unbiased estimator of $0$ that has finite variance:
$\begin{gather}
\mathcal{U}:= \{ u:\mathcal{X} \to \mathbb{R} \,|\, \mathrm{Var}(u)<\infty, \mathbb{E}[u]=0\}\\[0.4em] \hat{\gamma}\text{ is MVUE } \iff \forall u \in \mathcal{U}: \mathbb{E}[\hat{\gamma}u]=0
\end{gather}$
> [!proof]
> $(\Rightarrow)$: since $\hat{\gamma}+cu$ is also unbiased for any $c$, $\begin{align*}
0 &\ge \mathrm{Var}(\hat{\gamma}) -\mathrm{Var}(\hat{\gamma}+cu)\\
&= -2c\,\mathbb{E}[\hat{\gamma}u]-c^{2}\mathrm{Var}(u),
\end{align*}$ i.e. $c^{2}\mathrm{Var}(u)+2c\,\mathbb{E}[\hat{\gamma}u] \ge 0$ for all $c$; taking the discriminant of this quadratic in $c$ shows that this is only possible if $\mathbb{E}[\hat{\gamma}u]=0$.
>
> $(\Leftarrow)$: Take any other unbiased estimator $\tilde{\gamma}$ and use $u=\hat{\gamma}-\tilde{\gamma}$: $0 = \mathbb{E}[\hat{\gamma}(\hat{\gamma}-\tilde{\gamma})]= \underbrace{\mathbb{E}[\hat{\gamma}^{2}]-\gamma^{2}}_{\mathrm{Var}(\hat{\gamma})}-\mathrm{Cov}(\hat{\gamma},\tilde{\gamma})$Hence $\mathrm{Var}(\hat{\gamma})^{2}=\mathrm{Cov}(\hat{\gamma},\tilde{\gamma})^{2}$, and Cauchy-Schwarz gives $\mathrm{Var}(\hat{\gamma})^{2}=\mathrm{Cov}(\hat{\gamma},\tilde{\gamma})^{2} \le \mathrm{Var}(\hat{\gamma})\mathrm{Var}(\tilde{\gamma}).$Dividing both sides by $\mathrm{Var}(\hat{\gamma})$ proves $\hat{\gamma}$ is MVUE.
> [!warning] Complete, sufficient statistics do not always exist
> If $X_{1},\dots,X_{n} \overset{\mathrm{iid.}}{\sim} \mathrm{Unif}[\theta, \theta+1]$, then the sample does not admit a complete, sufficient statistic.
>
> Furthermore, no MVUE of $\theta$ exists in this model.