```dataview
table without id
File as "Topics",
join(
sort(map(
filter(file.tags, (tag) => any(map(this.domain_tags, (domtag) => contains(tag, domtag + "/")))),
(x) => regexreplace(replace(x, "_", " "),
"#("+ join(this.domain_tags, "|") +")/", "")
)
),
", ") as "Type",
dateformat(file.mtime, "yyyy-MM-dd") as "Last Modified"
from ""
FLATTEN "[[" + file.path + "|" + truncate(file.name, 30) + "]]" as File
where
(
(domain and contains(domain, this.file.link)
and (file.name != this.file.name))
or
any(map(file.tags,
(x) => econtains(this.domain_tags, substring(x, 1))))
or
any(map(file.tags,
(x) => any(map(this.domain_tags,
(domtag) => contains(x, domtag + "/")))
))
)
and
!contains(file.path, "2 - Snippets")
and
!contains(file.tags, "subdomain")
sort file.mtime desc
```
```dataview
table without id
File as "Snippets",
join(
sort(map(
filter(file.tags, (tag) => any(map(this.domain_tags, (domtag) => contains(tag, domtag + "/")))),
(x) => regexreplace(replace(x, "_", " "),
"#("+ join(this.domain_tags, "|") +")/", "")
)
),
", ") as "Type",
dateformat(file.mtime, "yyyy-MM-dd") as "Last Modified"
from "2 - Snippets"
FLATTEN "[[" + file.path + "|" + truncate(file.name, 30) + "]]" as File
where
(
(domain and contains(domain, this.file.link)
and (file.name != this.file.name))
or
any(map(file.tags,
(x) => econtains(this.domain_tags, substring(x, 1))))
or
any(map(file.tags,
(x) => any(map(this.domain_tags,
(domtag) => contains(x, domtag + "/")))
))
)
sort file.mtime desc
```
> [!theorem|*] Bayes' Theorem
> Recall that **Bayes' theorem** for discrete random variables $X,Y$ states that $\mathbb{P}(X|Y)=\frac{\mathbb{P}(Y|X)\mathbb{P}(X)}{\mathbb{P}(Y)}$, and for continuous random variables $X,Y$ with pdfs $f_{X}$ and $f_{Y}$, the conditional pdf is $f_{X|Y}(x|y)=\frac{f_{Y|X}(y|x)f_{X}(x)}{f_{Y}(y)}$, where $f_{A|B}(a|b)=f_{A,B}(a,b) / f_{B}(b)$.
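As a quick numerical sanity check of the discrete statement, assuming a made-up $2\times 2$ joint table (using `numpy`):
```python
import numpy as np

# Sanity check of Bayes' theorem for discrete X, Y with a made-up 2x2 joint table.
joint = np.array([[0.10, 0.30],   # P(X=0, Y=0), P(X=0, Y=1)
                  [0.20, 0.40]])  # P(X=1, Y=0), P(X=1, Y=1)

p_x = joint.sum(axis=1)              # marginal P(X)
p_y = joint.sum(axis=0)              # marginal P(Y)
p_x_given_y = joint / p_y            # P(X|Y): divide each column by P(Y)
p_y_given_x = joint / p_x[:, None]   # P(Y|X): divide each row by P(X)

# Bayes' theorem: P(X|Y) = P(Y|X) P(X) / P(Y)
bayes_rhs = p_y_given_x * p_x[:, None] / p_y
assert np.allclose(p_x_given_y, bayes_rhs)
```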
In Bayesian inference, the parameters are treated as random variables rather than fixed values, and they have their own distributions:
> [!definition|*] Priors and Posteriors
> The **prior (distribution)** $\pi(\theta)$ is the distribution we assume for the parameter $\theta$ before any data is seen.
>
> After collecting data, the updated distribution of $\theta$ is a new distribution called the **posterior (distribution)** $\pi(\theta|\mathbf{x})$: $\text{prior} \xrightarrow[\text{and update}]{\text{observe data}}\text{posterior}$
- We can use priors to express a formerly held or default opinion (e.g. ignorance, if we use [[Uninformative Priors]]).
The "update" on the prior is done via Bayes' theorem: $\pi(\theta|\mathbf{x})=\frac{f(\mathbf{x}|\theta)\pi(\theta)} {f(\mathbf{x})}=\frac{f(\mathbf{x}|\theta)\pi(\theta)}{\int _{\mathbb{R}} f(\mathbf{x}|\theta)\pi(\theta) \, dx },$but in most cases we don't need to compute the denominator, and identify the posterior with the following:
> [!lemma|*] Updating the Bayesian Prior
> Since the denominator $f(\mathbf{x})$ is just a constant in the Bayesian context, $\underset{\text{posterior}}{\pi(\theta|\mathbf{x})} \propto \underset{\text{likelihood}}{f(\mathbf{x}|\theta)} \times \underset{\text{prior}}{\pi(\theta)},$ so if we can identify the $\mathrm{RHS}$ as some familiar distribution, we can pinpoint the posterior without computing the normalizing denominator.
^8a333e
- We can choose priors that make it easy to identify the RHS, simplifying our calculations (see [[Conjugate Priors]])
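For instance, with a Beta prior on a Bernoulli success probability (a conjugate pair), the proportionality can be verified on a grid; a minimal sketch with illustrative numbers, using `numpy`/`scipy`:
```python
import numpy as np
from scipy import stats

# Beta prior + Bernoulli likelihood: a conjugate pair (all numbers are illustrative).
a0, b0 = 2.0, 2.0                      # Beta(a0, b0) prior on theta
x = np.array([1, 0, 1, 1, 1, 0, 1])    # observed Bernoulli data
k, n = int(x.sum()), len(x)

# Conjugacy: likelihood * prior is itself an unnormalized Beta density,
# so the posterior is Beta(a0 + k, b0 + n - k) with no integral to solve.
posterior = stats.beta(a0 + k, b0 + n - k)

# Numerical check of  posterior ∝ likelihood × prior  on a grid of theta values.
theta = np.linspace(1e-6, 1 - 1e-6, 2001)
unnormalized = theta**k * (1 - theta)**(n - k) * stats.beta(a0, b0).pdf(theta)
normalized = unnormalized / (unnormalized.sum() * (theta[1] - theta[0]))
print(np.max(np.abs(normalized - posterior.pdf(theta))))  # ~0 up to grid error
```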
## Bayesian Inference
### Credible Intervals
Credible intervals are the Bayesian equivalent of confidence intervals.
> [!definition|*] Credible Intervals
> A $100(1-\alpha)\%$ **credible set** is any set $C \subseteq \Theta$ such that the probability of $\theta | \mathbf{x}$ lying in it is $100(1-\alpha)\%$: $\int _{C} \pi(\theta|\mathbf{x}) \, d\theta =1-\alpha.$
* The difference between the Bayesian credible interval $C$ and the frequentist confidence interval $I$ is just which quantity is treated as random: $\begin{align*}
&\text{Bayesian:} &&\text{variable $\theta|\mathbf{x}$ has probability ... to be in $C$. }\\
&\text{Frequentist:} &&\text{variable $I(\mathbf{x})$ has probability ... to contain $\theta$.}
\end{align*}$
A **credible interval** $(\theta_{1},\theta_{2})$ is just a credible set that is also an interval. A credible interval is **equal-tailed** if $\mathbb{P}(\theta|\mathbf{x} \le \theta_{1})=\mathbb{P}(\theta|\mathbf{x} \ge \theta_{2})$; for example, the two highlighted tails have equal area:
![[EqualTailsInterval.png#invert]]
A credible set/interval $C$ is **highest posterior density (HPD)** if it has the form $C = \{ \theta:\pi(\theta|\mathbf{x}) \ge p_{\min} \}$ for some constant $p_{\min}$. Equivalently, every $\theta \in C$ has higher posterior density than any $\theta' \not\in C$: $\pi(\theta|\mathbf{x})>\pi(\theta'|\mathbf{x})$. For example, the highlighted region is an HPD set, where $p_{\min}$ is the horizontal line:
![[HPDInterval.png#invert]]
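A minimal sketch of both interval types for a hypothetical $\mathrm{Beta}(8,4)$ posterior, using `scipy`; the `hpd_interval` helper is illustrative and assumes a unimodal posterior (shortest interval = HPD):
```python
import numpy as np
from scipy import stats

# Equal-tailed vs. HPD 95% credible intervals for a hypothetical Beta(8, 4) posterior.
posterior = stats.beta(8, 4)
alpha = 0.05

# Equal-tailed: cut off alpha/2 posterior probability in each tail.
equal_tailed = posterior.ppf([alpha / 2, 1 - alpha / 2])

# HPD: among all intervals carrying 1 - alpha posterior mass, take the shortest one;
# for a unimodal posterior this coincides with {theta : pi(theta|x) >= p_min}.
def hpd_interval(dist, alpha, grid=10_000):
    left_tail = np.linspace(0, alpha, grid)       # candidate left-tail probabilities
    lower = dist.ppf(left_tail)
    upper = dist.ppf(left_tail + 1 - alpha)
    shortest = np.argmin(upper - lower)
    return lower[shortest], upper[shortest]

print("equal-tailed:", equal_tailed)
print("HPD:         ", hpd_interval(posterior, alpha))
```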
### Prediction with the Posterior
Given observations $\mathbf{X}=(X_{1},\dots,X_{n})$ and the posterior $\theta \sim\pi(\theta|\mathbf{x})$ computed from them, the **(posterior) predictive density** of a new observation $X_{n+1}$ is $f(X_{n+1}|\mathbf{x})$.
* *That is, our updated belief about the underlying distribution $f$, after incorporating the knowledge from previous data $\mathbf{x}$.*
The predictive density can be computed by: $\begin{align*}
f(X_{n+1}|\mathbf{x})&= \int f(X_{n+1},\theta|\mathbf{x}) \, d\theta &&\substack{\text{definition of }\\\text{marginal density}}\\
&= \int f(X_{n+1}|\theta,\mathbf{x}) \,\cdot\,\pi(\theta|\mathbf{x}) \, d\theta &&\text{Bayes' rule}\\
&= \int \underset{\text{prediction}}{f(X_{n+1}|\theta)} \,\cdot\, \underset{\text{posterior}}{\pi(\theta|\mathbf{x})} \, d\theta &&\substack{\text{conditional independence of } X_{n+1}\\ \text{and } \mathbf{x} \text{ given } \theta}
\end{align*}$, i.e. the prediction averaged over the posterior.
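For example, in the Beta-Bernoulli setting the integral can be evaluated on a grid and compared with its closed form (the posterior mean); a sketch with a hypothetical $\mathrm{Beta}(7,4)$ posterior:
```python
import numpy as np
from scipy import stats

# Posterior predictive for a Bernoulli model with a hypothetical Beta(7, 4) posterior.
a_post, b_post = 7.0, 4.0
theta = np.linspace(1e-6, 1 - 1e-6, 2001)
post_pdf = stats.beta(a_post, b_post).pdf(theta)
dtheta = theta[1] - theta[0]

# f(X_{n+1} = 1 | x) = ∫ f(X_{n+1} = 1 | theta) pi(theta | x) dtheta
#                    = ∫ theta * pi(theta | x) dtheta,  the posterior mean of theta.
pred_success = np.sum(theta * post_pdf) * dtheta
print(pred_success, "vs closed form", a_post / (a_post + b_post))  # both ≈ 0.636
```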
## Hierarchical Bayesian Models
With data $\mathbf{Y}=(Y_{1},\dots,Y_{n})\sim(f(\theta_{1}),\dots,f(\theta_{n}))$, where $\theta_{1},\dots,\theta_{n}$ are parameters, simplistic approaches to inferring their values include:
- Pooling the data and estimating $\hat{\theta}=\hat{\theta}_{1}=\dots =\hat{\theta}_{n}$ (a frequentist approach), or using a common Bayesian prior $\pi(\theta)$ and posterior $\pi(\theta\,|\,\mathbf{Y})$.
- Modeling each data point $Y_{i}$ and parameter $\theta_{i} \approx \hat{\theta}_{i}$ completely separately, ignoring the rest of the data set.
Neither of these makes much sense, motivating the **hierarchical model**: the parameters $\theta_{1},\dots,\theta_{n}$ are different, but all drawn from the same distribution $\pi(\theta;\phi)$, which is determined by the **hyperparameters** $\phi$, themselves treated as variables to be inferred.
> [!definition|*] Hierarchical Bayesian Methods
>
> A hierarchical Bayesian model contains:
> - The hyperparameters $\phi \sim P$, the **hyperprior**, with density $p(\phi)$;
> - The (conditional) prior $\theta \,|\,\phi \sim \pi_{\phi}$, with density $p(\theta \,|\,\phi)$;
> - The likelihood $y \,|\,\theta,\phi \sim P_{\theta}$, with density $p(y \,|\, \theta)$. Note that $P_{\theta}$ does not depend on $\phi$.
>
> Independence assumptions include (for $i \ne j$):
> - $\theta_{i}$ and $\theta_{j}$ are not necessarily independent, but $\theta_{i}\,|\,\phi$ and $\theta_{j} \,|\, \phi$ are.
> - $y_{i}\,|\,\phi$ and $y_{j}\,|\,\phi$ are independent.
>
> The **joint prior** is $p(\theta,\phi)=p(\phi)p(\theta \,|\,\phi)$.
> The **joint posterior** is $p(\theta,\phi \,|\,y) \propto p(y\,|\,\theta,\phi)p(\theta,\phi)=p(y \,|\, \theta)p(\theta,\phi)$.
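A minimal generative sketch of such a hierarchy, assuming a hypothetical Normal-Normal model (hyperprior, prior, and noise scales are all made up), together with the unnormalized joint posterior:
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical Normal-Normal hierarchy (all distributions/numbers are illustrative):
#   phi            ~ Normal(0, 10^2)         hyperprior p(phi)
#   theta_i | phi  ~ Normal(phi, 1)          conditional prior, iid over i
#   y_i | theta_i  ~ Normal(theta_i, 0.5^2)  likelihood, does not depend on phi
n = 5
phi = rng.normal(0.0, 10.0)
theta = rng.normal(phi, 1.0, size=n)
y = rng.normal(theta, 0.5)

def log_joint_posterior(theta, phi, y):
    """Unnormalized log p(theta, phi | y) = log p(y|theta) + log p(theta|phi) + log p(phi)."""
    return (stats.norm(theta, 0.5).logpdf(y).sum()      # likelihood (phi enters only via theta)
            + stats.norm(phi, 1.0).logpdf(theta).sum()  # conditional prior p(theta | phi)
            + stats.norm(0.0, 10.0).logpdf(phi))        # hyperprior p(phi)

print(log_joint_posterior(theta, phi, y))
```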
http://www2.stat.duke.edu/~pdh10/Teaching/581/LectureNotes/bayes.pdf