Besides [[Linear Regression in Causal Inference]], we can also use **subclassification** and **matching** to estimate treatment effects.
### Subclassification
If the covariate $X$ is categorical with $K$ levels, we can partition the data by $X$ and $T$, then average the response $Y$ to estimate the effect in each category: $\begin{align}
\text{in category }k:~ & d_{k}:=\bar{y}_{1k} - \bar{y}_{0k} \\
\text{average over categories}:~ & \sum_{k=1}^{K}d_{k} \cdot \frac{n_{k}}{n},
\end{align}$where $n_{k}$ is the number of observations with $X=k$, and $n$ the total sample size. This average is the **subclassification estimator**, which controls for $X$.
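As an illustration, here is a minimal sketch of the subclassification estimator in Python, assuming a pandas `DataFrame` with (hypothetical) columns `y` for the outcome, `t` for the binary treatment, and `x` for the categorical covariate:
```python
import pandas as pd

def subclassification_estimate(df: pd.DataFrame) -> float:
    """Weighted average of within-category treated-vs-untreated differences.
    Column names "y", "t", "x" are assumptions for this sketch."""
    n = len(df)
    effect = 0.0
    for _, group in df.groupby("x"):              # partition the data by X
        d_k = (group.loc[group["t"] == 1, "y"].mean()
               - group.loc[group["t"] == 0, "y"].mean())
        effect += d_k * len(group) / n            # weight by the category share n_k / n
    return effect
```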
### Matching
In a similar spirit, we can compare a treated observation $y_{i}$ with an untreated observation $y_{i'}$ that is similar in all covariates (not just one categorical variable). Assuming the potential outcomes $Y_{0}$ and $Y_{1}$ vary continuously in the covariates $X$, we can *assume that $y_{i}-y_{i'}$ will be a good estimator for $\mathbb{E}[Y_{i 1}-Y_{i0}]$*.
This process of finding a "similar" observation is called **matching**; it can be done with standard techniques like [[Prototyping Methods and Nearest Neighbors#Nearest Neighbors Classifiers|nearest neighbors]], which are implemented in Python packages.
- Of course, standard requirements of KNN apply: e.g. the covariates should be comparable in scale, or the distance metric should account for this.
Now do the same for untreated observations (since the nearest-neighbor relation is not symmetric) and take the average to estimate the treatment effect.
If $N(i)$ denotes the matched observation with the opposite treatment level to observation $i$, we have $\begin{align}
\text{over treated}: ~&s_{1}:=\sum_{i: t_{i}=1}(y_{i}-y_{N(i)}) \\
\text{over untreated}: ~& s_{0}:=\sum_{i: t_{i} = 0}(y_{N(i)}-y_{i}) \\
\text{take average}: ~&\hat{\mathbb{E}}[Y_{1}-Y_{0}]=(s_{1}+s_{0}) / n
\end{align}$
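A minimal sketch of this matching estimator, assuming NumPy arrays `X` (covariates), `t` (binary treatment), `y` (outcomes), and using scikit-learn's `NearestNeighbors` for the matching step (the array names and choice of library are assumptions, not from the note):
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

def matching_estimate(X: np.ndarray, t: np.ndarray, y: np.ndarray) -> float:
    """One-nearest-neighbor matching estimator of E[Y1 - Y0]."""
    Xs = StandardScaler().fit_transform(X)   # put covariates on a comparable scale
    treated, control = (t == 1), (t == 0)

    # For each treated unit find its nearest untreated neighbor, and vice versa.
    nn_control = NearestNeighbors(n_neighbors=1).fit(Xs[control])
    nn_treated = NearestNeighbors(n_neighbors=1).fit(Xs[treated])
    idx_c = nn_control.kneighbors(Xs[treated], return_distance=False).ravel()
    idx_t = nn_treated.kneighbors(Xs[control], return_distance=False).ravel()

    s1 = np.sum(y[treated] - y[control][idx_c])   # treated minus matched control
    s0 = np.sum(y[treated][idx_t] - y[control])   # matched treated minus control
    return (s1 + s0) / len(y)
```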