# Quantile Line Plot

> [!tldr] Quantile Line Plot
> Plots the average response $Y$ within each quantile bin of $X$, i.e. $\bar{y}_{k}=\mathrm{avg}\{ y_{i} ~|~ (k-1)\text{th quantile} < x_{i} \le k\text{th quantile} \}$ for $k=1,\dots,q$.
>
> Compared with seaborn's `lineplot`, which averages $y$ at each distinct value of $x$, the number of bins $q$ gives a bias-variance tradeoff -- a higher $q$ means more, narrower bins, reducing bias and increasing variance.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def plot_qcut_relationship(feature, response, data = None, q = 10, ax = None, **kwargs):
    if ax is None:  # explicit check instead of relying on the truthiness of an Axes
        ax = plt.subplot()
    # Strings are treated as column names of `data`
    if isinstance(feature, str):
        feature = data[feature]
    if isinstance(response, str):
        response = data[response]
    # Cut the feature into at most q quantile bins; duplicate bin edges are merged
    binned_feature = pd.qcut(feature, q = q, duplicates = 'drop')
    # Represent each bin by the midpoint of its interval, as a plain float
    midpoint = lambda interval: (interval.left + interval.right) / 2
    binned_feature = binned_feature.apply(midpoint).astype(float)
    # lineplot averages the responses that share an x value, i.e. fall in the same bin
    sns.lineplot(x = binned_feature, y = response, ax = ax, **kwargs)
```

### Example (continuous response)

A binary response works the same way, as long as it is encoded as integers or floats (e.g. 0/1).

```python
from sklearn.datasets import load_diabetes

ax = plt.subplot()
X, y = load_diabetes(return_X_y=True, as_frame=True)
sns.lineplot(x = X.age, y = y, ax = ax, label = "seaborn.lineplot", color = '#ddbbbb')
plot_qcut_relationship(X.age, y, label = "qcut(q = 10)", ax = ax)
```

![[qcutPlotExample.svg#invert|center]]
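
### Example (bin averages without plotting)

The per-bin averages $\bar{y}_{k}$ can also be computed directly, which is handy for checking what the plot is showing and how $q$ changes the number of bins. This is a minimal sketch, not part of the original function: the helper name `quantile_bin_means` is hypothetical and the data is synthetic.

```python
import numpy as np
import pandas as pd

def quantile_bin_means(x, y, q=10):
    """Mean of y within each quantile bin of x (hypothetical helper)."""
    x = pd.Series(np.asarray(x, dtype=float))
    y = pd.Series(np.asarray(y, dtype=float))
    # Same binning as the plotting function: at most q bins, duplicate edges merged
    bins = pd.qcut(x, q=q, duplicates="drop")
    # observed=True keeps only the bins that actually contain data
    return y.groupby(bins, observed=True).mean()

# Synthetic data (an assumption for illustration): noisy linear relationship
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=1000)
y = 2 * x + rng.normal(scale=0.5, size=1000)

coarse = quantile_bin_means(x, y, q=4)   # few wide bins: smoother, more bias
fine = quantile_bin_means(x, y, q=50)    # many narrow bins: noisier, less bias
print(len(coarse), len(fine))
print(coarse)
```

With few bins each average is taken over many points, so the curve is stable but coarse; with many bins each average uses fewer points and fluctuates more.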