> [!tldr]
> **Curse of Dimensionality** refers to the degradation of model performance when a large number of predictors is present, especially when many of them are noise.

### Case of $k$-Nearest Neighbors

Vanilla KNN is especially vulnerable to the curse of dimensionality: as the number of predictors $p$ increases, the "neighbors" grow farther away, and increasing the sample size does not help even for moderate values of $p$.

- In particular, this renders the continuity assumption of KNN (i.e. similar inputs produce similar responses) useless, as even the closest neighbors differ substantially from the query point.

If the predictors are uniformly distributed over the $p$-dimensional unit cube, the median distance from the origin to its nearest neighbor grows rapidly with $p$ (a simulation sketch follows below):

![[CurseOfDimensionalityKNN.png#invert|center|w60]]

Note that even with huge sample sizes, a mere $14$-dimensional predictor space pushes the median distance beyond the distance to the boundary of the box (a distance of $\frac{1}{2}$).
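
A minimal Monte-Carlo sketch of this effect, assuming the unit cube is centered at the origin so that the nearest face sits at distance $\frac{1}{2}$; the sample size $n = 1000$ and the trial count are illustrative choices, not taken from the figure:

```python
import numpy as np

rng = np.random.default_rng(0)

def median_nn_distance(p, n, trials=200):
    """Median (over trials) of the distance from the origin to its nearest
    neighbor among n points drawn uniformly from [-1/2, 1/2]^p."""
    nearest = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(-0.5, 0.5, size=(n, p))        # n uniform points in the cube
        nearest[t] = np.linalg.norm(x, axis=1).min()   # distance to the closest point
    return np.median(nearest)

# The nearest face of the cube is at distance 1/2; the median nearest-neighbor
# distance overtakes it as the dimension p grows, even with n = 1000 points.
for p in (1, 2, 5, 10, 14):
    print(f"p = {p:2d}: median NN distance = {median_nn_distance(p, n=1000):.3f}")
```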