Fast and Multiphase Rates for Nearest Neighbor Classifiers
This work addresses the problem of understanding non-uniform error scaling in machine learning for researchers, providing theoretical insights into nearest neighbor classifiers, but it is incremental as it builds on classical results.
The paper investigates the scaling of classification error rates with training dataset size, revealing that error rates can vary across different sample size ranges, and demonstrates that nearest neighbor classifiers exhibit fast polynomial error decay in early phases and slow exponential decay in later phases, with error rates depending polynomially on dimension for benign data distributions.
We study the scaling of classification error rates with respect to the size of the training dataset. In contrast to classical results where rates are minimax optimal for a problem class, this work starts with the empirical observation that, even for a fixed data distribution, the error scaling can have \emph{diverse} rates across different ranges of sample size. To understand when and why the error rate is non-uniform, we theoretically analyze nearest neighbor classifiers. We show that an error scaling law can have fine-grained rates: in the early phase, the test error depends polynomially on the data dimension and decreases fast; whereas in the later phase, the error depends exponentially on the data dimension and decreases slowly. Our analysis highlights the complexity of the data distribution in determining the test error. When the data are distributed benignly, we show that the generalization error of nearest neighbor classifier can depend polynomially, instead of exponentially, on the data dimension.