Maximum Margin Multiclass Nearest Neighbors
This work makes multiclass classification more scalable for practitioners by significantly reducing the dependence of risk bounds on the number of classes, though its improvement over existing theoretical frameworks is incremental.
The paper tackles multiclass classification in metric spaces by developing a margin-regularized nearest-neighbor classifier. It achieves generalization bounds with logarithmic dependence on the number of classes $k$, which is exponentially sharper than previous bounds of order $\sqrt k$.
We develop a general framework for margin-based multicategory classification in metric spaces. The basic workhorse is a margin-regularized version of the nearest-neighbor classifier. We prove generalization bounds that match the state of the art in sample size $n$ and significantly improve the dependence on the number of classes $k$. Our point of departure is a nearly Bayes-optimal finite-sample risk bound independent of $k$. Although $k$-free, this bound is unregularized and non-adaptive, which motivates our main result: Rademacher and scale-sensitive margin bounds with a logarithmic dependence on $k$. As the best previous risk estimates in this setting were of order $\sqrt k$, our bound is exponentially sharper. From the algorithmic standpoint, in doubling metric spaces our classifier may be trained on $n$ examples in $O(n^2\log n)$ time and evaluated on new points in $O(\log n)$ time.
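To make the idea of a nearest-neighbor classifier with a margin concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a Euclidean metric, a brute-force scan over the sample (so evaluation is linear in $n$, not the $O(\log n)$ stated above), and one illustrative notion of margin: the gap between the distance to the nearest point of the runner-up class and the distance to the nearest point of the predicted class. The names `class_distances` and `predict_with_margin` are hypothetical.

```python
import math

def euclidean(a, b):
    # Assumed metric for illustration; any metric on the space could be passed instead.
    return math.dist(a, b)

def class_distances(x, sample, metric=euclidean):
    """Return, for each label, the distance from x to its nearest training point."""
    nearest = {}
    for point, label in sample:
        d = metric(x, point)
        if label not in nearest or d < nearest[label]:
            nearest[label] = d
    return nearest

def predict_with_margin(x, sample, metric=euclidean):
    """Predict the label of the nearest class and report an illustrative margin.

    The margin here is (distance to runner-up class) - (distance to predicted
    class). This definition is an assumption made for the sketch.
    """
    ranked = sorted(class_distances(x, sample, metric).items(), key=lambda kv: kv[1])
    best_label, best_dist = ranked[0]
    runner_up_dist = ranked[1][1] if len(ranked) > 1 else math.inf
    return best_label, runner_up_dist - best_dist

# Toy sample with k = 3 classes in the plane.
sample = [((0.0, 0.0), "a"), ((4.0, 0.0), "b"), ((0.0, 10.0), "c")]
label, margin = predict_with_margin((1.0, 0.0), sample)
```

A small margin flags a query point that lies close to a class boundary. Margin-regularized methods favor classifiers whose training points are separated by a large margin.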