Minimax Optimal Algorithms with Fixed-$k$-Nearest Neighbors
This work addresses the challenge of efficient and optimal learning in distributed settings for large-scale data analysis, representing an incremental improvement by adapting fixed-k NN methods to distributed scenarios.
The paper tackles the problem of achieving minimax optimal rates in classification, regression, and density estimation using fixed-k nearest neighbor searches in a distributed learning scenario, showing that their distributed algorithm with fixed k over many groups attains minimax optimal error rates up to a logarithmic factor, comparable to standard Θ(kM)-NN rules.
This paper presents how to perform minimax optimal classification, regression, and density estimation based on fixed-$k$ nearest neighbor (NN) searches. We consider a distributed learning scenario, in which a massive dataset is split into smaller groups, where the $k$-NNs are found for a query point with respect to each subset of data. We propose \emph{optimal} rules to aggregate the fixed-$k$-NN information for classification, regression, and density estimation that achieve minimax optimal rates for the respective problems. We show that the distributed algorithm with a fixed $k$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor under some regularity conditions. Roughly speaking, distributed $k$-NN rules with $M$ groups has a performance comparable to the standard $Θ(kM)$-NN rules even for fixed $k$.