An upper bound on prototype set size for condensed nearest neighbor
The contribution is a theoretical guarantee for a heuristic method in machine learning. It is incremental in that it builds on existing algorithms rather than introducing a new paradigm.
The paper bounds the number of prototypical points stored by the condensed nearest neighbor algorithm. Using a connection to the multiclass perceptron algorithm, it derives an upper bound that is independent of training set size.
The condensed nearest neighbor (CNN) algorithm is a heuristic for reducing the number of prototypical points stored by a nearest neighbor classifier, while keeping the classification rule given by the reduced prototypical set consistent with the full set. I present an upper bound on the number of prototypical points accumulated by CNN. The bound originates in a bound on the number of times the decision rule is updated during training in the multiclass perceptron algorithm, and thus is independent of training set size.
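To make the procedure concrete, here is a minimal sketch of a Hart-style condensing loop; it is an illustration, not necessarily the exact variant analyzed in the paper. The function names `nearest_label` and `condense`, the Euclidean distance, and the choice to seed the store with the first training point are all assumptions made for this example. The sketch also assumes Python 3.8+ (for `math.dist`) and that no two identical points carry different labels.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def nearest_label(prototypes, x):
    """Return the label of the stored prototype closest to x (1-NN rule)."""
    _, label = min(prototypes, key=lambda p: dist(p[0], x))
    return label


def condense(points, labels):
    """Keep only the training points that the current prototypes misclassify.

    Assumption: points are tuples of floats, and identical points never
    carry conflicting labels.
    """
    prototypes = [(points[0], labels[0])]  # assumed seed: first training point
    changed = True
    while changed:  # repeat full passes until a pass adds nothing
        changed = False
        for x, y in zip(points, labels):
            mistake = nearest_label(prototypes, x) != y
            # The membership guard only matters if the no-conflict assumption
            # is violated; it keeps the loop from running forever in that case.
            if mistake and (x, y) not in prototypes:
                prototypes.append((x, y))  # store is updated only on a mistake
                changed = True
    return prototypes


# Toy data: two well-separated one-dimensional clusters.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
labels = [0, 0, 0, 1, 1, 1]
prototypes = condense(points, labels)
print(len(prototypes))  # 2
```

The store grows only when the current rule makes a mistake, which is loosely analogous to a perceptron updating its decision rule only on misclassified examples. On termination every training point is classified correctly by the reduced set, which is the consistency property described in the abstract.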