Edgar Chavez

LG, Feb 4, 2021

Instance-based learning using the Half-Space Proximal Graph

Ariana Talamantes, Edgar Chavez

The primary example of instance-based learning is the $k$-nearest neighbor rule (kNN), praised for its simplicity and its capacity to adapt to new, unseen data and to discard old data. The disadvantages most often mentioned are the classification complexity, which is $O(n)$, and the estimation of the parameter $k$, the number of nearest neighbors to be used. Using an index at classification time removes the former disadvantage, while there is no conclusive method for the latter. This paper presents a parameter-free instance-based learning algorithm using the {\em Half-Space Proximal} (HSP) graph. The HSP neighbors are simultaneously close to the center node and varied in direction around it. To classify a given query, we compute its HSP neighbors and apply a simple majority rule over them. In our experiments, the resulting classifier outperformed kNN for any $k$ on a battery of datasets. This improvement persists even when weighted majority rules are applied to both the kNN and HSP classifiers. Surprisingly, when a probabilistic index is used to approximate the HSP graph and thereby speed up the classification task, our method could {\em improve} its accuracy, in stark contrast with the kNN classifier, whose accuracy worsens with a probabilistic index.
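The classification rule described in this abstract can be sketched in a few lines. The sketch below is a minimal brute-force illustration, assuming the standard HSP construction (take the nearest remaining point as a neighbor, then discard every remaining point that is closer to that neighbor than to the query) and Euclidean distance. The function names `hsp_neighbors` and `hsp_classify` are hypothetical, and no index, weighting scheme, or probabilistic approximation from the paper is reproduced here.

```python
import math
from collections import Counter


def hsp_neighbors(query, points):
    """Return the indices of the HSP neighbors of `query` among `points`.

    Assumed construction: repeatedly pick the nearest remaining candidate,
    then drop every candidate lying in its half-space (closer to the new
    neighbor than to the query).
    """
    # Candidates sorted by distance to the query, nearest first.
    candidates = sorted(range(len(points)),
                        key=lambda i: math.dist(query, points[i]))
    neighbors = []
    while candidates:
        nearest = candidates[0]
        neighbors.append(nearest)
        p = points[nearest]
        # Keep only candidates at least as close to the query as to p.
        candidates = [i for i in candidates[1:]
                      if math.dist(points[i], query) <= math.dist(points[i], p)]
    return neighbors


def hsp_classify(query, points, labels):
    """Simple majority vote over the HSP neighbors (no parameter k)."""
    votes = Counter(labels[i] for i in hsp_neighbors(query, points))
    # On a tie, the label encountered first (nearest neighbor) wins.
    return votes.most_common(1)[0][0]


# Toy usage with made-up data.
points = [(1, 0), (2, 0), (0, 1), (-1, 0), (0, -3)]
labels = ["a", "a", "b", "b", "b"]
print(hsp_neighbors((0, 0), points))
print(hsp_classify((0, 0), points, labels))
```

In the toy data, the point `(2, 0)` is hidden behind `(1, 0)` and so never votes, which illustrates how the number of neighbors is determined by the geometry rather than by a chosen $k$.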

DS, May 29, 2017

A scalable solution to the nearest neighbor search problem through local-search methods on neighbor graphs

Eric S. Tellez, Guillermo Ruiz, Edgar Chavez et al.

Near neighbor search (NNS) is a powerful abstraction for data access; however, data indexing is troublesome even for approximate indexes. For intrinsically high-dimensional data, high-quality fast searches demand indexes with either impractically large memory usage or impractically long preprocessing time. In this paper, we introduce an algorithm to solve a nearest-neighbor query $q$ by minimizing a kernel function defined by the distance from $q$ to each object in the database. The minimization is performed using metaheuristics to solve the problem rapidly; although some methods in the literature use this strategy behind the scenes, our approach is the first to use it explicitly. We also provide two approaches to select edges in the graph's construction stage that limit the memory footprint and reduce the number of free parameters simultaneously. We carry out a thorough experimental comparison with state-of-the-art indexes on synthetic and real-world datasets; we found that our contributions achieve competitive performance in speed, accuracy, and memory in almost all of our benchmarks.
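The idea of answering a query by local search over a neighbor graph can be illustrated with a small sketch. It assumes a brute-force $k$-nearest-neighbor graph and plain greedy hill climbing with random restarts as the metaheuristic; these are stand-ins chosen for illustration, not the paper's edge-selection strategies or its specific search procedure. The names `build_knn_graph`, `greedy_search`, and `search` are hypothetical, and `k` is assumed to be at least 1.

```python
import math
import random


def build_knn_graph(points, k):
    """Brute-force graph: each node links to its k nearest other nodes."""
    graph = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph.append(others[:k])
    return graph


def greedy_search(query, points, graph, start):
    """Hill climbing: move to the best neighbor while the distance shrinks."""
    current = start
    best = math.dist(query, points[current])
    while True:
        nxt = min(graph[current], key=lambda j: math.dist(query, points[j]))
        d = math.dist(query, points[nxt])
        if d >= best:
            # Local minimum of the distance to the query.
            return current, best
        current, best = nxt, d


def search(query, points, graph, restarts=4, seed=0):
    """Run greedy search from several random starts and keep the best result."""
    rng = random.Random(seed)
    starts = rng.sample(range(len(points)), min(restarts, len(points)))
    return min((greedy_search(query, points, graph, s) for s in starts),
               key=lambda result: result[1])


# Toy usage with made-up one-dimensional data.
points = [(float(i),) for i in range(20)]
graph = build_knn_graph(points, k=2)
index, distance = search((7.2,), points, graph)
print(index, distance)
```

Each step evaluates distances only to the neighbors of the current node, so a query touches a small part of the database; the result is approximate in general, since a greedy walk can stop at a local minimum.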


2 Papers