LGDSMLJan 16, 2013

The Anchors Hierachy: Using the triangle inequality to survive high dimensional data

arXiv:1301.3877v19 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of scaling learning algorithms to high-dimensional data, which is incremental as it builds on existing metric tree structures like Ball-Trees.

The paper tackles the problem of accelerating statistical learning algorithms in high-dimensional spaces by introducing the anchors hierarchy, a data structure that uses the triangle inequality to localize data and enable cached sufficient statistics, achieving effective acceleration for tasks like kernel regression and clustering even in thousands of dimensions.

This paper is about metric data structures in high-dimensional or non-Euclidean space that permit cached sufficient statistics accelerations of learning algorithms. It has recently been shown that for less than about 10 dimensions, decorating kd-trees with additional "cached sufficient statistics" such as first and second moments and contingency tables can provide satisfying acceleration for a very wide range of statistical learning tasks such as kernel regression, locally weighted regression, k-means clustering, mixture modeling and Bayes Net learning. In this paper, we begin by defining the anchors hierarchy - a fast data structure and algorithm for localizing data based only on a triangle-inequality-obeying distance metric. We show how this, in its own right, gives a fast and effective clustering of data. But more importantly we show how it can produce a well-balanced structure similar to a Ball-Tree (Omohundro, 1991) or a kind of metric tree (Uhlmann, 1991; Ciaccia, Patella, & Zezula, 1997) in a way that is neither "top-down" nor "bottom-up" but instead "middle-out". We then show how this structure, decorated with cached sufficient statistics, allows a wide variety of statistical learning algorithms to be accelerated even in thousands of dimensions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes