IRJul 26, 2017

High-Dimensional Simplexes for Supermetric Search

arXiv:1707.08370v119 citations
Originality Incremental advance
AI Analysis

This provides an incremental improvement for similarity search in domains like information retrieval and data mining by enhancing performance in supermetric spaces.

The paper tackles the problem of similarity search in supermetric spaces by constructing high-dimensional simplexes to compute tight distance bounds, resulting in significant reductions in data size and metric cost with little accuracy loss.

In 1953, Blumenthal showed that every semi-metric space that is isometrically embeddable in a Hilbert space has the n-point property; we have previously called such spaces supermetric spaces. Although this is a strictly stronger property than triangle inequality, it is nonetheless closely related and many useful metric spaces possess it. These include Euclidean, Cosine and Jensen-Shannon spaces of any dimension. A simple corollary of the n-point property is that, for any (n+1) objects sampled from the space, there exists an n-dimensional simplex in Euclidean space whose edge lengths correspond to the distances among the objects. We show how the construction of such simplexes in higher dimensions can be used to give arbitrarily tight lower and upper bounds on distances within the original space. This allows the construction of an n-dimensional Euclidean space, from which lower and upper bounds of the original space can be calculated, and which is itself an indexable space with the n-point property. For similarity search, the engineering tradeoffs are good: we show significant reductions in data size and metric cost with little loss of accuracy, leading to a significant overall improvement in search performance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes