CVSep 17, 2015

Accelerated Distance Computation with Encoding Tree for High Dimensional Data

arXiv:1509.05186v2
Originality Incremental advance
AI Analysis

This work addresses computational bottlenecks in large-scale high-dimensional data analysis, offering an incremental improvement compatible with existing quantization-based methods.

The paper tackles the problem of excessive computation in distance calculations for high-dimensional data by proposing Encoding Tree (E-Tree) and Encoding Forest (E-Forest) methods that store vector quantization encodings to speed up processing. Experimental results show these methods drastically accelerate distance computing while also reducing memory consumption.

We propose a novel distance to calculate distance between high dimensional vector pairs, utilizing vector quantization generated encodings. Vector quantization based methods are successful in handling large scale high dimensional data. These methods compress vectors into short encodings, and allow efficient distance computation between an uncompressed vector and compressed dataset without decompressing explicitly. However for large datasets, these distance computing methods perform excessive computations. We avoid excessive computations by storing the encodings on an Encoding Tree(E-Tree), interestingly the memory consumption is also lowered. We also propose Encoding Forest(E-Forest) to further lower the computation cost. E-Tree and E-Forest is compatible with various existing quantization-based methods. We show by experiments our methods speed-up distance computing for high dimensional data drastically, and various existing algorithms can benefit from our methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes