LGDSIRAug 5, 2013

Sign Stable Projections, Sign Cauchy Projections and Chi-Square Kernels

arXiv:1308.1009v12 citations
Originality Incremental advance
AI Analysis

This work addresses the need for scalable similarity measures in large-scale learning applications like text and vision, but it is incremental as it builds on existing stable random projection methods.

The paper tackles the problem of efficiently computing Lp distances in high-dimensional data using stable random projections, proposing to use only the signs of projected data and deriving bounds for collision probability, with exact results for p=2 and an approximation error smaller than 0.0192 for binary data when p=1.

The method of stable random projections is popular for efficiently computing the Lp distances in high dimension (where 0<p<=2), using small space. Because it adopts nonadaptive linear projections, this method is naturally suitable when the data are collected in a dynamic streaming fashion (i.e., turnstile data streams). In this paper, we propose to use only the signs of the projected data and analyze the probability of collision (i.e., when the two signs differ). We derive a bound of the collision probability which is exact when p=2 and becomes less sharp when p moves away from 2. Interestingly, when p=1 (i.e., Cauchy random projections), we show that the probability of collision can be accurately approximated as functions of the chi-square similarity. For example, when the (un-normalized) data are binary, the maximum approximation error of the collision probability is smaller than 0.0192. In text and vision applications, the chi-square similarity is a popular measure for nonnegative data when the features are generated from histograms. Our experiments confirm that the proposed method is promising for large-scale learning applications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes