MEIRLGApr 26, 2018

Sign-Full Random Projections

arXiv:1805.00533v120 citations
Originality Incremental advance
AI Analysis

This work provides a more efficient method for similarity search and machine learning on large datasets, though it is incremental as it builds on existing random projection techniques.

The paper tackles the problem of estimating cosine similarity more accurately than existing 1-bit random projections by introducing 'sign-full' random projections, achieving an asymptotic variance reduction to about 0.4 of the prior method at high similarity.

The method of 1-bit ("sign-sign") random projections has been a popular tool for efficient search and machine learning on large datasets. Given two $D$-dim data vectors $u$, $v\in\mathbb{R}^D$, one can generate $x = \sum_{i=1}^D u_i r_i$, and $y = \sum_{i=1}^D v_i r_i$, where $r_i\sim N(0,1)$ iid. The "collision probability" is ${Pr}\left(sgn(x)=sgn(y)\right) = 1-\frac{\cos^{-1}ρ}π$, where $ρ= ρ(u,v)$ is the cosine similarity. We develop "sign-full" random projections by estimating $ρ$ from (e.g.,) the expectation $E(sgn(x)y)=\sqrt{\frac{2}π} ρ$, which can be further substantially improved by normalizing $y$. For nonnegative data, we recommend an interesting estimator based on $E\left(y_- 1_{x\geq 0} + y_+ 1_{x<0}\right)$ and its normalized version. The recommended estimator almost matches the accuracy of the (computationally expensive) maximum likelihood estimator. At high similarity ($ρ\rightarrow1$), the asymptotic variance of recommended estimator is only $\frac{4}{3π} \approx 0.4$ of the estimator for sign-sign projections. At small $k$ and high similarity, the improvement would be even much more substantial.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes