On the LSH Distortion of Ulam and Cayley Similarities
This work provides the first tight or near-tight bounds on LSH distortion for these permutation similarity measures, which is important for practitioners using LSH for nearest neighbor search on permutations.
The paper studies the LSH distortion of Ulam and Cayley similarities on permutations, showing that Ulam similarity admits a sublinear distortion of O(n/√log n) with a lower bound of Ω(n^{0.12}), while Cayley similarity has a distortion of Θ(n).
Locality-sensitive hashing (LSH) has found widespread use as a fundamental primitive, particularly to accelerate nearest neighbor search. An LSH scheme for a similarity function $S:\mathcal{X} \times \mathcal{X} \to [0,1]$ is a distribution over hash functions on $\mathcal{X}$ with the property that the probability of collision of any two elements $x,y\in \mathcal{X}$ is exactly equal to $S(x,y)$. However, not all similarity functions admit exact LSH schemes. The notion of LSH distortion measures how multiplicatively close a similarity function is to having an LSH scheme. In this work, we study the LSH distortion of the Ulam and Cayley similarities, which are popular similarity measures on permutations of $n$ elements. We show that the Ulam similarity admits a sublinear LSH distortion of $O(n / \sqrt{\log n})$; we also prove a lower bound of $Ω(n^{0.12})$ on the best LSH distortion achievable. On the other hand, we show that the LSH distortion of the Cayley similarity is $Θ(n)$.