Differentially Private Sketches for Jaccard Similarity Estimation
This work addresses privacy-preserving similarity estimation for data analysis, but it is incremental as it builds on existing MinHash methods.
The paper tackles the problem of estimating Jaccard similarity between user vectors under local differential privacy by extending MinHash with privacy mechanisms, achieving theoretical error bounds and demonstrating utility-privacy trade-offs in experiments.
This paper describes two locally differentially private algorithms for releasing user vectors such that the Jaccard similarity between these vectors can be efficiently estimated. The basic building block is the well-known MinHash method. To achieve a privacy-utility trade-off, MinHash is extended in two ways, using variants of Generalized Randomized Response and the Laplace Mechanism. A theoretical analysis provides bounds on the absolute error, and experiments show the privacy-utility trade-off on synthetic and real-world data. The paper ends with a critical discussion of related work.
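To make the summarized approach concrete, the following is a minimal sketch of how MinHash can be combined with generalized randomized response. It is an illustration under stated assumptions and not necessarily the construction analyzed in the paper: the function names (`minhash_sketch`, `grr`, `privatize_sketch`, `estimate_jaccard`), the BLAKE2b-based hashing, the reduction of each MinHash value to one of `buckets` range values, and the per-coordinate privacy parameter `epsilon` are all choices made here for the example. Input sets are assumed to be non-empty.

```python
import hashlib
import math
import random


def _hash(tag: str, item: str) -> int:
    """Deterministic 64-bit hash of (tag, item); stands in for a random hash function."""
    digest = hashlib.blake2b(f"{tag}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(items, k, buckets):
    """For each of k hash functions, find the item with the minimum hash value
    and map it to a bucket index in range(buckets). `items` must be non-empty."""
    sketch = []
    for j in range(k):
        winner = min(items, key=lambda it: _hash(f"min{j}", it))
        sketch.append(_hash(f"bucket{j}", winner) % buckets)
    return sketch


def grr(value, buckets, epsilon, rng):
    """Generalized randomized response: keep `value` with probability
    e^eps / (e^eps + buckets - 1), otherwise report a uniform other value."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + buckets - 1)
    if rng.random() < p_keep:
        return value
    other = rng.randrange(buckets - 1)  # uniform over the buckets - 1 other values
    return other if other < value else other + 1


def privatize_sketch(sketch, buckets, epsilon, rng):
    """Perturb every coordinate independently (epsilon is per coordinate)."""
    return [grr(v, buckets, epsilon, rng) for v in sketch]


def estimate_jaccard(sk_a, sk_b, buckets, epsilon=None):
    """Estimate Jaccard similarity from the fraction of matching coordinates.
    If epsilon is given, both sketches are assumed to be GRR-perturbed and the
    match rate is debiased first."""
    c = sum(a == b for a, b in zip(sk_a, sk_b)) / len(sk_a)
    if epsilon is not None:
        e = math.exp(epsilon)
        p = e / (e + buckets - 1)   # probability of reporting the true value
        q = 1 / (e + buckets - 1)   # probability of reporting one specific other value
        match_if_different = 2 * p * q + (buckets - 2) * q * q
        c = (c - match_if_different) / (p - q) ** 2
    # Remove accidental bucket collisions between different minimum items.
    j = (c - 1 / buckets) / (1 - 1 / buckets)
    return min(1.0, max(0.0, j))
```

In this sketch, each user would call `minhash_sketch` followed by `privatize_sketch` locally and release only the perturbed vector; an analyst would then call `estimate_jaccard` with the same `epsilon`. Because every one of the `k` coordinates is perturbed independently, basic composition gives an overall guarantee of `k * epsilon` for the released vector, so the choice of `k` and `buckets` is itself part of the privacy-utility trade-off.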