Differentially Private Top-k Selection via Canonical Lipschitz Mechanism
This work addresses a fundamental task in differential privacy for applications like data analysis, offering a more efficient and utility-enhanced method for top-k selection.
The paper tackles the problem of selecting the top-k items under differential privacy by introducing the canonical Lipschitz mechanism, which unifies existing methods and directly selects subsets in O(dk + d log d) time. The result shows an Ω(log k) factor improvement in utility guarantees over sequential composition, with experiments confirming substantial utility gains.
Selecting the top-$k$ highest scoring items under differential privacy (DP) is a fundamental task with many applications. This work presents three new results. First, the exponential mechanism, permute-and-flip and report-noisy-max, as well as their oneshot variants, are unified into the Lipschitz mechanism, an additive noise mechanism with a single DP-proof via a mandated Lipschitz property for the noise distribution. Second, this new generalized mechanism is paired with a canonical loss function to obtain the canonical Lipschitz mechanism, which can directly select k-subsets out of $d$ items in $O(dk+d \log d)$ time. The canonical loss function assesses subsets by how many users must change for the subset to become top-$k$. Third, this composition-free approach to subset selection improves utility guarantees by an $Ω(\log k)$ factor compared to one-by-one selection via sequential composition, and our experiments on synthetic and real-world data indicate substantial utility improvements.