Near-optimal algorithms for private estimation and sequential testing of collision probability
This work addresses the need for efficient and private statistical analysis of discrete distributions, which is incremental but offers specific gains in sample complexity for applications in fields like data science and privacy-preserving analytics.
The paper tackles the problem of estimating and testing collision probability under local differential privacy constraints, presenting algorithms that achieve near-optimal sample complexity with improvements such as a factor of 1/α² over prior work and requiring significantly fewer samples in experiments.
We present new algorithms for estimating and testing \emph{collision probability}, a fundamental measure of the spread of a discrete distribution that is widely used in many scientific fields. We describe an algorithm that satisfies $(α, β)$-local differential privacy and estimates collision probability with error at most $ε$ using $\tilde{O}\left(\frac{\log(1/β)}{α^2 ε^2}\right)$ samples for $α\le 1$, which improves over previous work by a factor of $\frac{1}{α^2}$. We also present a sequential testing algorithm for collision probability, which can distinguish between collision probability values that are separated by $ε$ using $\tilde{O}(\frac{1}{ε^2})$ samples, even when $ε$ is unknown. Our algorithms have nearly the optimal sample complexity, and in experiments we show that they require significantly fewer samples than previous methods.