Statistical Indistinguishability of Learning Algorithms
This work addresses the need for robust statistical indistinguishability in machine learning, offering foundational insights that could impact privacy and reproducibility, though it is incremental in building on existing stability concepts.
The paper tackles the problem of measuring similarity between outcomes of learning algorithms on different datasets by introducing TV indistinguishability, and establishes equivalences with existing stability notions while providing amplification and boosting algorithms.
When two different parties use the same learning rule on their own data, how can we test whether the distributions of the two outcomes are similar? In this paper, we study the similarity of outcomes of learning rules through the lens of the Total Variation (TV) distance of distributions. We say that a learning rule is TV indistinguishable if the expected TV distance between the posterior distributions of its outputs, executed on two training data sets drawn independently from the same distribution, is small. We first investigate the learnability of hypothesis classes using TV indistinguishable learners. Our main results are information-theoretic equivalences between TV indistinguishability and existing algorithmic stability notions such as replicability and approximate differential privacy. Then, we provide statistical amplification and boosting algorithms for TV indistinguishable learners.