Towards Reliable and Generalizable Differentially Private Machine Learning (Extended Version)
This paper addresses the problem of unreliable claims in differentially private machine learning (DPML) research, which affects both practitioners and researchers. Its contribution is incremental in that it evaluates existing techniques rather than proposing new methods.
The paper tackles the lack of consensus and reproducibility in differentially private machine learning (DPML) by conducting a reproducibility and replicability experiment on 11 state-of-the-art techniques. It finds that some methods hold up, while others fail outside their initial experimental conditions.
There is a flurry of recent research papers proposing novel differentially private machine learning (DPML) techniques. These papers claim to achieve new state-of-the-art (SoTA) results and offer empirical results as validation. However, there is no consensus on which techniques are most effective or whether they genuinely meet their stated claims. Complicating matters, heterogeneity in codebases, datasets, methodologies, and model architectures makes direct comparisons of different approaches challenging. In this paper, we conduct a reproducibility and replicability (R+R) experiment on 11 different SoTA DPML techniques from the recent research literature. The results of our investigation are varied: while some methods stand up to scrutiny, others falter when tested outside their initial experimental conditions. We also discuss challenges unique to the reproducibility of DPML, including the additional randomness due to DP noise, and how to address them. Finally, we derive insights and best practices for obtaining scientifically valid and reliable results.
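To illustrate the point about additional randomness due to DP noise, the following is a minimal sketch of reporting a result over several seeds instead of a single run. It is not the paper's evaluation protocol: `noisy_mean` is a hypothetical toy stand-in for a DP training run (a clipped mean with Gaussian noise, with the dataset size treated as public), and `summarize_runs`, the data, and all parameter values are assumptions chosen for illustration. No privacy accounting is performed.

```python
import random
import statistics


def noisy_mean(values, clip, noise_multiplier, seed):
    """Toy stand-in for one DP run: clipped mean plus Gaussian noise.

    Hypothetical helper for illustration only; the dataset size is
    treated as public and no privacy budget is computed.
    """
    rng = random.Random(seed)  # seed controls the DP noise draw
    clipped = [max(-clip, min(clip, v)) for v in values]
    # Noise scale is proportional to the clipping bound.
    noise = rng.gauss(0.0, noise_multiplier * clip)
    return (sum(clipped) + noise) / len(values)


def summarize_runs(run_fn, seeds):
    """Run `run_fn` once per seed and report the spread across runs."""
    results = [run_fn(seed) for seed in seeds]
    return {
        "n": len(results),
        "mean": statistics.mean(results),
        "std": statistics.stdev(results),  # sample standard deviation
        "min": min(results),
        "max": max(results),
    }


data = [0.2, 0.9, 0.4, 0.7, 0.5, 0.1, 0.8, 0.6]
summary = summarize_runs(
    lambda s: noisy_mean(data, clip=1.0, noise_multiplier=1.0, seed=s),
    seeds=range(10),
)
print(summary)
```

Fixing the seed makes an individual run repeatable, while the spread across seeds shows how much of a reported difference between two techniques could be explained by noise alone.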