On Theoretical Limits of Learning with Label Differential Privacy
This work addresses the fundamental performance trade-offs in privacy-preserving machine learning for scenarios with private labels and public features, providing theoretical insights for researchers and practitioners.
The paper investigates the theoretical limits of learning with label differential privacy (DP) for classification and regression, establishing minimax convergence rates and showing that label local DP yields significantly faster risk convergence than full local DP, while label central DP offers only constant-factor improvements over full DP.
Label differential privacy (DP) is designed for learning problems involving private labels and public features. While various methods have been proposed for learning under label DP, the theoretical limits remain largely unexplored. In this paper, we investigate the fundamental limits of learning with label DP in both local and central models for both classification and regression tasks, characterized by minimax convergence rates. We establish lower bounds by converting each task into a multiple hypothesis testing problem and bounding the test error. Additionally, we develop algorithms that yield matching upper bounds. Our results demonstrate that under label local DP (LDP), the risk has a significantly faster convergence rate than that under full LDP, i.e. protecting both features and labels, indicating the advantages of relaxing the DP definition to focus solely on labels. In contrast, under the label central DP (CDP), the risk is only reduced by a constant factor compared to full DP, indicating that the relaxation of CDP only has limited benefits on the performance.