LG | CR | CV | Nov 2, 2023

DP-Mix: Mixup-based Data Augmentation for Differentially Private Learning

arXiv:2311.01295v1 | 16 citations | h-index: 15 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in privacy-preserving machine learning for computer vision, offering incremental improvements over existing methods.

The paper tackles the incompatibility of data augmentation with differentially private learning by proposing two novel techniques, DP-Mix_Self and DP-Mix_Diff, which achieve state-of-the-art classification performance across multiple datasets.

Data augmentation techniques, such as simple image transformations and combinations, are highly effective at improving the generalization of computer vision models, especially when training data is limited. However, such techniques are fundamentally incompatible with differentially private learning approaches, due to the latter's built-in assumption that each training image's contribution to the learned model is bounded. In this paper, we investigate why naive applications of multi-sample data augmentation techniques, such as mixup, fail to achieve good performance and propose two novel data augmentation techniques specifically designed for the constraints of differentially private learning. Our first technique, DP-Mix_Self, achieves SoTA classification performance across a range of datasets and settings by performing mixup on self-augmented data. Our second technique, DP-Mix_Diff, further improves performance by incorporating synthetic data from a pre-trained diffusion model into the mixup process. We open-source the code at https://github.com/wenxuan-Bao/DP-Mix.
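The idea of performing mixup on self-augmented data can be sketched as follows. This is a minimal illustration, not the authors' implementation (see the linked repository for that). The function names `self_augment` and `self_mixup`, the choice of flip-and-shift augmentations, and the parameters `k`, `pad`, `n_mix`, and `alpha` are all assumptions made for the example.

```python
import numpy as np

def self_augment(image, rng, k=4, pad=2):
    """Return k randomly shifted and flipped views of ONE image (H, W, C).

    Hypothetical helper: the augmentations here are an illustrative choice.
    """
    h, w, _ = image.shape
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    views = []
    for _ in range(k):
        top = rng.integers(0, 2 * pad + 1)
        left = rng.integers(0, 2 * pad + 1)
        view = padded[top:top + h, left:left + w, :]  # random crop back to (H, W)
        if rng.random() < 0.5:
            view = view[:, ::-1, :]  # horizontal flip
        views.append(view)
    return np.stack(views)

def self_mixup(views, rng, n_mix=4, alpha=0.2):
    """Mix random pairs of views that all derive from the same training example."""
    k = views.shape[0]
    mixed = []
    for _ in range(n_mix):
        i, j = rng.choice(k, size=2, replace=False)
        lam = rng.beta(alpha, alpha)  # standard mixup coefficient
        mixed.append(lam * views[i] + (1.0 - lam) * views[j])
    return np.stack(mixed)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))            # stand-in for one training image
views = self_augment(image, rng)         # shape (4, 8, 8, 3)
mixed = self_mixup(views, rng)           # shape (4, 8, 8, 3)
batch = np.concatenate([views, mixed])   # all rows come from the same example
```

Because every row of `batch` is derived from a single training image, the label is unchanged, and the image's total influence can still be bounded. In a DP-SGD setting one would typically average the gradients over these rows and then clip that per-example average. Mixing two different training images, by contrast, would let one image influence more than one per-example gradient.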

