PILLAR: How to make semi-private learning more effective
This work addresses efficient, private learning when labelled data is limited; it appears incremental in that it builds on existing semi-private learning frameworks.
The paper tackles the problem of semi-supervised semi-private learning by proposing an algorithm that reduces private labelled sample complexity and improves performance under tight privacy constraints, achieving significant gains over baselines in low-data regimes.
In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose a computationally efficient algorithm that, under mild assumptions on the data, provably achieves significantly lower private labelled sample complexity and can be run efficiently on real-world datasets. For this purpose, we leverage the features extracted by networks pre-trained on public (labelled or unlabelled) data, whose distribution can differ significantly from the one on which SP learning is performed. To validate its empirical effectiveness, we run a wide variety of experiments under tight privacy constraints ($ε = 0.1$) and with a focus on low-data regimes. In all of these settings, our algorithm exhibits significantly improved performance over available baselines that use similar amounts of public data.
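The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how a pipeline of this general kind might be arranged, not the paper's stated method. It assumes that features have already been extracted by a pre-trained network, that public unlabelled features are used to estimate a low-dimensional projection (here plain PCA, an assumption), and that a linear classifier is then fit on the projected private labelled features with clipped, noised gradients. All function names and parameters (`public_pca`, `project_and_clip`, `noisy_gd_logistic`, `noise_multiplier`) are hypothetical. The noise scale is left as a free parameter: calibrating it to a target privacy budget such as $ε = 0.1$ requires a privacy accountant, which is not shown.

```python
import numpy as np


def public_pca(x_pub, k):
    """Estimate the top-k principal directions from public unlabelled features.

    Only public data is touched here, so this step spends no privacy budget.
    """
    mean = x_pub.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x_pub - mean, full_matrices=False)
    return mean, vt[:k].T  # components have shape (d, k)


def project_and_clip(x, mean, components):
    """Project features to k dimensions and rescale rows to L2 norm at most 1."""
    z = (x - mean) @ components
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, 1.0)


def noisy_gd_logistic(z, y, steps, lr, clip, noise_multiplier, rng):
    """Full-batch gradient descent on the logistic loss with per-example
    gradient clipping and Gaussian noise added to the summed gradient.

    Assumption: noise_multiplier must be calibrated separately to the
    desired (epsilon, delta); this function does no privacy accounting.
    """
    n, k = z.shape
    w = np.zeros(k)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(z @ w)))
        per_example = (p - y)[:, None] * z  # one gradient row per example
        norms = np.linalg.norm(per_example, axis=1, keepdims=True)
        # Scale each row down so that its L2 norm is at most `clip`.
        per_example = per_example / np.maximum(norms / clip, 1.0)
        noise = rng.normal(0.0, noise_multiplier * clip, size=k)
        w -= lr * (per_example.sum(axis=0) + noise) / n
    return w
```

The design choice this sketch illustrates is that reducing the dimension with public data before any private computation shrinks the number of parameters that must absorb noise, which is one plausible route to lower private labelled sample complexity.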