CVLGMar 4, 2024

Differentially Private Representation Learning via Image Captioning

arXiv:2403.02506v211 citationsh-index: 27ICML
Originality Highly original
AI Analysis

This work addresses the problem of preserving privacy while maintaining accuracy in machine learning for applications handling sensitive data, representing a strong specific gain rather than a foundational breakthrough.

The paper tackles the sub-optimal privacy-accuracy trade-off in differentially private representation learning by training a DP image captioner on a large-scale dataset, achieving 65.8% accuracy on ImageNet-1K under a privacy budget of ε=8, which improves over the previous SOTA of 56.5%.

Differentially private (DP) machine learning is considered the gold-standard solution for training a model from sensitive data while still preserving privacy. However, a major barrier to achieving this ideal is its sub-optimal privacy-accuracy trade-off, which is particularly visible in DP representation learning. Specifically, it has been shown that under modest privacy budgets, most models learn representations that are not significantly better than hand-crafted features. In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets. Through a series of engineering tricks, we successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation, and obtaining unprecedented high-quality image features that can be used in a variety of downstream vision and vision-language tasks. For example, under a privacy budget of $\varepsilon=8$ for the LAION dataset, a linear classifier trained on top of learned DP-Cap features attains $65.8\%$ accuracy on ImageNet-1K, considerably improving the previous SOTA of $56.5\%$.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes