LG CR IT ML
Jun 13, 2023

Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training

Alyssa Huang, Peihan Liu, Ryumei Nakada, Linjun Zhang, Wanrong Zhang

arXiv:2306.08173v2
12.39 citations
h-index: 10
Has Code

Originality: Incremental advance

AI Analysis

This work addresses data privacy concerns for users of vision-and-language models. The advance is incremental, since it adapts an existing method to one specific model, CLIP.

The paper tackles privacy risks in multimodal AI by developing a differentially private version of CLIP (Dp-CLIP) that retains accuracy comparable to standard CLIP on tasks like image classification and visual question answering.

The surge in multimodal AI's success has sparked concerns over data privacy in vision-and-language tasks. While CLIP has revolutionized multimodal learning through joint training on images and text, its potential to unintentionally disclose sensitive information necessitates the integration of privacy-preserving mechanisms. We introduce a differentially private adaptation of the Contrastive Language-Image Pretraining (CLIP) model that effectively addresses privacy concerns while retaining accuracy. Our proposed method, Dp-CLIP, is rigorously evaluated on benchmark datasets encompassing diverse vision-and-language tasks such as image classification and visual question answering. We demonstrate that our approach retains performance on par with the standard non-private CLIP model. Furthermore, we analyze our proposed algorithm under linear representation settings. We derive the convergence rate of our algorithm and show a trade-off between utility and privacy when gradients are clipped per-batch and the loss function does not satisfy smoothness conditions assumed in the literature for the analysis of DP-SGD.
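The abstract mentions per-batch gradient clipping in a DP-SGD-style algorithm. As a rough illustration only, the following is a minimal sketch of a single noisy update, assuming the gradient of the whole batch is clipped to a fixed L2 norm and Gaussian noise proportional to that norm is then added. The function names, the `noise_multiplier` parameter, and the flat-list gradient representation are hypothetical and are not taken from the paper or its code. The sketch performs no privacy accounting, so it does not by itself certify any particular privacy guarantee.

```python
import math
import random


def clip_by_norm(grad, clip_norm):
    """Rescale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    # Gradients already inside the ball (or all-zero) are left unchanged.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def privatize_batch_gradient(batch_grad, clip_norm, noise_multiplier, rng):
    """Clip the whole-batch gradient, then add Gaussian noise.

    Assumption: the noise standard deviation is noise_multiplier * clip_norm.
    The calibration needed for a formal guarantee is not shown here.
    """
    clipped = clip_by_norm(batch_grad, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [g + rng.gauss(0.0, sigma) for g in clipped]


def noisy_sgd_step(params, batch_grad, lr, clip_norm, noise_multiplier, rng):
    """One gradient-descent step using the clipped, noised batch gradient."""
    noisy = privatize_batch_gradient(batch_grad, clip_norm, noise_multiplier, rng)
    return [p - lr * g for p, g in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    params = [0.5, -0.25]
    batch_grad = [3.0, 4.0]  # stand-in for the gradient of a contrastive loss
    print(noisy_sgd_step(params, batch_grad, lr=0.1, clip_norm=1.0,
                         noise_multiplier=1.0, rng=rng))
```

In this sketch the clipping is applied once to the batch gradient rather than to each example separately. A plausible motivation is that a contrastive loss couples every image-text pair in a batch, so it does not split cleanly into independent per-example terms.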
