Dual-Priv Pruning : Efficient Differential Private Fine-Tuning in Multimodal Large Language Models
This work addresses privacy protection for MLLMs, which is critical for applications that handle sensitive multimodal data. Its contribution is incremental, however, since it builds on existing DP techniques and adds optimizations specific to the multimodal setting.
The paper tackles the challenge of applying differential privacy (DP) to multimodal large language models (MLLMs), where DP incurs high computational overhead and degrades model quality because the injected noise scales with the number of parameters. It proposes Dual-Priv Pruning, a framework that combines visual token pruning, which reduces input dimensionality, with gradient-update pruning, which mitigates the impact of noise. The method achieves competitive results with minimal performance degradation and improved memory efficiency: on A100 GPUs, for example, it uses only 1.74% more memory than zeroth-order methods.
Differential Privacy (DP) is a widely adopted technique, valued for its effectiveness in protecting the privacy of task-specific datasets, making it a critical tool for large language models. However, its effectiveness in Multimodal Large Language Models (MLLMs) remains uncertain. Applying DP inherently introduces substantial computation overhead, a concern particularly relevant for MLLMs, which process extensive textual and visual data. Furthermore, a critical challenge of DP is that the injected noise, necessary for privacy, scales with parameter dimensionality, leading to pronounced model degradation. This trade-off between privacy and utility complicates the application of DP to complex architectures like MLLMs. To address these challenges, we propose Dual-Priv Pruning, a framework that employs two complementary pruning mechanisms for DP fine-tuning in MLLMs: (i) visual token pruning, which reduces input dimensionality by removing redundant visual information, and (ii) gradient-update pruning during the DP optimization process. This second mechanism selectively prunes parameter updates based on the magnitude of noisy gradients, aiming to mitigate noise impact and improve utility. Experiments demonstrate that our approach achieves competitive results with minimal performance degradation. In terms of computational efficiency, our approach consistently uses less memory than standard DP-SGD. On A100 GPUs, it requires only 1.74% more memory than zeroth-order methods, which suffer from severe performance issues, and on H20 GPUs it demonstrates leading memory efficiency. To the best of our knowledge, we are the first to explore DP fine-tuning in MLLMs. Our code is coming soon.
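Because the authors' code is not yet available, the following is only a rough NumPy sketch of how the two pruning mechanisms described in the abstract might look. It is not the paper's implementation. Every name and parameter here (`prune_visual_tokens`, `dp_sgd_step_with_update_pruning`, `scores`, `keep_ratio`) is hypothetical, and two details are assumptions the abstract does not specify: that visual tokens are ranked by some externally supplied importance score, and that the update coordinates kept are those where the noisy gradient has the largest magnitude.

```python
import numpy as np


def prune_visual_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring visual tokens, preserving their original order.

    `scores` is an assumed per-token importance score; the abstract does not
    say how redundancy is measured.
    """
    n_keep = max(1, int(round(keep_ratio * tokens.shape[0])))
    keep_idx = np.sort(np.argsort(scores)[-n_keep:])
    return tokens[keep_idx], keep_idx


def dp_sgd_step_with_update_pruning(params, per_sample_grads, lr, clip_norm,
                                    noise_multiplier, keep_ratio, rng):
    """One DP-SGD step followed by magnitude-based pruning of the update."""
    # 1. Clip each per-sample gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale

    # 2. Sum, add Gaussian noise calibrated to the clipping norm, and average.
    batch_size = per_sample_grads.shape[0]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / batch_size

    # 3. Assumed pruning rule: apply only the coordinates whose noisy gradient
    #    has the largest magnitude, and drop the rest of the update.
    n_keep = max(1, int(round(keep_ratio * params.size)))
    threshold = np.partition(np.abs(noisy_grad), -n_keep)[-n_keep]
    mask = np.abs(noisy_grad) >= threshold

    return params - lr * noisy_grad * mask, mask


# Usage with toy data: 6 visual tokens of width 2, and a 10-parameter model.
tokens = np.arange(12.0).reshape(6, 2)
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.7])
kept_tokens, kept_idx = prune_visual_tokens(tokens, scores, keep_ratio=0.5)

rng = np.random.default_rng(0)
params = np.zeros(10)
grads = rng.normal(size=(4, 10))  # one gradient row per training example
new_params, mask = dp_sgd_step_with_update_pruning(
    params, grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0,
    keep_ratio=0.3, rng=rng)
```

In this sketch the mask is computed from the already-noised gradient, so the pruning step is post-processing of a differentially private quantity and does not consume additional privacy budget. Whether the paper uses a top-k rule, a fixed threshold, or some other criterion is not stated in the abstract.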