CV | MM | Mar 23, 2025

MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models

arXiv:2503.18160v1 | 1 citation | h-index: 7 | Has Code | ICME
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in prompt tuning for vision-language models, offering a practical solution for researchers and practitioners, though it appears incremental as it builds on existing prompt tuning methods.

The paper tackles the problem of increased complexity and training cost in CLIP-based prompt tuning for vision-language models by proposing Model-Agnostic Optimization (MAO), a plug-and-play method that improves performance while maintaining low computational cost, as demonstrated through extensive experiments.

Though CLIP-based prompt tuning significantly enhances pre-trained Vision-Language Models, existing research focuses on reconstructing the model architecture, e.g., additional loss calculation and meta-networks. These approaches generally lead to increased complexity and extended training cost. To maintain the efficiency of the tuning process, we propose plug-and-play Model-Agnostic Optimization (MAO) for prompt tuning. Without altering any components of the prompt tuning backbone, we introduce a Data-Driven Enhancement framework to optimize the distribution of the initial data, and incorporate an Alterable Regularization module to boost the task-specific feature processing pipeline, thereby improving overall performance while maintaining low computational cost. Extensive experiments on MAO demonstrate its outstanding performance and efficiency. The code of MAO is available at: https://github.com/JREion/M.A.O

Code Implementations: 1 repo