CV AI CL MM Mar 11, 2025

Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation

Xiwen Chen, Wenhui Zhu, Peijie Qiu, Hao Wang, Huayu Li, Haiyu Wu, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

arXiv:2503.08906v210.23 citations h-index: 10 Has Code

Originality Highly original

AI Analysis

This work addresses the challenge of preserving pre-trained knowledge during vision-language model adaptation, which is crucial for maintaining zero-shot generalization in image-text tasks. The contribution is incremental in that it builds on existing prompt learning, but it introduces a novel regularization approach.

The paper tackles knowledge forgetting and overfitting when adapting vision-language models to downstream tasks. It proposes an optimal transport-guided prompt learning framework that improves base-to-novel generalization, cross-dataset evaluation, and domain generalization, outperforming existing prompt learning methods without additional augmentation or ensemble techniques.

Vision-language models (VLMs) such as CLIP demonstrate strong performance but struggle when adapted to downstream tasks. Prompt learning has emerged as an efficient and effective strategy to adapt VLMs while preserving their pre-trained knowledge. However, existing methods still lead to overfitting and degrade zero-shot generalization. To address this challenge, we propose an optimal transport (OT)-guided prompt learning framework that mitigates forgetting by preserving the structural consistency of feature distributions between pre-trained and fine-tuned models. Unlike conventional point-wise constraints, OT naturally captures cross-instance relationships and expands the feasible parameter space for prompt tuning, allowing a better trade-off between adaptation and generalization. Our approach enforces joint constraints on both vision and text representations, ensuring a holistic feature alignment. Extensive experiments on benchmark datasets demonstrate that our simple yet effective method can outperform existing prompt learning strategies in base-to-novel generalization, cross-dataset evaluation, and domain generalization without additional augmentation or ensemble techniques. The code is available at https://github.com/ChongQingNoSubway/Prompt-OT
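The abstract contrasts point-wise constraints with an OT-based constraint but does not spell out the loss. The following is a minimal sketch of that contrast, not the authors' implementation. It assumes entropic OT solved with Sinkhorn iterations, a cosine-distance cost, and uniform weights over the batch; the function names and the `reg` and `n_iters` parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def pointwise_cost(feats_pre, feats_tuned):
    """Point-wise constraint: compare the i-th pre-trained feature
    only with the i-th fine-tuned feature (mean cosine distance)."""
    a = l2_normalize(feats_pre)
    b = l2_normalize(feats_tuned)
    return float(np.mean(1.0 - np.sum(a * b, axis=1)))


def sinkhorn_ot_cost(feats_pre, feats_tuned, reg=0.1, n_iters=200):
    """Entropic OT cost between two feature batches (illustrative assumption:
    uniform marginals and cosine-distance cost)."""
    a = l2_normalize(feats_pre)
    b = l2_normalize(feats_tuned)
    cost = 1.0 - a @ b.T                 # pairwise cosine distance, shape (n, m)
    n, m = cost.shape
    mu = np.full(n, 1.0 / n)             # uniform weights on pre-trained features
    nu = np.full(m, 1.0 / m)             # uniform weights on fine-tuned features
    kernel = np.exp(-cost / reg)         # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):             # Sinkhorn scaling iterations
        u = mu / (kernel @ v)
        v = nu / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]   # transport plan, shape (n, m)
    return float(np.sum(plan * cost)), plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pre = rng.normal(size=(8, 32))
    tuned = np.roll(pre, 1, axis=0)      # same set of features, different order
    ot_cost, _ = sinkhorn_ot_cost(pre, tuned)
    print("point-wise:", pointwise_cost(pre, tuned))
    print("OT:", ot_cost)
```

In this toy setup the fine-tuned batch contains exactly the same features as the pre-trained batch, only reordered. The point-wise cost penalizes the reordering heavily, while the OT cost stays small because the transport plan matches each feature to its counterpart. This is one way to read the abstract's claim that OT captures cross-instance relationships and leaves more room for prompt tuning than a point-wise constraint.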
