Dropout Prompt Learning: Towards Robust and Adaptive Vision-Language Models
This work addresses robustness challenges in vision-language models for tasks like low-shot learning and out-of-distribution generalization, representing an incremental improvement over existing regularization methods.
The paper aims to improve the robustness and adaptability of vision-language models. It proposes Dropout Prompt Learning, which applies dropout to textual and visual tokens with per-token probabilities based on token significance, and adds residual entropy regularization. On base-to-novel generalization, the method surpasses KgCoOp by 5.10% and PromptSRC by 2.13%.
Dropout is a widely used regularization technique that improves a model's generalization ability by randomly dropping neurons. In light of this, we propose Dropout Prompt Learning, which applies dropout to improve the robustness of vision-language models. Unlike vanilla dropout, we apply dropout to the tokens of the textual and visual branches, evaluating each token's significance from both intra-modal context and inter-modal alignment, which enables a flexible dropout probability for each token. Moreover, to maintain semantic alignment for general knowledge transfer while encouraging the diverse representations that dropout introduces, we further propose residual entropy regularization. Experiments on 15 benchmarks show our method's effectiveness in challenging scenarios such as low-shot learning, long-tail classification, and out-of-distribution generalization. Notably, our method surpasses regularization-based methods on base-to-novel generalization, including KgCoOp by 5.10% and PromptSRC by 2.13%.
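
To make the idea of per-token dropout probabilities concrete, the following is a minimal sketch and not the paper's implementation. It assumes that significance scores are already available as one number per token, that more significant tokens should be dropped less often, and that a dropped token is simply zeroed out. The abstract specifies none of these details. The names `drop_probabilities`, `token_dropout`, `p_min`, and `p_max` are hypothetical and chosen for illustration only.

```python
import random


def drop_probabilities(significance, p_min=0.0, p_max=0.3):
    """Map per-token significance scores to per-token drop probabilities.

    Assumption (not from the paper): the most significant token gets p_min,
    the least significant gets p_max, with linear interpolation in between.
    """
    lo, hi = min(significance), max(significance)
    span = hi - lo
    probs = []
    for s in significance:
        # Normalise the score to [0, 1]; if all scores are equal, use 0.5.
        norm = (s - lo) / span if span > 0 else 0.5
        probs.append(p_max - (p_max - p_min) * norm)
    return probs


def token_dropout(tokens, significance, p_min=0.0, p_max=0.3,
                  rng=None, training=True):
    """Zero out whole tokens, each with its own drop probability.

    tokens: list of token embeddings (each a list of floats).
    Returns (new_tokens, kept), where kept[i] is False if token i was dropped.
    """
    if not training:
        # Like vanilla dropout, do nothing at inference time.
        return [list(t) for t in tokens], [True] * len(tokens)
    rng = rng or random.Random()
    probs = drop_probabilities(significance, p_min, p_max)
    # rng.random() lies in [0, 1), so p = 0 always keeps and p = 1 always drops.
    kept = [rng.random() >= p for p in probs]
    new_tokens = [list(t) if k else [0.0] * len(t)
                  for t, k in zip(tokens, kept)]
    return new_tokens, kept
```

In this sketch, a call such as `token_dropout(embeddings, scores, rng=random.Random(0))` returns the masked embeddings together with the keep mask. How the significance scores are derived from intra-modal context and inter-modal alignment, and how residual entropy regularization enters the training loss, are left out because the abstract does not define them.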