CV · AI · Mar 30

Domain-Invariant Prompt Learning for Vision-Language Models

arXiv:2603.28555 · 20.4 · h-index: 5
AI Analysis

This paper addresses domain generalization for vision-language models, but the contribution is incremental: it extends an existing method, CoOp.

The paper tackles domain shift in vision-language models by proposing Domain-invariant Context Optimization (DiCoOp), which uses adversarial training to learn domain-invariant prompts and reports consistent improvements over CoOp on domain generalization tasks.

Large pre-trained vision-language models like CLIP have transformed computer vision by aligning images and text in a shared feature space, enabling robust zero-shot transfer via prompting. Soft-prompting, such as Context Optimization (CoOp), effectively adapts these models for downstream recognition tasks by learning a set of context vectors. However, CoOp lacks explicit mechanisms for handling domain shifts across unseen distributions. To address this, we propose Domain-invariant Context Optimization (DiCoOp), an extension of CoOp optimized for domain generalization. By employing an adversarial training approach, DiCoOp forces the model to learn domain-invariant prompts while preserving discriminative power for classification. Experimental results show that DiCoOp consistently surpasses CoOp in domain generalization tasks across diverse visual domains.
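The abstract does not spell out the adversarial objective, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes a hypothetical domain discriminator that produces `domain_logits`, and a trade-off weight `lam`; the prompt is trained to classify well while making the discriminator's job harder (equivalent in effect to a gradient-reversal setup). All function names, the temperature default, and the toy shapes are illustrative assumptions.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def clip_style_logits(image_feats, text_feats, temperature=0.01):
    """Cosine similarity between image features and per-class text features,
    divided by a temperature (the default here is an assumed value)."""
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return img @ txt.T / temperature

def adversarial_objectives(class_logits, class_labels,
                           domain_logits, domain_labels, lam=0.1):
    """Return (prompt_loss, discriminator_loss) for one batch.

    Assumed formulation: the discriminator minimizes the domain loss,
    while the prompt minimizes the class loss minus lam times the domain
    loss, i.e. it is rewarded when domains become hard to tell apart.
    """
    cls_loss = softmax_cross_entropy(class_logits, class_labels)
    dom_loss = softmax_cross_entropy(domain_logits, domain_labels)
    prompt_loss = cls_loss - lam * dom_loss
    discriminator_loss = dom_loss
    return prompt_loss, discriminator_loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image_feats = rng.normal(size=(4, 8))    # toy stand-in for image embeddings
    text_feats = rng.normal(size=(3, 8))     # toy stand-in for prompt embeddings, one per class
    domain_logits = rng.normal(size=(4, 2))  # output of the hypothetical discriminator
    class_labels = np.array([0, 1, 2, 0])
    domain_labels = np.array([0, 0, 1, 1])

    class_logits = clip_style_logits(image_feats, text_feats)
    prompt_loss, disc_loss = adversarial_objectives(
        class_logits, class_labels, domain_logits, domain_labels)
    print("prompt loss:", prompt_loss, "discriminator loss:", disc_loss)
```

In a real training loop the two losses would be minimized in alternation (or jointly through a gradient-reversal layer) with the CLIP encoders frozen and only the context vectors and discriminator updated; whether DiCoOp does exactly this is not stated in the abstract.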

