DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization
This work addresses limitations of cross-domain datasets for vision-language models (VLMs) and enables evaluation under more realistic domain shifts, though the contribution is incremental because it builds on existing VLMs and domain generalization concepts.
The paper tackles unrealistic and poorly defined domains in cross-domain datasets by introducing DomainVerse, a dataset of about 0.5 million images from 390 fine-grained realistic domains for Adaptive Domain Generalization (ADG). It also proposes two tuning-free methods, Domain CLIP and Domain++ CLIP, which the authors report to be effective in their experiments.
Traditional cross-domain tasks, including domain adaptation and domain generalization, rely heavily on training a model on source-domain data. With recent advances in vision-language models (VLMs), which can be viewed as natural source models, the cross-domain task becomes one of directly adapting the pre-trained source model to arbitrary target domains using prior domain knowledge; we name this task Adaptive Domain Generalization (ADG). However, current cross-domain datasets have many limitations, such as unrealistic domains, unclear domain definitions, and the inability to support fine-grained domain decomposition, which motivates us to build a new dataset, DomainVerse, for ADG. Benefiting from the hierarchical definition of domain shifts that we introduce, DomainVerse consists of about 0.5 million images from 390 fine-grained realistic domains. Using DomainVerse and VLMs, we propose two methods, Domain CLIP and Domain++ CLIP, for tuning-free adaptive domain generalization. Extensive experiments demonstrate the significance of the dataset and the effectiveness of the proposed methods.