CV
Jul 30, 2023

Open-Set Domain Adaptation with Visual-Language Foundation Models

arXiv:2307.16204v1
18 citations
h-index: 52
Originality: Incremental advance
AI Analysis

This paper addresses the problem of identifying unknown classes in domain adaptation for computer vision, but its contribution is incremental because it builds on existing CLIP methods.

The paper tackles open-set domain adaptation by leveraging CLIP, a visual-language foundation model, and proposes an entropy optimization strategy, reporting state-of-the-art results on various benchmarks.

Unsupervised domain adaptation (UDA) has proven very effective at transferring knowledge from a source domain with labeled data to a target domain with unlabeled data. Because the target domain lacks labels and may contain unknown classes, open-set domain adaptation (ODA) has emerged as a potential solution for identifying these classes during the training phase. Although existing ODA approaches aim to address the distribution shift between the source and target domains, most methods fine-tune ImageNet pre-trained models on the source domain while adapting to the target domain. Recent visual-language foundation models (VLFMs), such as Contrastive Language-Image Pre-Training (CLIP), are robust to many distribution shifts and should therefore substantially improve the performance of ODA. In this work, we explore generic ways to adopt CLIP, a popular VLFM, for ODA. We investigate the performance of zero-shot prediction using CLIP and then propose an entropy optimization strategy that uses the outputs of CLIP to assist ODA models. The proposed approach achieves state-of-the-art results on various benchmarks, demonstrating its effectiveness in addressing the ODA problem.
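To make the idea of zero-shot prediction with an entropy signal concrete, here is a minimal sketch. It is not the paper's method: the abstract does not specify the objective, so this only illustrates one common way to flag possible unknown classes, by thresholding the normalized entropy of CLIP-style class probabilities. The embeddings are hand-made stand-ins for real CLIP image and text features, and the function names, the temperature of 100, and the threshold of 0.5 are all assumptions chosen for illustration.

```python
import numpy as np


def zero_shot_probs(image_emb, text_emb, temperature=100.0):
    """Softmax over scaled cosine similarities between images and class prompts.

    image_emb: (N, D) array, text_emb: (K, D) array (stand-ins for CLIP features).
    temperature is an assumed logit scale, not a value taken from the paper.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = temperature * (img @ txt.T)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def normalized_entropy(probs, eps=1e-12):
    """Shannon entropy of each row, divided by log(K) so it lies in about [0, 1]."""
    num_classes = probs.shape[1]
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return entropy / np.log(num_classes)


def predict_open_set(image_emb, text_emb, threshold=0.5, unknown_label=-1):
    """Assign the most likely known class, or unknown_label when entropy is high.

    threshold is a hypothetical hyperparameter for this sketch.
    """
    probs = zero_shot_probs(image_emb, text_emb)
    entropy = normalized_entropy(probs)
    labels = probs.argmax(axis=1)
    labels = np.where(entropy > threshold, unknown_label, labels)
    return labels, entropy


if __name__ == "__main__":
    # Three known classes with orthogonal toy "text" embeddings.
    text_emb = np.eye(3)
    image_emb = np.array([
        [1.0, 0.0, 0.0],  # aligned with class 0: confident prediction
        [1.0, 1.0, 1.0],  # equally similar to every class: ambiguous
    ])
    labels, entropy = predict_open_set(image_emb, text_emb)
    print(labels, entropy)
```

In this toy setup the first sample is aligned with a single class prompt and receives a confident known-class label, while the second is equally similar to every prompt, reaches maximal entropy, and is marked unknown. A training-time entropy optimization, as the abstract describes, would instead use such entropy values inside a loss; that step is left out here because the abstract gives no details of it.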

