CV · LG · Jun 12, 2024

OT-VP: Optimal Transport-guided Visual Prompting for Test-Time Adaptation

arXiv:2407.09498v2 · 16 citations
AI Analysis

This work addresses domain shift in computer vision, which matters for applications that need robust performance across varied data, though it is an incremental improvement over existing test-time adaptation methods.

The paper tackles the poor performance of Vision Transformers on unseen domains by proposing OT-VP, a test-time adaptation method that uses optimal transport to learn visual prompts without modifying pre-trained parameters. It reports state-of-the-art results on PACS, VLCS, OfficeHome, and ImageNet-C with only four prompt tokens.

Vision Transformers (ViTs) have demonstrated remarkable capabilities in learning representations, but their performance is compromised when applied to unseen domains. Previous methods either engage in prompt learning during the training phase or modify model parameters at test time through entropy minimization. The former often overlooks unlabeled target data, while the latter doesn't fully address domain shifts. In this work, our approach, Optimal Transport-guided Test-Time Visual Prompting (OT-VP), handles these problems by leveraging prompt learning at test time to align the target and source domains without accessing the training process or altering pre-trained model parameters. This method involves learning a universal visual prompt for the target domain by optimizing the Optimal Transport distance. OT-VP, with only four learned prompt tokens, exceeds state-of-the-art performance across three stylistic datasets (PACS, VLCS, and OfficeHome) and one corrupted dataset (ImageNet-C). Additionally, OT-VP operates efficiently, both in terms of memory and computation, and is adaptable for extension to online settings.
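To make the alignment objective in the abstract concrete, the following is a minimal sketch of an optimal transport cost between a set of source features and a set of target features. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not name a solver or cost function, so the sketch swaps in entropic-regularised OT (Sinkhorn iterations) with a squared Euclidean cost, and the function name `sinkhorn_cost`, the toy Gaussian features, and the fixed shift of 2.0 are all hypothetical. In a method like the one described, the target features would come from a frozen ViT with prompt tokens prepended, and the prompt would be updated to reduce this kind of cost.

```python
import numpy as np


def sinkhorn_cost(xs, xt, reg=0.1, n_iters=200):
    """Transport cost of an entropic-regularised OT plan between two point sets.

    Assumption: uniform weights on both sets and a squared Euclidean cost.
    """
    # Pairwise squared Euclidean distances, shape (n_source, n_target).
    cost = ((xs[:, None, :] - xt[None, :, :]) ** 2).sum(axis=-1)
    # Rescale before exponentiating so the kernel does not underflow.
    scale = max(float(cost.max()), 1e-12)
    kernel = np.exp(-(cost / scale) / reg)

    a = np.full(len(xs), 1.0 / len(xs))  # uniform source weights
    b = np.full(len(xt), 1.0 / len(xt))  # uniform target weights
    u = np.ones_like(a)
    for _ in range(n_iters):
        # Alternate scaling updates that push the plan towards both marginals.
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)

    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())


# Toy stand-ins for features (hypothetical; not outputs of a real ViT).
rng = np.random.default_rng(0)
source = rng.normal(size=(64, 8))
target_shifted = rng.normal(size=(64, 8)) + 2.0  # simulated domain shift
target_aligned = target_shifted - 2.0            # the same points with the shift removed

cost_shifted = sinkhorn_cost(source, target_shifted)
cost_aligned = sinkhorn_cost(source, target_aligned)
print(f"shifted: {cost_shifted:.2f}  aligned: {cost_aligned:.2f}")
```

The shifted target set yields a larger transport cost than the aligned one, which is the signal a prompt-learning loop could minimise while the backbone weights stay fixed.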

Code Implementations · 1 repo
