CVOct 20, 2024

BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping

arXiv:2410.15430v218 citationsh-index: 14NIPS
Originality Incremental advance
AI Analysis

This work addresses the challenge of efficient and effective test-time adaptation for vision-language models, which is incremental as it builds on existing methods to reduce computational overhead and improve information mining.

The paper tackles the problem of adapting pretrained vision-language models to downstream tasks without target domain knowledge by proposing BoostAdapter, which bridges training-required and training-free test-time adaptation methods using a light-weight memory for feature retrieval from historical and boosting samples, achieving strong generalization on out-of-distribution and cross-domain datasets.

Adaptation of pretrained vision-language models such as CLIP to various downstream tasks have raised great interest in recent researches. Previous works have proposed a variety of test-time adaptation (TTA) methods to achieve strong generalization without any knowledge of the target domain. However, existing training-required TTA approaches like TPT necessitate entropy minimization that involves large computational overhead, while training-free methods like TDA overlook the potential for information mining from the test samples themselves. In this paper, we break down the design of existing popular training-required and training-free TTA methods and bridge the gap between them within our framework. Specifically, we maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples. The historical samples are filtered from the testing data stream and serve to extract useful information from the target distribution, while the boosting samples are drawn from regional bootstrapping and capture the knowledge of the test sample itself. We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets, showcasing its applicability in real-world situations.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes