CV · LG · Jan 30

Training-Free Test-Time Adaptation with Brownian Distance Covariance in Vision-Language Models

arXiv:2601.23253v1 · h-index: 23
Originality: Incremental advance
AI Analysis

This paper addresses the limited real-world applicability of vision-language models by providing an efficient adaptation method, though the advance is incremental because it builds on existing test-time adaptation approaches.

The paper tackles performance degradation in vision-language models under domain shift by proposing TaTa, a training-free test-time adaptation method using Brownian Distance Covariance, which significantly reduces computational cost and achieves state-of-the-art performance in domain and cross-dataset generalization.

Vision-language models suffer performance degradation under domain shift, limiting real-world applicability. Existing test-time adaptation methods are computationally intensive, rely on back-propagation, and often focus on single modalities. To address these issues, we propose Training-free Test-Time Adaptation with Brownian Distance Covariance (TaTa). TaTa leverages Brownian Distance Covariance, a powerful statistical measure that captures both linear and nonlinear dependencies via pairwise distances, to dynamically adapt VLMs to new domains without training or back-propagation. This not only improves efficiency but also enhances stability by avoiding disruptive weight updates. TaTa further integrates attribute-enhanced prompting to improve vision-language inference with descriptive visual cues. Combined with dynamic clustering and pseudo-label refinement, it effectively recalibrates the model for novel visual contexts. Experiments across diverse datasets show that TaTa significantly reduces computational cost while achieving state-of-the-art performance in domain and cross-dataset generalization.
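To make the phrase "captures both linear and nonlinear dependencies via pairwise distances" concrete, here is a minimal NumPy sketch of the standard sample distance covariance (the empirical form of Brownian Distance Covariance). It is not TaTa's implementation, which the abstract does not specify; the function names and the toy data are illustrative assumptions only.

```python
import numpy as np


def centered_distance_matrix(x):
    """Pairwise Euclidean distance matrix of the samples in x, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n samples of dimension 1
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_covariance_sq(x, y):
    """Squared sample distance covariance between paired samples x and y."""
    a = centered_distance_matrix(x)
    b = centered_distance_matrix(y)
    return float((a * b).mean())


def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; returns 0.0 for a constant input."""
    dcov2_xy = distance_covariance_sq(x, y)
    dvar2 = distance_covariance_sq(x, x) * distance_covariance_sq(y, y)
    if dvar2 <= 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2_xy, 0.0) / np.sqrt(dvar2)))


if __name__ == "__main__":
    # Toy data (assumption): y depends on x purely nonlinearly.
    x = np.linspace(-1.0, 1.0, 201)
    y = x ** 2
    print("Pearson correlation:", np.corrcoef(x, y)[0, 1])
    print("Distance correlation:", distance_correlation(x, y))
```

On this symmetric toy example the Pearson correlation is essentially zero while the distance correlation is clearly positive, which is the property the abstract appeals to. Because the computation needs only pairwise distances and means, it involves no gradients or weight updates.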

