CV | Aug 7, 2025

ETTA: Efficient Test-Time Adaptation for Vision-Language Models through Dynamic Embedding Updates

arXiv:2508.05898v1 | h-index: 11 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficiently adapting pretrained vision-language models to new domains at test time, which is crucial for real-world applications. It appears incremental, however, since it builds on existing cache-based TTA approaches.

The paper tackles the poor generalization of vision-language models under distribution shifts by proposing ETTA, an efficient test-time adaptation method that dynamically updates embeddings and adaptively ensembles prompts. The authors report state-of-the-art accuracy and computational efficiency on benchmarks.

Pretrained vision-language models (VLMs) like CLIP show strong zero-shot performance but struggle with generalization under distribution shifts. Test-Time Adaptation (TTA) addresses this by adapting VLMs to unlabeled test data in new domains. While some TTA methods rely on prompt-tuning, training-free cache-based approaches are preferred for efficiency. However, current cache-based TTA models store only a limited set of high-confidence samples, restricting the decision boundary to these samples and ignoring the influence of other incoming test data. To address this, we propose Efficient Test-Time Adaptation (ETTA), introducing a Recursive Updating module that integrates all incoming test samples, progressively refining the decision boundary. This strategy mimics an unbounded cache, dynamically updating contextual embeddings for improved accuracy with minimal memory and computational overhead. ETTA also includes an Adaptive Ensemble module to reduce prompt dependency in image-to-text scores by dynamically selecting optimal prompts for each class. Furthermore, ETTA adaptively combines scores from both modules based on confidence levels, leveraging their complementary strengths. Extensive experiments on two benchmarks confirm that ETTA outperforms state-of-the-art TTA models in both accuracy and computational efficiency, setting a new standard for effective, efficient test-time adaptation. The code has been released at https://github.com/hamidreza-dastmalchi/ETTA.
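
The abstract does not give the update rule, so the following is only a rough sketch of how a constant-memory recursive update could mimic an unbounded cache. It assumes a per-class running mean of image features assigned by pseudo-label, and a fusion rule that weights each score by its maximum softmax probability. The class name `RunningClassEmbeddings`, the `scale` parameter, and both rules are hypothetical illustrations, not the authors' implementation; the released repository is the authoritative reference.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


class RunningClassEmbeddings:
    """Hypothetical stand-in for a recursive update module.

    Keeps one contextual embedding per class and folds every incoming
    test feature into it, so memory stays O(num_classes * dim) no matter
    how many samples arrive.
    """

    def __init__(self, text_embeddings):
        self.text = l2_normalize(np.asarray(text_embeddings, dtype=np.float64))
        # Assumption: contextual embeddings start from the text embeddings,
        # each counted as one pseudo-sample.
        self.context = self.text.copy()
        self.counts = np.ones(len(self.text))

    def step(self, image_feature, scale=100.0):
        f = l2_normalize(np.asarray(image_feature, dtype=np.float64))
        p_text = softmax(scale * (self.text @ f))
        p_ctx = softmax(scale * (l2_normalize(self.context) @ f))

        # Assumption: confidence is the maximum probability of each score.
        w_text, w_ctx = p_text.max(), p_ctx.max()
        fused = (w_text * p_text + w_ctx * p_ctx) / (w_text + w_ctx)
        k = int(fused.argmax())

        # Recursive mean: mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
        self.counts[k] += 1
        self.context[k] += (f - self.context[k]) / self.counts[k]
        return k, fused
```

Calling `step` once per test image returns the predicted class index and the fused probabilities. Because only the running means and counts are stored, the cost per sample does not grow with the number of samples seen, which is the property the abstract attributes to the unbounded-cache behaviour.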

