CV · Jun 1, 2024

Efficient Open Set Single Image Test Time Adaptation of Vision Language Models

arXiv:2406.00481v2 · 1 citation
AI Analysis

This work addresses a critical challenge for deploying deep learning in the real world: continuously adapting a model to single test images in open-set scenarios. The contribution is incremental but practical.

The paper tackles the problem of adapting vision-language models to dynamic, real-world environments with shifting data distributions and unseen test scenarios. It proposes ROSITA, a framework that achieves state-of-the-art performance in open-set test-time adaptation while maintaining computational efficiency for real-time deployment.

Adapting models to dynamic, real-world environments characterized by shifting data distributions and unseen test scenarios is a critical challenge in deep learning. In this paper, we consider a realistic and challenging Test-Time Adaptation setting, where a model must continuously adapt to test samples that arrive sequentially, one at a time, while distinguishing between known and unknown classes. Current Test-Time Adaptation methods operate under closed-set assumptions or batch processing, differing from the real-world open-set scenarios. We address this limitation by establishing a comprehensive benchmark for Open-set Single-image Test-Time Adaptation using Vision-Language Models. Furthermore, we propose ROSITA, a novel framework that leverages dynamically updated feature banks to identify reliable test samples and employs a contrastive learning objective to improve the separation between known and unknown classes. Our approach effectively adapts models to domain shifts for known classes while rejecting unfamiliar samples. Extensive experiments across diverse real-world benchmarks demonstrate that ROSITA sets a new state-of-the-art in open-set TTA, achieving both strong performance and computational efficiency for real-time deployment. Our code can be found at the project site https://manogna-s.github.io/rosita/
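The abstract names three ingredients: samples arriving one at a time, dynamically updated feature banks, and a contrastive objective separating known from unknown classes. The following is a minimal illustrative sketch of how such pieces could fit together, not the authors' implementation. The names `FeatureBank`, `known_score`, `contrastive_loss`, and `adapt_step`, the fixed `threshold`, the FIFO bank policy, and the InfoNCE-style loss are all assumptions made for illustration. The sketch only computes the loss; a real system would backpropagate it into some of the model's parameters.

```python
from collections import deque

import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class FeatureBank:
    """Fixed-size FIFO store of normalized image features (hypothetical policy)."""

    def __init__(self, max_size):
        self.items = deque(maxlen=max_size)

    def add(self, feature):
        self.items.append(l2_normalize(feature))

    def as_array(self):
        return np.stack(list(self.items))


def known_score(image_feat, text_feats):
    """Return the highest cosine similarity to a known-class text embedding
    and the index of that class."""
    sims = l2_normalize(text_feats) @ l2_normalize(image_feat)
    return float(sims.max()), int(sims.argmax())


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form): low when the anchor is close to the
    positives and far from the negatives."""
    anchor = l2_normalize(anchor)
    pos = np.exp(l2_normalize(positives) @ anchor / temperature).sum()
    neg = np.exp(l2_normalize(negatives) @ anchor / temperature).sum()
    return float(-np.log(pos / (pos + neg)))


def adapt_step(image_feat, text_feats, known_bank, unknown_bank, threshold=0.5):
    """Process one test image: predict a known class or reject it (-1),
    compute a contrastive loss against the banks, then update a bank."""
    score, cls = known_score(image_feat, text_feats)
    is_known = score >= threshold  # hypothetical fixed threshold
    own, other = (known_bank, unknown_bank) if is_known else (unknown_bank, known_bank)
    loss = None
    if own.items and other.items:
        # Same-side bank entries act as positives, the other bank as negatives.
        loss = contrastive_loss(image_feat, own.as_array(), other.as_array())
    own.add(image_feat)
    return (cls if is_known else -1), loss
```

Calling `adapt_step` once per incoming image feature, with `text_feats` holding one embedding per known class, mimics the single-image streaming setting. The loss stays `None` until both banks hold at least one feature.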

