CV · May 6

Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection

arXiv:2605.04531 · 74.6 · h-index: 8 · Has Code
AI Analysis

For practitioners deploying VLM-based object detectors, RGSE provides a training-free method to maintain accuracy under distribution shifts, though it is an incremental improvement over existing test-time adaptation techniques.

RGSE addresses test-time distribution shifts in open-vocabulary object detection by refining text embeddings via evolutionary search without backpropagation, achieving state-of-the-art performance across multiple benchmarks with minimal overhead.

Open-vocabulary object detection with vision-language models (VLMs) such as Grounding DINO suffers from performance degradation under test-time distribution shifts, primarily due to semantic misalignment between text embeddings and the shifted visual embeddings of region proposals. Recent test-time adaptive object detection methods for VLM-based detectors either rely on costly backpropagation or bypass semantic misalignment via external memory; none directly and efficiently align text and vision in a training-free manner. To address this, we propose Reward-Guided Semantic Evolution (RGSE), a training-free framework that directly refines the text embeddings at test time. Inspired by evolutionary search, RGSE treats text embedding adaptation as a semantic search process: it perturbs text embeddings to produce candidate variants, evaluates them using cosine similarity with current and historical high-confidence visual proposals as a reward signal, and fuses them into a refined embedding through reward-weighted averaging. Without any backpropagation, RGSE achieves state-of-the-art performance across multiple detection benchmarks while adding minimal computational overhead. Our code will be open-sourced upon publication.
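The perturb, score, and fuse loop described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `evolve_text_embedding`, the Gaussian perturbation, the mean-cosine reward, the softmax weighting with a temperature, and all default values are assumptions chosen for illustration. The abstract specifies only that candidates are perturbed text embeddings, that the reward is cosine similarity with current and historical high-confidence proposals, and that fusion is a reward-weighted average.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def evolve_text_embedding(text_emb, visual_embs, num_candidates=16,
                          sigma=0.05, temperature=0.1, rng=None):
    """One gradient-free refinement step for a single class text embedding.

    text_emb:    (d,) text embedding of one class.
    visual_embs: (n, d) embeddings of high-confidence proposals; here the
                 caller is assumed to stack current and historical ones.
    Returns the refined embedding, per-candidate rewards, and fusion weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    text_emb = l2_normalize(np.asarray(text_emb, dtype=float))
    visual_embs = l2_normalize(np.asarray(visual_embs, dtype=float))

    # Candidates: the original embedding plus Gaussian-perturbed variants
    # (the noise model is an assumption, not taken from the paper).
    noise = sigma * rng.standard_normal((num_candidates, text_emb.shape[0]))
    candidates = l2_normalize(np.vstack([text_emb, text_emb + noise]))

    # Reward: mean cosine similarity between each candidate and the proposals.
    rewards = (candidates @ visual_embs.T).mean(axis=1)

    # Reward-weighted averaging, with softmax weights as one possible choice.
    logits = (rewards - rewards.max()) / temperature
    weights = np.exp(logits)
    weights /= weights.sum()

    refined = l2_normalize(weights @ candidates)
    return refined, rewards, weights
```

In a detector, a step like this would presumably run per class, with the refined embedding replacing the original one when region proposals are scored against text; how RGSE selects high-confidence proposals and maintains their history is not described in the abstract and is left to the caller here.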

