CVFeb 6

LAB-Det: Language as a Domain-Invariant Bridge for Training-Free One-Shot Domain Generalization in Object Detection

arXiv:2602.06474v1h-index: 4
Originality Highly original
AI Analysis

This addresses the challenge of adapting object detectors to specialized domains with minimal data, offering an efficient alternative to fine-tuning for applications like underwater or industrial defect detection.

The paper tackles the problem of object detectors degrading in specialized, data-scarce domains like underwater imagery or industrial defects by proposing a training-free one-shot domain generalization method that uses language as a domain-invariant bridge, achieving up to 5.4 mAP improvement over fine-tuned baselines without updating parameters.

Foundation object detectors such as GLIP and Grounding DINO excel on general-domain data but often degrade in specialized and data-scarce settings like underwater imagery or industrial defects. Typical cross-domain few-shot approaches rely on fine-tuning scarce target data, incurring cost and overfitting risks. We instead ask: Can a frozen detector adapt with only one exemplar per class without training? To answer this, we introduce training-free one-shot domain generalization for object detection, where detectors must adapt to specialized domains with only one annotated exemplar per class and no weight updates. To tackle this task, we propose LAB-Det, which exploits Language As a domain-invariant Bridge. Instead of adapting visual features, we project each exemplar into a descriptive text that conditions and guides a frozen detector. This linguistic conditioning replaces gradient-based adaptation, enabling robust generalization in data-scarce domains. We evaluate on UODD (underwater) and NEU-DET (industrial defects), two widely adopted benchmarks for data-scarce detection, where object boundaries are often ambiguous, and LAB-Det achieves up to 5.4 mAP improvement over state-of-the-art fine-tuned baselines without updating a single parameter. These results establish linguistic adaptation as an efficient and interpretable alternative to fine-tuning in specialized detection settings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes