CV · Jun 10, 2025

ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations

arXiv:2506.08968v1 · h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses open-world object labeling for computer vision systems, though it appears incremental because it builds on existing LLM and CLIP technologies.

The paper addresses the restriction of object detection models to predefined categories in open-world scenarios by introducing ADAM, a training-free framework that uses LLMs and CLIP embeddings to autonomously discover and annotate novel objects. Results on the COCO and PASCAL datasets show that it annotates novel categories without fine-tuning or retraining.

Object detection models typically rely on predefined categories, limiting their ability to identify novel objects in open-world scenarios. To overcome this constraint, we introduce ADAM: Autonomous Discovery and Annotation Model, a training-free, self-refining framework for open-world object labeling. ADAM leverages large language models (LLMs) to generate candidate labels for unknown objects based on contextual information from known entities within a scene. These labels are paired with visual embeddings from CLIP to construct an Embedding-Label Repository (ELR) that enables inference without category supervision. For a newly encountered unknown object, ADAM retrieves visually similar instances from the ELR and applies frequency-based voting and cross-modal re-ranking to assign a robust label. To further enhance consistency, we introduce a self-refinement loop that re-evaluates repository labels using visual cohesion analysis and k-nearest-neighbor-based majority re-labeling. Experimental results on the COCO and PASCAL datasets demonstrate that ADAM effectively annotates novel categories using only visual and contextual signals, without requiring any fine-tuning or retraining.
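The retrieval-and-voting step and the k-nearest-neighbor re-labeling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`vote_label`, `refine_labels`), the choice of `k=3`, and the toy 2-D vectors standing in for CLIP embeddings are all assumptions made for illustration, and the cross-modal re-ranking and visual cohesion analysis are omitted.

```python
from collections import Counter

import numpy as np


def normalize(x):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def vote_label(query, elr_embeddings, elr_labels, k=3):
    """Hypothetical helper: label a query by majority vote over its k most
    similar repository entries (frequency-based voting)."""
    sims = normalize(elr_embeddings) @ normalize(query)
    top = np.argsort(-sims)[:k]  # indices of the k nearest entries
    # On ties, Counter keeps first-seen order, so the closest entry's label wins.
    return Counter(elr_labels[i] for i in top).most_common(1)[0][0]


def refine_labels(elr_embeddings, elr_labels, k=3):
    """Hypothetical helper: re-label every repository entry with the majority
    label of its k nearest neighbors, excluding the entry itself."""
    emb = normalize(elr_embeddings)
    sims = emb @ emb.T
    np.fill_diagonal(sims, -np.inf)  # an entry must not vote for itself
    refined = []
    for i in range(len(elr_labels)):
        top = np.argsort(-sims[i])[:k]
        refined.append(Counter(elr_labels[j] for j in top).most_common(1)[0][0])
    return refined


# Toy repository: 2-D vectors stand in for CLIP embeddings (assumption).
embeddings = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.8, 0.2],  # one visual cluster
    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95],              # another cluster
])
labels = ["kite", "kite", "kite", "frisbee", "dog", "dog", "dog"]

predicted = vote_label(np.array([0.85, 0.15]), embeddings, labels)
refined = refine_labels(embeddings, labels)
print(predicted)  # kite
print(refined)    # the noisy "frisbee" entry is re-labeled "kite"
```

In this toy repository the single noisy "frisbee" entry sits inside the "kite" cluster, so the majority of its neighbors overrides it. A real repository would hold high-dimensional CLIP image embeddings paired with LLM-proposed labels.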

