CVAIAug 21, 2025

First RAG, Second SEG: A Training-Free Paradigm for Camouflaged Object Detection

arXiv:2508.15313v2h-index: 1Has Code
Originality Highly original
AI Analysis

This addresses the problem of high computational cost and manual prompt generation in camouflaged object detection for computer vision applications, offering a practical and efficient solution.

The paper tackles camouflaged object detection by proposing RAG-SEG, a training-free paradigm that uses retrieval-augmented generation and SAM-based segmentation, achieving competitive performance on benchmark datasets while running efficiently on a personal laptop.

Camouflaged object detection (COD) poses a significant challenge in computer vision due to the high similarity between objects and their backgrounds. Existing approaches often rely on heavy training and large computational resources. While foundation models such as the Segment Anything Model (SAM) offer strong generalization, they still struggle to handle COD tasks without fine-tuning and require high-quality prompts to yield good performance. However, generating such prompts manually is costly and inefficient. To address these challenges, we propose \textbf{First RAG, Second SEG (RAG-SEG)}, a training-free paradigm that decouples COD into two stages: Retrieval-Augmented Generation (RAG) for generating coarse masks as prompts, followed by SAM-based segmentation (SEG) for refinement. RAG-SEG constructs a compact retrieval database via unsupervised clustering, enabling fast and effective feature retrieval. During inference, the retrieved features produce pseudo-labels that guide precise mask generation using SAM2. Our method eliminates the need for conventional training while maintaining competitive performance. Extensive experiments on benchmark COD datasets demonstrate that RAG-SEG performs on par with or surpasses state-of-the-art methods. Notably, all experiments are conducted on a \textbf{personal laptop}, highlighting the computational efficiency and practicality of our approach. We present further analysis in the Appendix, covering limitations, salient object detection extension, and possible improvements. \textcolor{blue} {Code: https://github.com/Lwt-diamond/RAG-SEG.}

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes