CV · Nov 1, 2025

CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks

arXiv:2511.00396v2 · h-index: 20
Originality: Highly original
AI Analysis

This work addresses task heterogeneity in saliency detection, offering computer vision researchers a unified approach with significant performance gains.

The authors tackled the problem of handling three heterogeneous saliency tasks (SOD, CoSOD, SIS) by proposing a unified framework using Chain-of-Thought reasoning in a Vision-Language Model, achieving state-of-the-art results such as an S-measure of 0.899 on CoCA for CoSOD, which surpasses the prior best by 8.0 percentage points.

We present the first unified framework that jointly handles three operationally heterogeneous saliency tasks, namely SOD, CoSOD, and SIS, by casting each as a Chain-of-Thought (CoT) reasoning process in a Vision-Language Model (VLM) to bridge task heterogeneity. CoT training follows a two-stage paradigm: Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). To enhance CoT quality in RL, we propose Confidence-Guided Policy Optimization (CGPO), a lightweight single-sample algorithm that leverages the discrepancy between reward and model confidence as a per-sample advantage signal. This design naturally focuses updates on informative responses while eliminating group sampling, thereby addressing GRPO's key limitations: confidence-agnostic learning, signal dilution, and prohibitive computational overhead. We also introduce an "output-to-reasoning" strategy to construct high-fidelity SFT data that ensures logical consistency with ground-truth masks. Experiments show our model matches or outperforms specialized SOTA methods and strong closed-source VLMs across all tasks, notably achieving an S-measure of 0.899 on CoCA for CoSOD, surpassing the prior best by 8.0 percentage points, despite using far less training data.
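The abstract describes CGPO's advantage signal only in words: the discrepancy between reward and model confidence, computed per sample with no group sampling. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes rewards lie in [0, 1], that confidence is the geometric-mean token probability of the sampled response, and that the advantage is simply reward minus confidence. The function names are hypothetical.

```python
import math


def sequence_confidence(token_logprobs):
    """Assumed confidence measure: geometric-mean token probability, in (0, 1]."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def confidence_gap_advantage(reward, token_logprobs):
    """Assumed per-sample advantage: reward minus model confidence.

    Needs only one sampled response, so no group of rollouts is required.
    """
    return reward - sequence_confidence(token_logprobs)


def surrogate_loss(reward, token_logprobs):
    """REINFORCE-style surrogate, treating the advantage as a constant weight."""
    advantage = confidence_gap_advantage(reward, token_logprobs)
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return -advantage * mean_logprob


# Three illustrative responses, each with four tokens of equal probability.
confident_right = confidence_gap_advantage(0.9, [math.log(0.9)] * 4)
confident_wrong = confidence_gap_advantage(0.1, [math.log(0.9)] * 4)
unsure_right = confidence_gap_advantage(0.9, [math.log(0.3)] * 4)
```

Under these assumptions, a response whose reward matches the model's confidence yields an advantage near zero and contributes almost nothing to the update, while overconfident errors and underconfident successes yield large negative and positive advantages respectively. This is one way to read the claim that updates focus on informative responses.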

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
