CV · May 15

REC-RL: Referring expression counting via Gaussian and range-based reward optimization

arXiv:2605.16460 · 53.3
AI Analysis

For researchers in vision-language reasoning, this work improves REC by explicitly optimizing the reasoning process without extra annotations, though it is an incremental improvement over existing methods.

The paper tackles referring expression counting (REC) by proposing a reinforcement learning framework (REC-RL) that optimizes intermediate reasoning via Gaussian and range-based rewards, achieving consistent improvements over strong baselines and robust generalization across benchmarks.

Referring expression counting (REC) is an intention-driven task that requires context-aware visual reasoning. While recent vision-language models incorporate language for visual understanding, most existing REC methods rely on rule-based reinforcement learning with rewards focused primarily on final accuracy, overlooking the quality of intermediate reasoning. We propose REC-RL, a reinforcement learning framework that introduces a think-range-answer paradigm to explicitly optimize the visual reasoning process. REC-RL employs Group Relative Policy Optimization and two lightweight rewards: an accuracy reward that combines range-based interval supervision with Gaussian-based precision guidance, and a format reward that enforces structured outputs. By modeling intermediate focus prediction as internal decision-making, REC-RL avoids additional annotations and better aligns with human perception. Extensive experiments demonstrate consistent improvements over strong baselines and robust generalization across benchmarks.
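The abstract names the reward components but gives no formulas, so the following is only a minimal sketch of how a think-range-answer reward and group-relative advantages might fit together. Everything specific here is an assumption made for illustration and not taken from the paper: the tag names `<think>`, `<range>`, `<answer>`, the width-penalized range score, the Gaussian width `sigma`, the mixing weight `w_range`, and all function names. The advantage step uses the usual Group Relative Policy Optimization normalization, in which each sampled output's reward is standardized against the other outputs in its group.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical output template; the tag names are assumptions, not the paper's.
PATTERN = re.compile(
    r"^<think>(.*?)</think>\s*<range>\s*(\d+)\s*-\s*(\d+)\s*</range>"
    r"\s*<answer>\s*(\d+)\s*</answer>$",
    re.DOTALL,
)

def parse(output):
    """Return (low, high, answer) if the output follows the template, else None."""
    m = PATTERN.match(output.strip())
    if m is None:
        return None
    return int(m.group(2)), int(m.group(3)), int(m.group(4))

def format_reward(output):
    """1.0 when the structured think-range-answer format is respected, else 0.0."""
    return 1.0 if parse(output) is not None else 0.0

def gaussian_reward(pred, target, sigma=1.0):
    """Smooth precision score: 1.0 at the exact count, decaying with the error."""
    return math.exp(-((pred - target) ** 2) / (2 * sigma ** 2))

def range_reward(low, high, target):
    """Assumed interval score: zero if the range misses the target,
    otherwise higher for narrower ranges."""
    if low > high or not (low <= target <= high):
        return 0.0
    return 1.0 / (1 + high - low)

def accuracy_reward(output, target, sigma=1.0, w_range=0.5):
    """Weighted mix of the range and Gaussian terms (weights are assumptions)."""
    parsed = parse(output)
    if parsed is None:
        return 0.0
    low, high, answer = parsed
    return (w_range * range_reward(low, high, target)
            + (1 - w_range) * gaussian_reward(answer, target, sigma))

def total_reward(output, target):
    """Sum of the accuracy and format rewards for one sampled output."""
    return accuracy_reward(output, target) + format_reward(output)

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sd = pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]

if __name__ == "__main__":
    target = 3
    group = [
        "<think>three red cars on the left</think><range>2-4</range><answer>3</answer>",
        "<think>maybe five</think><range>4-6</range><answer>5</answer>",
        "3",  # unstructured output earns no reward in this sketch
    ]
    rewards = [total_reward(o, target) for o in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this sketch a well-formed output with a tight, correct range and an exact count scores highest, a well-formed but wrong output still earns the format reward plus a small Gaussian term, and an unstructured output earns nothing, so the group-relative advantages favor outputs whose intermediate range prediction is both correct and narrow.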

