CV | AI | MA | Sep 19, 2025

Agentic Reasoning for Robust Vision Systems via Increased Test-Time Compute

arXiv:2509.16343v1 | h-index: 14
Originality: Incremental advance
AI Analysis

The paper addresses robustness in vision systems for high-stakes domains such as remote sensing and medical diagnosis. The advance is incremental because it builds on existing vision-language models and vision systems.

The paper aims for broad robustness in high-stakes vision systems without retraining. It proposes a training-free agentic reasoning framework and reports up to 40% absolute accuracy gains on challenging visual reasoning benchmarks.

Developing trustworthy intelligent vision systems for high-stakes domains, e.g., remote sensing and medical diagnosis, demands broad robustness without costly retraining. We propose Visual Reasoning Agent (VRA), a training-free, agentic reasoning framework that wraps off-the-shelf vision-language models and pure vision systems in a Think-Critique-Act loop. While VRA incurs significant additional test-time computation, it achieves up to 40% absolute accuracy gains on challenging visual reasoning benchmarks. Future work will optimize query routing and early stopping to reduce inference overhead while preserving reliability in vision tasks.
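
The abstract does not spell out how the Think-Critique-Act loop is wired, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. Every name in it (`think_critique_act`, `LoopResult`, the `think`, `critique`, and `act` callables, and `max_rounds`) is a hypothetical placeholder. The wrapped vision-language or vision model is stood in for by a plain function. The sketch assumes that a draft answer is produced, a critic accepts or rejects it, and a rejection triggers another model query, which is where the extra test-time compute comes from.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    answer: str          # last draft produced by the think step
    rounds: int          # number of think/critique rounds executed
    accepted: bool       # whether the critic accepted the final draft
    trace: List[str] = field(default_factory=list)  # observations gathered by act


def think_critique_act(
    question: str,
    think: Callable[[str, List[str]], str],      # hypothetical: drafts an answer
    critique: Callable[[str, List[str]], bool],  # hypothetical: accepts or rejects a draft
    act: Callable[[str], str],                   # hypothetical: queries a wrapped model
    max_rounds: int = 3,
) -> LoopResult:
    """Iterate think -> critique -> act until the critic accepts or the budget ends."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    evidence: List[str] = []
    draft = ""
    for round_idx in range(1, max_rounds + 1):
        draft = think(question, evidence)        # Think: draft from the evidence so far
        if critique(draft, evidence):            # Critique: stop early if accepted
            return LoopResult(draft, round_idx, True, evidence)
        evidence.append(act(draft))              # Act: spend compute on one more query
    return LoopResult(draft, max_rounds, False, evidence)


# Toy stand-ins so the sketch runs without any model (assumptions, not real APIs).
def toy_think(question: str, evidence: List[str]) -> str:
    return f"{question} | evidence items: {len(evidence)}"


def toy_critique(draft: str, evidence: List[str]) -> bool:
    return len(evidence) >= 2  # accept only once two observations exist


def toy_act(draft: str) -> str:
    return "observation"


result = think_critique_act("How many ships?", toy_think, toy_critique, toy_act, max_rounds=5)
print(result.rounds, result.accepted, result.answer)
```

In this toy setup the round cap is the only compute control. The early stopping mentioned as future work would correspond to a critic that accepts sooner, and query routing to an `act` step that chooses among several wrapped models.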
