AI | CL | Oct 27, 2025

Latent Chain-of-Thought for Visual Reasoning

arXiv:2510.23925v2 | 23 citations | h-index: 14
AI Analysis

This paper addresses unreliable reasoning in vision-language models, with relevance to AI interpretability, though the contribution appears incremental because it builds on existing CoT methods.

The paper aims to improve chain-of-thought reasoning in large vision-language models by reformulating it as posterior inference and proposing a scalable training algorithm based on amortized variational inference. The authors report improvements over state-of-the-art LVLMs on seven reasoning benchmarks.

Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). However, existing training algorithms such as SFT, PPO, and GRPO may not generalize well across unseen reasoning tasks and heavily rely on a biased reward model. To address this challenge, we reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference. By leveraging diversity-seeking reinforcement learning algorithms, we introduce a novel sparse reward function for token-level learning signals that encourage diverse, high-likelihood latent CoT, overcoming deterministic sampling limitations and avoiding reward hacking. Additionally, we implement a Bayesian inference-scaling strategy that replaces costly Best-of-N and Beam Search with a marginal likelihood to efficiently rank optimal rationales and answers. We empirically demonstrate that the proposed method enhances the state-of-the-art LVLMs on seven reasoning benchmarks, in terms of effectiveness, generalization, and interpretability.
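
To make the idea of ranking answers by marginal likelihood concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes the rationale Z is a latent variable, that N rationales Z_1..Z_N have been sampled from the model given the input X, and that each candidate answer Y has been scored under every rationale. The marginal likelihood is then approximated by the Monte Carlo average p(Y | X) ≈ (1/N) Σ_i p(Y | X, Z_i). The function name `rank_answers` and the input layout are hypothetical, chosen only for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def rank_answers(log_lik):
    """Rank candidate answers by an estimate of log p(Y | X).

    log_lik maps each candidate answer Y to a list of log p(Y | X, Z_i),
    one entry per sampled rationale Z_i (assumed input layout). The score
    is the log of the average likelihood over the sampled rationales.
    """
    scores = {
        answer: logsumexp(lps) - math.log(len(lps))
        for answer, lps in log_lik.items()
    }
    # Highest estimated marginal likelihood first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy numbers, made up for illustration: two answers, three rationales.
example = {
    "A": [math.log(0.9), math.log(0.8), math.log(0.1)],
    "B": [math.log(0.1), math.log(0.2), math.log(0.9)],
}
ranked = rank_answers(example)
```

Averaging over rationales means an answer is preferred when many plausible reasoning paths support it, rather than when a single path scores highly. If the rationales were drawn from a learned approximate posterior instead of the model's prior over Z, each term would additionally need an importance weight, which this sketch omits.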
