CL, AI | Oct 11, 2025

Audit-of-Understanding: Posterior-Constrained Inference for Mathematical Reasoning in Language Models

arXiv:2510.10252v2 | h-index: 18 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses reasoning-induced hallucinations in language models, a critical issue for reliable AI applications. It appears incremental, however, because it builds on prior work in selective prediction and rejection learning.

The paper tackles the problem of reasoning-induced hallucinations in large language models by proposing Audit-of-Understanding, a framework that constrains inference to validated premises, resulting in accuracy improvements of up to +30% on GSM8K, +45% on MultiArith, and +20-28% on SVAMP over baseline methods.

Large language models (LLMs) often generate reasoning traces that appear coherent but rest on unsupported assumptions, leading to hallucinated conclusions. Prior work mainly addresses factual hallucinations or relies on post-hoc verification, leaving reasoning-induced hallucinations largely unaddressed. We propose Audit-of-Understanding (AoU), a framework that constrains inference to validated premises through three phases: (1) decomposing a query into candidate assumptions, (2) auditing their support, and (3) conditioning inference only on the validated subset. Formally, AoU is posterior-constrained inference, connecting to selective prediction and rejection learning. Our contributions are threefold: (i) theoretical guarantees under perfect validation, (ii) excess-risk bounds under imperfect audits, and (iii) tractability analysis. Empirically, AoU improves both accuracy and faithfulness on GSM8K, MultiArith, and SVAMP, achieving up to +30% gains on GSM8K, +45% on MultiArith, and consistent +20-28% improvements on SVAMP over Chain-of-Thought, Self-Consistency, and CoT-Decoding. Code is available at https://anonymous.4open.science/r/audit-of-understanding-E28B.
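
The three phases described in the abstract (decompose, audit, condition) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `AuditResult` type, the abstention option, and the toy string-matching stubs are all assumptions made for illustration. In a real system each callable would presumably be backed by a language-model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AuditResult:
    validated: List[str]   # assumptions the audit found support for
    rejected: List[str]    # assumptions the audit could not support
    answer: Optional[str]  # None when the pipeline abstains


def audit_of_understanding(
    query: str,
    decompose: Callable[[str], List[str]],
    is_supported: Callable[[str, str], bool],
    infer: Callable[[str, List[str]], str],
    abstain_if_none_validated: bool = True,  # hypothetical option, not from the paper
) -> AuditResult:
    # Phase 1: decompose the query into candidate assumptions.
    candidates = decompose(query)

    # Phase 2: audit each candidate for support in the query.
    validated: List[str] = []
    rejected: List[str] = []
    for assumption in candidates:
        if is_supported(query, assumption):
            validated.append(assumption)
        else:
            rejected.append(assumption)

    # Phase 3: condition inference only on the validated subset.
    if abstain_if_none_validated and not validated:
        return AuditResult(validated, rejected, None)
    return AuditResult(validated, rejected, infer(query, validated))


# --- Toy stubs (hypothetical; they stand in for model calls) ---

def toy_decompose(query: str) -> List[str]:
    # A fixed candidate list; the last entry is an unsupported assumption.
    return [
        "Tom starts with 3 apples",
        "Tom buys 2 more apples",
        "Tom gives away 1 apple",
    ]


def toy_is_supported(query: str, assumption: str) -> bool:
    # Crude audit: every digit in the assumption must appear in the query.
    return all(ch in query for ch in assumption if ch.isdigit())


def toy_infer(query: str, premises: List[str]) -> str:
    # Crude inference: add or subtract the integer found in each premise.
    total = 0
    for premise in premises:
        sign = -1 if "gives away" in premise else 1
        for token in premise.split():
            if token.isdigit():
                total += sign * int(token)
    return str(total)


query = "Tom has 3 apples and buys 2 more. How many apples does he have?"
result = audit_of_understanding(query, toy_decompose, toy_is_supported, toy_infer)
unaudited_answer = toy_infer(query, toy_decompose(query))
```

In this toy run the audit rejects the "gives away" premise because the query never mentions it, so inference over the validated subset yields a different answer from inference over all candidate assumptions. The sketch only shows the control flow; it says nothing about how the paper's audit or its guarantees are actually realized.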
