CL · May 26

Beyond Input Understanding: Diagnosing Multilingual Mathematical Reasoning with Directed Acyclic Trace Graphs

arXiv:2605.27715 · 82.3 · h-index: 15
Predicted impact: top 61% in CL · last 90 days · Originality: Incremental advance
AI Analysis

For multilingual AI reasoning, the paper provides a diagnostic tool and mitigation strategies for language-specific reasoning failures.

The paper shows that language affects how mathematical reasoning is executed, not only how the input is understood, and introduces DATG (Directed Acyclic Trace Graph), a framework for diagnosing reasoning failures. Experiments on the Qwen3 series across 12 languages reveal reduced anchor coverage and weaker dependency fidelity in non-English reasoning. Two simple test-time retry strategies, Loop-Retry and Formula-Retry, improve target-language performance in low-resource languages.

Large reasoning models (LRMs) achieve strong mathematical reasoning performance in English, but remain much less reliable in many low- and medium-resource languages. This gap is often explained as a failure to understand non-English problem statements. We show that this view is incomplete: even when the problem is given in English, controlling the model's reasoning language can substantially reduce accuracy, suggesting that language also affects reasoning execution itself. To study this effect, we introduce DATG, a Directed Acyclic Trace Graph framework that maps reasoning traces to language-independent mathematical anchors and dependencies. This allows us to align target-language traces with reference DAGs and measure whether they cover required mathematical nodes, respect dependency edges, and avoid harmful mathematical actions. Experiments on the Qwen3 series across 12 languages show that non-English reasoning often suffers from reduced anchor coverage and weaker dependency fidelity, especially in low-resource languages. Motivated by this diagnosis, we propose Loop-Retry and Formula-Retry, two simple test-time controls targeting DATG-exposed failure modes, and show that they consistently improve target-language reasoning performance in low-resource languages.
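The coverage and dependency measurements described in the abstract can be illustrated with a minimal sketch. It assumes the reasoning trace has already been mapped to an ordered list of anchor labels (the mapping step itself is not modeled here), and that the reference DAG is given as a set of required anchors plus directed edges (u, v) meaning u must be established before v. The class `ReferenceDAG`, all function names, the anchor labels, and the exact metric definitions are hypothetical choices for illustration, not the paper's formulas.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceDAG:
    """Hypothetical reference DAG for one problem (labels are illustrative)."""
    anchors: set[str]                                          # required mathematical anchors
    edges: set[tuple[str, str]] = field(default_factory=set)   # (u, v): u must precede v
    harmful: set[str] = field(default_factory=set)             # labels treated as harmful actions


def first_positions(trace: list[str]) -> dict[str, int]:
    """Map each anchor to the index of its first occurrence in the ordered trace."""
    positions: dict[str, int] = {}
    for i, anchor in enumerate(trace):
        positions.setdefault(anchor, i)
    return positions


def anchor_coverage(trace: list[str], dag: ReferenceDAG) -> float:
    """Fraction of required anchors that appear anywhere in the trace."""
    if not dag.anchors:
        return 1.0  # vacuously complete when nothing is required
    return len(dag.anchors & set(trace)) / len(dag.anchors)


def dependency_fidelity(trace: list[str], dag: ReferenceDAG) -> float:
    """Fraction of edges between covered anchors whose first occurrences are in order."""
    pos = first_positions(trace)
    checkable = [(u, v) for (u, v) in dag.edges if u in pos and v in pos]
    if not checkable:
        return 1.0  # no edge can be checked, so none is violated
    respected = sum(1 for u, v in checkable if pos[u] < pos[v])
    return respected / len(checkable)


def harmful_action_count(trace: list[str], dag: ReferenceDAG) -> int:
    """Number of trace steps labeled as harmful actions."""
    return sum(1 for step in trace if step in dag.harmful)


# Toy example with made-up anchor labels for a "solve a linear equation" problem.
dag = ReferenceDAG(
    anchors={"define_x", "set_equation", "solve_x", "check"},
    edges={("define_x", "set_equation"), ("set_equation", "solve_x"), ("solve_x", "check")},
    harmful={"divide_by_zero"},
)
trace = ["define_x", "solve_x", "set_equation", "divide_by_zero"]

print(anchor_coverage(trace, dag))       # 0.75: "check" is never reached
print(dependency_fidelity(trace, dag))   # 0.5: "solve_x" appears before "set_equation"
print(harmful_action_count(trace, dag))  # 1
```

Scoring fidelity only over edges whose endpoints are both covered keeps the two metrics separate: a missing anchor lowers coverage but is not also counted as an ordering violation. Other definitions, such as treating every edge with a missing endpoint as violated, are equally plausible under these assumptions.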
