AI · Sep 25, 2025

Parallel Thinking, Sequential Answering: Bridging NAR and AR for Efficient Reasoning

arXiv:2509.20744v1 · h-index: 3
Originality: Highly original
AI Analysis

The work addresses efficiency and quality issues in reasoning-intensive domains such as mathematics and code, though it is an incremental advancement.

The paper tackles the problem of slow inference in auto-regressive models for reasoning tasks by integrating them with non-autoregressive models, resulting in a 26% improvement over baselines and reduced inference cost.

We study reasoning tasks through a framework that integrates auto-regressive (AR) and non-autoregressive (NAR) language models. AR models, which generate text sequentially, excel at producing coherent outputs but often suffer from slow inference, particularly in reasoning-intensive domains such as mathematics and code, where lengthy chains of thought are required. In contrast, NAR models, such as discrete diffusion models, allow parallel generation and offer substantial speedups, though typically at the cost of reduced output quality. To address these limitations, we introduce a new paradigm in which an NAR model efficiently produces intermediate reasoning traces, which subsequently guide an AR model to deliver precise final answers. Experiments demonstrate that our approach yields a significant 26% improvement over strong baselines while substantially reducing inference cost.
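The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`think_then_answer`, `build_answer_prompt`), the prompt layout, and the toy stand-in models are all assumptions made for illustration. The only idea taken from the abstract is the ordering: an NAR model drafts the reasoning trace first, and an AR model then produces the final answer conditioned on that trace.

```python
from typing import Callable

# Hypothetical interfaces; neither name comes from the paper.
DraftFn = Callable[[str], str]   # NAR model: question -> reasoning trace (parallel decoding)
AnswerFn = Callable[[str], str]  # AR model: prompt -> final answer (sequential decoding)


def build_answer_prompt(question: str, trace: str) -> str:
    """Combine the question with the NAR-produced trace (assumed prompt layout)."""
    return f"Question: {question}\nDraft reasoning: {trace}\nFinal answer:"


def think_then_answer(question: str, nar_draft: DraftFn, ar_answer: AnswerFn) -> str:
    """Stage 1: NAR drafts the reasoning. Stage 2: AR answers, guided by the draft."""
    trace = nar_draft(question)
    prompt = build_answer_prompt(question, trace)
    return ar_answer(prompt)


# Toy stand-ins so the sketch runs without any real model.
def toy_nar(question: str) -> str:
    return "2 + 3 = 5"


def toy_ar(prompt: str) -> str:
    # Returns the last token of the draft-reasoning line as the "answer".
    for line in prompt.splitlines():
        if line.startswith("Draft reasoning:"):
            return line.split()[-1]
    return ""


result = think_then_answer("What is 2 + 3?", toy_nar, toy_ar)
print(result)
```

In a real system the two callables would wrap actual models (for example, a discrete diffusion model for the draft and an AR language model for the answer). Under this split, the long chain of thought is generated in parallel, so the sequential AR model only has to emit the short final answer.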

