AI | CL | Dec 10, 2025

Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning

arXiv:2512.10054v1
Originality: Incremental advance
AI Analysis

The paper addresses slow sequential generation in LLMs, which matters to users who need faster inference. The advance is incremental because it builds on existing parallelization methods.

The paper tackles the latency bottleneck in autoregressive decoding of Large Language Models by introducing the Parallel Decoder Transformer (PDT), which embeds coordination primitives into a frozen pre-trained model to enable parallel decoding streams, achieving 77.8% precision in coverage prediction.

Autoregressive decoding in Large Language Models (LLMs) is inherently sequential, creating a latency bottleneck that scales linearly with output length. While "Decomposition-and-Fill" methods like Skeleton-of-Thought attempt to parallelize generation via external orchestration, they suffer from coherence drift due to the lack of cross-stream communication. In this work, we introduce the Parallel Decoder Transformer (PDT), a parameter-efficient architecture that embeds coordination primitives directly into the inference process of a frozen pre-trained model. Instead of retraining the base model, PDT injects lightweight Speculative Note Conditioning (SNC) adapters that allow parallel decoding streams to synchronize via a shared, dynamic latent space. We formulate coordination as a speculative consensus problem, where sibling streams broadcast semantic "notes" to a global bus, gated by a learned verification head. We validate our approach on a 50,000-step curriculum using a frozen 20B-parameter backbone. Our results demonstrate that PDT achieves effective self-correction, reaching 77.8% precision in coverage prediction and recovering approximate serial semantics without modifying the trunk weights. This establishes PDT as a scalable, efficient alternative to full model fine-tuning for structured parallel generation.
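
The gated note bus described in the abstract can be pictured with a toy sketch. This is not the paper's implementation: the names `Note`, `NoteBus`, and `toy_verification_score`, the logistic gate, the 0.5 threshold, and the mean-pooling read are all assumptions made for illustration. It only shows the general idea of sibling streams publishing notes to a shared bus that admits a note when a verification score clears a threshold.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    stream_id: int
    vector: List[float]  # toy stand-in for a latent "note" (hypothetical)


@dataclass
class NoteBus:
    """Shared bus that admits only notes whose score clears a threshold."""
    threshold: float = 0.5  # assumed value, not taken from the paper
    accepted: List[Note] = field(default_factory=list)

    def publish(self, note: Note, score: float) -> bool:
        # `score` plays the role of a verification-head output in [0, 1].
        if score >= self.threshold:
            self.accepted.append(note)
            return True
        return False

    def read(self, reader_id: int) -> List[float]:
        # Mean of accepted notes from sibling streams (the reader's own
        # notes are excluded). Returns [] if no sibling note was accepted.
        siblings = [n.vector for n in self.accepted if n.stream_id != reader_id]
        if not siblings:
            return []
        dim = len(siblings[0])
        return [sum(v[i] for v in siblings) / len(siblings) for i in range(dim)]


def toy_verification_score(vector: List[float], weights: List[float]) -> float:
    # Logistic function of a dot product: a stand-in for a learned head.
    z = sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


# Three streams each publish one note; the gate rejects stream 1's note.
bus = NoteBus(threshold=0.5)
weights = [1.0, -1.0]  # hypothetical gate weights
notes = [Note(0, [2.0, 0.0]), Note(1, [0.0, 2.0]), Note(2, [1.0, 0.0])]
results = [bus.publish(n, toy_verification_score(n.vector, weights)) for n in notes]
context_for_stream_1 = bus.read(reader_id=1)
```

In this sketch each stream would condition its next decoding step on the value returned by `read`. How PDT injects that signal into the frozen backbone through its SNC adapters is not specified in the abstract and is not modeled here.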

