CL · May 29

Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism

arXiv:2605.30852 · 95.4 · h-index: 12 · Has Code
Predicted impact: top 11% in CL · last 90 days · Originality: Highly original
AI Analysis

This work targets low-concurrency LLM inference, aiming to give researchers and practitioners a potentially more scalable way to accelerate decoding than existing methods.

This paper introduces Speculative Pipeline Decoding (SPD), a framework that uses pipeline parallelism to accelerate LLM inference. SPD partitions the LLM into 'n' pipeline stages so that 'n' tokens can be processed in parallel. A speculation module predicts the next token from intermediate features, which the authors report yields higher acceptance rates and zero latency bubbles.

Speculative Decoding (SD) accelerates low-concurrency LLM inference by employing a draft-then-verify paradigm. However, mainstream methods typically rely on multi-token prediction, which introduces escalating prediction difficulty and serial drafting latency. To address these issues, we propose Speculative Pipeline Decoding (SPD), a framework that exploits pipeline parallelism. By partitioning the target LLM into $n$ pipeline stages, SPD allows the LLM to process $n$ tokens in parallel to accelerate decoding. To continuously fill the pipeline during single-sequence decoding, a speculation module aggregates intermediate features across different pipeline depths to predict the next token. It executes strictly in parallel with the target model's pipeline step, yielding bounded prediction difficulty, higher acceptance rates, and zero latency bubbles. Our experiments demonstrate that SPD achieves a significantly higher theoretical speedup than mainstream baselines, offering a highly scalable solution for LLM decoding acceleration. Our code is available at https://github.com/yuyijiong/speculative_pipeline_decoding
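
To make the pipeline-filling idea in the abstract concrete, here is a minimal toy simulation. It is not the authors' implementation. The function name `simulate_spd` and all modelling choices are assumptions made for illustration: every stage takes one equal time step, the speculation module costs no extra time, each speculated token is accepted independently with probability `accept_prob`, and a rejection flushes all in-flight tokens and restarts the pipeline from the corrected token.

```python
import random


def simulate_spd(n_stages, n_steps, accept_prob, seed=0):
    """Toy pipeline model; returns (committed_tokens, speedup_vs_sequential).

    Hypothetical sketch: the acceptance and flush rules are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    # pipeline[i] is True when a token is about to run stage i in the next step.
    # Start with one known (non-speculated) token at the first stage.
    pipeline = [True] + [False] * (n_stages - 1)
    committed = 0
    for _ in range(n_steps):
        exiting = pipeline[-1]  # token finishing the last stage in this step
        if exiting:
            # The target model's output for this token is always committed.
            committed += 1
        if exiting and rng.random() >= accept_prob:
            # Assumed rejection rule: the speculated successor was wrong, so
            # drop every in-flight token and re-enter with the corrected one.
            pipeline = [True] + [False] * (n_stages - 1)
        else:
            # Accepted (or nothing exited): shift every token one stage
            # deeper and let a newly speculated token enter stage 0.
            pipeline = [True] + pipeline[:-1]
    # Sequential decoding commits one token per full pass of n_stages steps.
    baseline = n_steps / n_stages
    return committed, committed / baseline
```

Under these assumptions, `simulate_spd(4, 400, 0.0)` gives a speedup of 1.0, because every token waits for a full pass. `simulate_spd(4, 400, 1.0)` gives 3.97, close to the stage count, because one token leaves the pipeline per step once it is full. Intermediate acceptance probabilities fall between these two extremes.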

