CLAINov 16, 2024

SAM Decoding: Speculative Decoding via Suffix Automaton

arXiv:2411.10666v324 citationsh-index: 20Has CodeACL
Originality Incremental advance
AI Analysis

This work addresses the bottleneck of slow inference in large language models for users in NLP applications, though it is incremental as it builds on existing speculative decoding techniques.

The paper tackles the problem of inefficient retrieval-based speculative decoding for LLM inference by introducing a method using suffix automaton for draft generation, achieving an average time complexity of O(1) per step and speeding up inference by 18%+ compared to other retrieval-based methods, with additional gains when combined with EAGLE-2.

Speculative decoding (SD) has been demonstrated as an effective technique for lossless LLM inference acceleration. Retrieval-based SD methods, one kind of model-free method, have yielded promising speedup, but they often rely on incomplete retrieval resources, inefficient retrieval methods, and are constrained to certain domains. This paper presents a novel retrieval-based speculative decoding method that adapts suffix automaton (SAM) for efficient and accurate draft generation by utilizing common text corpus and dynamic text sequence. Unlike existing $n$-gram matching methods, SAM-Decoding finds the exact longest suffix match, achieving an average time complexity of O(1) per generation step of SAM update and suffix retrieval. It can also integrate with existing methods, adaptively selecting a draft generation strategy based on match length to generalize to broader domains. Extensive experiments on Spec-Bench show that our method is $18\%+$ faster than other retrieval-based SD methods. Additionally, when combined with advanced EAGLE-2, it provides an additional speedup of $3.28\%$ -- $11.13\%$ across various-sized LLM backbones. Our code is available at our \href{https://github.com/hyx1999/SAM-Decoding}{repository}.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes