Superseded baseline · #101 of 151 most-superseded

Dynamic Depth Decoding

Dynamic Depth Decoding: Faster Speculative Decoding for LLMs

Speculative decoding · first seen Aug 30, 2024

superseded — cited as a baseline and beaten by newer methods

1 paper critiques it · 0 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites Dynamic Depth Decoding as a baseline.

  • However, unlike , none of these techniques focus on the data movement cost due to speculation. They require access to output probability distributions and are incompatible with approaches like n-gram speculation. Also, they rely on aggressive drafting, assuming very low over-speculation penalties (1–2% per unit increase in K), and must draft/verify at least one token to estimate benefits. Moreover, such schemes introduce CPU to GPU communication between drafter iterations on the GPU, to apply policy heuristics. Consequently, stopping criteria are used infrequently--e.g., DDD defers until the $5^{th}$ drafter iteration--making these methods too costly for MoEs.
    Utility-Driven Speculative Decoding for Mixture-of-Experts

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.