Triplet-Block Diffusion RWKV
For language-modeling practitioners, this work targets the cost of strictly sequential decoding in causal Transformers by combining linear-time (RWKV-style) inference with parallel discrete-diffusion decoding, though the reported gains are incremental.
The paper proposes B^3D-RWKV, a diffusion variant of RWKV that combines linear-time causal modeling with bidirectional discrete diffusion via a triplet-block layout. The 7.2B model reaches accuracy comparable to existing models on an 8-task suite while decoding 1.6× faster on average than the baselines.
Causal Transformer language models suffer from strictly sequential decoding and a quadratic per-step attention cost. Linear-time causal models and discrete diffusion models each address one of these weaknesses, but combining them is inherently inconsistent: diffusion requires bidirectional context, whereas causal models are unidirectional. To unify the two, we propose $B^3$D-RWKV, a diffusion variant of RWKV that combines RWKV's $O(L)$ inference efficiency with parallel, bidirectional discrete diffusion through a \emph{triplet-block layout}. $B^3$D-RWKV-7.2B reaches accuracy comparable to existing models on an 8-task suite while outperforming the baselines in decoding throughput, with an average speedup of $\mathbf{1.6\times}$.
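The mismatch between unidirectional and bidirectional context can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the triplet-block layout itself, which is not specified in this section. It contrasts a strictly causal visibility pattern with a generic block-wise pattern of the kind used in block-diffusion decoding, where positions see their whole block plus all earlier blocks. The function names, `block_size`, and `steps_per_block` are hypothetical, and the pass counts are plain arithmetic unrelated to the reported $1.6\times$ figure.

```python
def causal_visibility(length):
    """visible[i][j] is True when position i may use position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


def block_visibility(length, block_size):
    """Generic block-wise pattern (an assumption, not the paper's layout):
    a position sees every position in its own block and in earlier blocks."""
    return [
        [j // block_size <= i // block_size for j in range(length)]
        for i in range(length)
    ]


def decoding_passes(length, block_size, steps_per_block):
    """Forward passes needed to emit `length` tokens one block at a time,
    assuming a fixed number of denoising steps per block."""
    num_blocks = -(-length // block_size)  # ceiling division
    return num_blocks * steps_per_block


if __name__ == "__main__":
    # Strictly sequential decoding: one token per pass.
    print(decoding_passes(64, block_size=1, steps_per_block=1))
    # Hypothetical block decoding: 8-token blocks, 4 denoising steps each.
    print(decoding_passes(64, block_size=8, steps_per_block=4))
```

With these illustrative settings, sequential decoding needs 64 passes and block-wise decoding needs 32. The actual ratio depends on how many denoising steps each block requires and on the per-pass cost, neither of which is stated here.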