LG | AI | Nov 19, 2025

Breaking the Bottleneck with DiffuApriel: High-Throughput Diffusion LMs with Mamba Backbone

arXiv:2511.15927v1 | 1 citation | h-index: 11
Originality: Highly original
AI Analysis

DiffuApriel addresses slow text generation in diffusion language models, offering a practical and scalable speedup.

The paper tackles the inference-efficiency bottleneck in diffusion-based language models by introducing DiffuApriel, which uses a bidirectional Mamba backbone to achieve up to 4.4x higher throughput on long sequences while matching the performance of Transformer-based models.

Diffusion-based language models have recently emerged as a promising alternative to autoregressive generation, yet their reliance on Transformer backbones limits inference efficiency due to quadratic attention and KV-cache overhead. In this work, we introduce DiffuApriel, a masked diffusion language model built on a bidirectional Mamba backbone that combines the diffusion objective with linear-time sequence modeling. DiffuApriel matches the performance of Transformer-based diffusion models while achieving up to 4.4x higher inference throughput for long sequences with a 1.3B model. We further propose DiffuApriel-H, a hybrid variant that interleaves attention and Mamba layers, offering up to 2.6x throughput improvement with balanced global and local context modeling. Our results demonstrate that bidirectional state-space architectures serve as strong denoisers in masked diffusion LMs, providing a practical and scalable foundation for faster, memory-efficient text generation.
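To make the "bidirectional, linear-time" idea in the abstract concrete, here is a minimal toy sketch. It is not DiffuApriel's architecture and not Mamba's selective scan: it assumes a single scalar channel with a fixed decay, and the names `linear_scan` and `bidirectional_scan` are hypothetical. It only illustrates how one left-to-right pass and one right-to-left pass, each O(L), let every position see context on both sides without pairwise attention.

```python
from typing import List


def linear_scan(xs: List[float], decay: float = 0.5) -> List[float]:
    """Causal recurrence h_t = decay * h_{t-1} + x_t, computed in one O(L) pass."""
    h = 0.0
    out = []
    for x in xs:
        h = decay * h + x  # the state summarizes everything seen so far
        out.append(h)
    return out


def bidirectional_scan(xs: List[float], decay: float = 0.5) -> List[float]:
    """Combine a forward and a backward scan so each position sees both sides.

    Toy assumption: a fixed scalar decay. A real Mamba layer uses
    input-dependent (selective) parameters and vector-valued states.
    """
    fwd = linear_scan(xs, decay)
    bwd = linear_scan(xs[::-1], decay)[::-1]
    # Both scans include x_t itself, so subtract it once to avoid double counting.
    return [f + b - x for f, b, x in zip(fwd, bwd, xs)]


if __name__ == "__main__":
    # A single nonzero input at the last position still influences position 0,
    # which a purely causal scan could not provide.
    print(bidirectional_scan([0.0, 0.0, 1.0]))
```

Total work grows linearly with sequence length, which is the property the abstract contrasts with quadratic attention. In a masked diffusion setting, the two-sided context is what would let a denoiser fill a masked position using tokens on either side.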
