CL · AI · Sep 2, 2025

DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off

arXiv:2509.02785v233 citations · h-index: 12 · EMNLP
Originality: Highly original
AI Analysis

This work addresses the computational bottleneck in long-text generation for NLP applications, representing a strong incremental improvement over existing methods.

The paper tackles the efficiency-quality trade-off in long-text generation by introducing DrDiff, a framework that achieves state-of-the-art results through dynamic expert scheduling, hierarchical sparse attention, and soft absorption guidance optimization.

This paper introduces DrDiff, a novel framework for long-text generation that overcomes the efficiency-quality trade-off through three core technologies. First, we design a dynamic expert scheduling mechanism that allocates computational resources during the diffusion process based on text complexity, enabling more efficient handling of generation tasks of varying difficulty. Second, we introduce a Hierarchical Sparse Attention (HSA) mechanism that adaptively adjusts attention patterns according to input length, reducing computational complexity from $O(n^2)$ to $O(n)$ while maintaining model performance. Finally, we propose a soft absorption guidance optimization strategy that, combined with DPM-Solver++, reduces the number of diffusion steps and significantly improves generation speed. Comprehensive experiments on various long-text generation benchmarks demonstrate that DrDiff outperforms existing SOTA methods.
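To make the $O(n^2)$ to $O(n)$ claim concrete, here is a minimal sketch of one common sparse pattern, fixed local-window attention, in which each position attends only to neighbours within a constant distance. This is not the paper's HSA mechanism, which the abstract does not specify. The fixed `window` parameter and the function names `local_window_attention` and `attended_pairs` are assumptions made purely for illustration.

```python
import math


def local_window_attention(q, k, v, window):
    """Softmax attention where query i sees only keys j with |i - j| <= window.

    q, k, v are lists of equal length n holding per-position vectors.
    Illustrative only: a fixed window is an assumption, not the paper's HSA.
    """
    n = len(q)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Scaled dot-product scores against the local neighbourhood only.
        scores = [
            scale * sum(a * b for a, b in zip(q[i], k[j]))
            for j in range(lo, hi)
        ]
        # Numerically stable softmax over the window.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([
            sum(w * v[lo + t][c] for t, w in enumerate(weights)) / z
            for c in range(len(v[0]))
        ])
    return out


def attended_pairs(n, window):
    """Number of (query, key) pairs scored by the windowed pattern."""
    return sum(min(n, i + window + 1) - max(0, i - window) for i in range(n))
```

With a constant `window` of size w, the number of scored pairs is at most n(2w + 1), so the cost grows linearly in the sequence length n, whereas full attention scores all n² pairs. The price is that distant positions no longer interact directly, which is presumably why hierarchical schemes add coarser levels on top of a local one.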
