CL · AI · May 30

WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering

Jinnan Yang, Yan Wang, Zhen Bi, Kehao Wu, Xiaojie Li, Jungang Lou, Zechao Li, Jing Liu

arXiv:2606.00724 · 78.4 · h-index: 2

Predicted impact: top 74% in CL · last 90 days

Originality: Incremental advance

AI Analysis

For diffusion LLMs facing computational bottlenecks in long-context tasks, WaveFilter provides a training-free, plug-and-play solution that enhances performance of existing KV cache methods.

WaveFilter introduces a wavelet-guided KV cache filtering method for diffusion LLMs, achieving precise token identification in long contexts. It significantly improves generation quality and reduces computational overhead without training, outperforming existing KV cache methods on long-context tasks.

Diffusion Large Language Models (DLMs) have demonstrated significant advantages across various tasks. However, because they rely on multi-step iterative inference, their computational overhead and inference latency in long-context tasks have become core bottlenecks restricting large-scale deployment. When processing long sequences, existing Key-Value (KV) caching mechanisms often suffer a drastic drop in generation quality; the core challenge is to filter critical tokens precisely and efficiently within ultra-long contexts. Inspired by the human reading process, we propose WaveFilter, a universal and training-free caching framework. The framework applies the wavelet transform to decompose long sequences and identify key tokens precisely, and then constructs a sparse KV cache from those tokens to compute the final contextual representation. Experimental results demonstrate that WaveFilter, as a plug-and-play generic framework, significantly enhances the performance of existing mainstream KV cache methods in complex long-context tasks.
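
The abstract names two steps, a wavelet decomposition that identifies key tokens and a sparse KV cache built from them, without giving details. The sketch below is one possible reading, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the Haar wavelet, the use of a per-token importance score (for example, mean attention mass) as the signal being decomposed, the coarse-to-fine saliency rule, and the hypothetical helper names `haar_step`, `select_key_tokens`, and `build_sparse_kv`.

```python
import numpy as np


def haar_step(x):
    """One level of the orthonormal Haar transform (odd lengths are edge-padded)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass: coarse trend
    detail = (even - odd) / np.sqrt(2.0)   # high-pass: local change
    return approx, detail


def select_key_tokens(scores, keep, levels=2):
    """Pick `keep` token indices by combining raw scores with wavelet band magnitudes.

    Assumption: `scores` is a 1-D per-token importance signal. The abstract does
    not say which signal WaveFilter actually decomposes.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    saliency = np.abs(scores).copy()
    current, span = scores, 1
    for _ in range(levels):
        if current.size < 2:
            break
        current, detail = haar_step(current)
        span *= 2
        # Spread each detail coefficient over the `span` tokens it covers.
        saliency += np.repeat(np.abs(detail), span)[:n]
    if span > 1:
        # Add the coarsest approximation so broadly important regions also count.
        saliency += np.repeat(np.abs(current), span)[:n]
    keep = max(0, min(keep, n))
    top = np.argsort(-saliency, kind="stable")[:keep]
    return np.sort(top)  # keep original token order


def build_sparse_kv(keys, values, idx):
    """Gather only the selected token rows from (seq_len, dim) key/value arrays."""
    return keys[idx], values[idx]
```

Calling `select_key_tokens(scores, keep=k)` and passing the result to `build_sparse_kv` yields key and value arrays with `k` rows, which stand in for the sparse cache. In a real DLM the arrays would carry batch and head dimensions, and the selection would presumably be refreshed across denoising steps; neither is modeled here.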
