LG · AI · Jun 4

When Good Enough Is Optimal: Multiplication-Only Matrix Inversion Approximation for Quantized Gated DeltaNet

arXiv:2606.06034
Matrix inversion in chunk-wise parallel linear attention is a major bottleneck for long-context modeling, particularly on NPUs, where forward-substitution-based methods exhibit limited parallelism and poor hardware utilization. We propose a fast, Matrix Multiplication (MatMul)-based algorithm tailored for strictly lower-triangular matrices arising in chunk-wise linear attention. Motivated by the rapid growth of Neumann-series terms and the diagonal concentration of the inverse matrix, we employ a truncated Neumann expansion with structural masking and parallel residual correction to eliminate sequential dependencies. We further extend our method to low-bit INT inference by mitigating the dynamic range expansion arising from repeated matrix power operations, and we adapt the approximation order and residual step to the chunk size to minimize computational cost while preserving the model's accuracy. Experiments on Qwen3.5-family models demonstrate up to 5$\times$ kernel-level speedup and a 20% reduction in decode-layer overhead, while preserving accuracy under both floating-point and low-precision inference. Our method offers an efficient and hardware-friendly solution for scalable linear attention.
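As a rough illustration of the general idea, the sketch below approximates $(I + A)^{-1}$ for a strictly lower-triangular $A$ using only matrix multiplications: a truncated Neumann series $\sum_{k=0}^{p} (-A)^k$ followed by a residual correction of the form $X \leftarrow X + X(I - MX)$ with $M = I + A$. This is not the authors' kernel. The function name `neumann_inverse`, its parameters, and the specific correction rule are assumptions chosen for the example, and structural masking and INT quantization are omitted.

```python
import numpy as np

def neumann_inverse(A, order, residual_steps=0):
    """Approximate (I + A)^{-1} for strictly lower-triangular A with matmuls only.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    n = A.shape[0]
    I = np.eye(n, dtype=A.dtype)
    N = -A                      # (I + A)^{-1} = sum_k (-A)^k
    X = I.copy()
    term = I.copy()
    for _ in range(order):      # truncated Neumann series up to (-A)^order
        term = term @ N
        X = X + term
    M = I + A
    for _ in range(residual_steps):
        R = I - M @ X           # residual of the current approximation
        X = X + X @ R           # each step squares the residual
    return X

# Small check on a random chunk of size C (values are arbitrary test data).
rng = np.random.default_rng(0)
C = 16
A = np.tril(rng.standard_normal((C, C)) * 0.1, k=-1)
exact = np.linalg.inv(np.eye(C) + A)
approx = neumann_inverse(A, order=3, residual_steps=2)
max_err = np.abs(approx - exact).max()
```

Because a strictly lower-triangular $C \times C$ matrix is nilpotent ($A^C = 0$), the full series is exact after $C - 1$ terms. In this sketch the residual after the order-$p$ truncation is $(-A)^{p+1}$, and each correction step squares it, so with $C = 16$ and `order=3`, two correction steps already reach $A^{16} = 0$. This is one way to see why the approximation order and the number of residual steps can be tied to the chunk size.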