82.6NAApr 2
Attention Mechanisms Through the Lens of Numerical Methods: Approximation Methods and Alternative FormulationsMichel Fabrice Serret, Alice Cortinovis, Yijun Dong et al.
The attention mechanism is the computational core of modern Transformer architectures, but its quadratic complexity in the input sequence length is the bottleneck for large-scale inference. This has motivated a rapidly growing body of work aimed at accelerating attention through approximation and reformulation. In this survey, we revisit attention mechanisms through the lens of numerical analysis, with a particular emphasis on tools and perspectives from numerical linear algebra. Our goal is twofold: first, we aim to systematically review and classify fast approximation methods according to the numerical principles they exploit. These include sparsity and clustering approaches, low-rank and subspace projection techniques, randomized sketching methods, and tensor-based decompositions. We also discuss kernel-inspired reformulations of attention and recent architectural variants, such as Latent Attention, that modify the standard softmax formulation to improve efficiency. Second, by presenting these developments within a unified mathematical framework, we aim to bridge the gap between disciplines and highlight opportunities for further contributions from computational mathematics, particularly numerical linear algebra, to the design of scalable attention mechanisms.
10.3NAMar 13
Efficient Sketching-Based Summation of Tucker TensorsRudi Smith, Mirjeta Pasha, Andrés Galindo-Olarte et al.
We present efficient, sketching-based methods for the summation of tensors in Tucker format. Leveraging the algebraic structure of Khatri-Rao and Kronecker products, our approach enables compressed arithmetic on Tucker tensors while controlling rank growth and computational cost. The proposed sketching framework avoids the explicit formation of large intermediate tensors, instead operating directly on the factor matrices and core tensors to produce accurate low-rank approximations of tensor sums. Furthermore, we analyze the computational complexity and the theoretical approximation properties of the proposed methodology. Numerical experiments demonstrate the effectiveness of our approach on four problems: two synthetic test cases, a parameter-dependent elliptic equation (commonly referred to as the cookie problem) solved via GMRES, and a one-dimensional linear transport problem discretized via high-order discontinuous Galerkin methods, where repeated tensor summation arises as a core computational bottleneck. Across these examples, the sketching-based summation achieves substantial computational savings while preserving high accuracy relative to direct summation and re-compression.