CV | Jan 29

Vision KAN: Towards an Attention-Free Backbone for Vision with Kolmogorov-Arnold Networks

arXiv:2601.21541v1 | 1 citation | h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the high computational cost and limited interpretability of attention-based vision backbones, offering an incremental improvement by replacing attention modules with a more efficient alternative.

The paper tackles the scalability and interpretability issues of attention mechanisms in vision backbones by introducing Vision KAN (ViK), an attention-free backbone using Kolmogorov-Arnold Networks, achieving competitive accuracy on ImageNet-1K with linear complexity.
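To make the quadratic-versus-linear contrast concrete, here is a rough count of multiply-accumulates (MACs) for the token-mixing step alone. Attention is modeled as its two N x N products (QK^T and the weighted sum over V); the linear-cost mixer is a hypothetical low-rank token mixer of rank r, chosen only to illustrate linear scaling. The width d = 384, rank r = 64 and grid sizes are illustrative assumptions, not values from the paper, and projections, softmax and normalization are ignored.

```python
# Rough multiply-accumulate (MAC) counts for the token-mixing step only.
# Projections, softmax, and normalization are ignored; the low-rank mixer is hypothetical.

def attention_mix_macs(n: int, d: int) -> int:
    """Self-attention mixing: Q @ K^T costs n*n*d MACs and weights @ V another n*n*d."""
    return 2 * n * n * d


def low_rank_mix_macs(n: int, d: int, r: int) -> int:
    """Low-rank token mixing: project n tokens to r (n*r*d), then back to n (r*n*d)."""
    return 2 * n * r * d


d, r = 384, 64  # illustrative width and rank, not taken from the paper
for side in (14, 28, 56):  # token-grid side, e.g. a 224 px image with patch size 16, 8, 4
    n = side * side
    att = attention_mix_macs(n, d)
    low = low_rank_mix_macs(n, d, r)
    print(f"N={n:5d}  attention={att:>14,}  low-rank={low:>12,}  ratio={att / low:.1f}")
```

Doubling the grid side quadruples N, so the attention term grows 16x while the low-rank term grows only 4x; their ratio is simply N / r (about 3 at N = 196 and 49 at N = 3136).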

Attention mechanisms have become a key module in modern vision backbones due to their ability to model long-range dependencies. However, their quadratic complexity in sequence length limits scalability, and their attention weights are difficult to interpret. Recent attention-free architectures demonstrate that strong performance can be achieved without pairwise attention, motivating the search for alternatives. In this work, we introduce Vision KAN (ViK), an attention-free backbone inspired by Kolmogorov-Arnold Networks (KANs). At its core lies MultiPatch-RBFKAN, a unified token mixer that combines (a) patch-wise nonlinear transforms with Radial Basis Function (RBF)-based KANs, (b) axis-wise separable mixing for efficient local propagation, and (c) low-rank global mapping for long-range interaction. Employed as a drop-in replacement for attention modules, this formulation addresses the prohibitive cost of full KANs on high-resolution features by adopting a patch-wise grouping strategy, with lightweight operators restoring cross-patch dependencies. Experiments on ImageNet-1K show that ViK achieves competitive accuracy with linear complexity, demonstrating the potential of KAN-based token mixing as an efficient and theoretically grounded alternative to attention.
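The three branches of MultiPatch-RBFKAN can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the branches are parameterized or combined, so everything below is an assumption. The RBF-KAN uses Gaussian basis functions on fixed centers, y_j = sum_i sum_k W[i,k,j] exp(-((x_i - c_k)/h)^2), and is assumed to mix the p x p tokens inside each non-overlapping patch, with weights shared across patches and channels. Axis-wise separable mixing is modeled as depthwise 1-D convolutions along height and then width, the low-rank global mapping as Y = V(U^T X) with U and V of shape (N, r), and the three branches are summed with a residual connection. All function and parameter names (rbf_kan, patch_kan_mix, axis_mix, low_rank_global, multipatch_mixer) are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a MultiPatch-RBFKAN-style token mixer on an (H, W, C) feature map.

def rbf_kan(x, centers, width, weights):
    """RBF-based KAN layer on the last axis.

    y_j = sum_i sum_k weights[i, k, j] * exp(-((x_i - centers[k]) / width) ** 2)
    x: (..., d_in), centers: (K,), weights: (d_in, K, d_out) -> (..., d_out)
    """
    basis = np.exp(-(((x[..., None] - centers) / width) ** 2))  # (..., d_in, K)
    return np.einsum("...ik,ikj->...j", basis, weights)


def patch_kan_mix(x, p, centers, width, weights):
    """Mix the p*p tokens of each non-overlapping p x p patch with one shared RBF-KAN.

    x: (H, W, C) with H and W divisible by p; weights: (p*p, K, p*p).
    """
    H, W, C = x.shape
    # (H/p, p, W/p, p, C) -> (H/p, W/p, C, p*p): one length-p*p vector per patch and channel
    g = x.reshape(H // p, p, W // p, p, C).transpose(0, 2, 4, 1, 3)
    g = rbf_kan(g.reshape(H // p, W // p, C, p * p), centers, width, weights)
    # Undo the grouping to return to (H, W, C)
    return g.reshape(H // p, W // p, C, p, p).transpose(0, 3, 1, 4, 2).reshape(H, W, C)


def conv1d_depthwise(x, kern, axis):
    """Per-channel 1-D cross-correlation along `axis` with zero 'same' padding.

    kern: (C, k) with odd k; channels are on the last axis of x.
    """
    k = kern.shape[1]
    pad = [(0, 0)] * x.ndim
    pad[axis] = (k // 2, k // 2)
    xp = np.pad(x, pad)
    n = x.shape[axis]
    out = np.zeros_like(x)
    for t in range(k):
        # Shifted window t of the padded input, weighted per channel
        out += kern[:, t] * np.take(xp, np.arange(t, t + n), axis=axis)
    return out


def axis_mix(x, kh, kw):
    """Axis-wise separable mixing: a height-axis pass followed by a width-axis pass."""
    return conv1d_depthwise(conv1d_depthwise(x, kh, axis=0), kw, axis=1)


def low_rank_global(x, U, V):
    """Global token mixing through an r-dim bottleneck: Y = V @ (U.T @ X), X = (N, C)."""
    H, W, C = x.shape
    tokens = x.reshape(H * W, C)
    return (V @ (U.T @ tokens)).reshape(H, W, C)


def multipatch_mixer(x, params, p):
    """Assumed combination: residual connection plus the sum of the three branches."""
    return (
        x
        + patch_kan_mix(x, p, params["centers"], params["width"], params["kan_w"])
        + axis_mix(x, params["kh"], params["kw"])
        + low_rank_global(x, params["U"], params["V"])
    )


# Toy usage with random weights (sizes are illustrative only)
rng = np.random.default_rng(0)
H, W, C, p, K, k, r = 8, 8, 4, 2, 5, 3, 4
params = {
    "centers": np.linspace(-2.0, 2.0, K),
    "width": 1.0,
    "kan_w": rng.normal(scale=0.1, size=(p * p, K, p * p)),
    "kh": rng.normal(scale=0.1, size=(C, k)),
    "kw": rng.normal(scale=0.1, size=(C, k)),
    "U": rng.normal(scale=0.1, size=(H * W, r)),
    "V": rng.normal(scale=0.1, size=(H * W, r)),
}
x = rng.normal(size=(H, W, C))
y = multipatch_mixer(x, params, p)
print(y.shape)  # (8, 8, 4)
```

Under these assumptions each branch costs time linear in N = H*W for fixed p, K, k and r: the patch KAN does O(N p^2 K C) work, the axis convolutions O(N k C), and the low-rank map O(N r C). One trade-off of the low-rank form as written is that U and V tie the layer to a fixed token count.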
