LG · Mar 29, 2025

Function Fitting Based on Kolmogorov-Arnold Theorem and Kernel Functions

arXiv:2503.23038v1 · 1 citation · h-index: 1 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the efficiency of transformer models for computer vision. Its parameter-reduction method is an incremental advance with specific gains.

The paper tackles the problem of reducing the parameter count of self-attention mechanisms. It proposes a kernel-based feature fitting framework that unifies Kolmogorov-Arnold Networks and self-attention, and derives from it a Pseudo-Multi-Head Self-Attention module that reduces the parameter count of traditional MHSA by nearly 50% while achieving performance comparable to ViT on CIFAR-10.

This paper proposes a unified theoretical framework based on the Kolmogorov-Arnold representation theorem and kernel methods. By analyzing the mathematical relationship among kernels, the B-spline basis functions in Kolmogorov-Arnold Networks (KANs), and the inner product operation in self-attention mechanisms, we establish a kernel-based feature fitting framework that unifies the two models as linear combinations of kernel functions. Under this framework, we propose a low-rank Pseudo-Multi-Head Self-Attention module (Pseudo-MHSA), which reduces the parameter count of traditional MHSA by nearly 50%. Furthermore, we design a Gaussian kernel multi-head self-attention variant (Gaussian-MHSA) to validate the effectiveness of nonlinear kernel functions in feature extraction. Experiments on the CIFAR-10 dataset demonstrate that the Pseudo-MHSA model achieves performance comparable to a ViT model of the same dimensionality under the MAE framework, and visualization analysis reveals that the two models have similar multi-head distribution patterns. Our code is publicly available.
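The kernel view described in the abstract can be illustrated with a minimal single-head sketch in plain Python. This is not the authors' implementation: the function names (`dot_kernel`, `gaussian_kernel`, `kernel_attention`, `mhsa_projection_params`), the bandwidth parameter `sigma`, and the absence of learned projections are all assumptions made for illustration. It only shows attention outputs as normalized linear combinations of values weighted by a kernel, with the usual exponential dot-product kernel and a Gaussian kernel as interchangeable choices.

```python
import math


def dot_kernel(q, k):
    """Exponential scaled dot-product kernel, exp(q.k / sqrt(d)), as in softmax attention."""
    d = len(q)
    return math.exp(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))


def gaussian_kernel(q, k, sigma=1.0):
    """Gaussian (RBF) kernel, exp(-||q - k||^2 / (2 sigma^2)); sigma is a hypothetical bandwidth."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(q, k))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_attention(queries, keys, values, kernel):
    """Return, for each query, a kernel-weighted average of the value vectors."""
    dim = len(values[0])
    outputs = []
    for q in queries:
        weights = [kernel(q, k) for k in keys]
        total = sum(weights)
        weights = [w / total for w in weights]  # normalize so the weights sum to 1
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


def mhsa_projection_params(d_model):
    """Weights in the four d x d projections (Q, K, V, output) of standard MHSA, biases excluded."""
    return 4 * d_model * d_model


if __name__ == "__main__":
    queries = [[1.0, 0.0]]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    print(kernel_attention(queries, keys, values, dot_kernel))
    print(kernel_attention(queries, keys, values, gaussian_kernel))
    print(mhsa_projection_params(192))
```

In both cases the query attends more strongly to the key it matches; only the similarity measure changes. The helper `mhsa_projection_params` counts the 4d² projection weights of a standard MHSA layer, so a reduction of nearly 50% would leave roughly 2d² weights. How Pseudo-MHSA achieves that reduction is specified in the paper and is not reproduced here.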

Code Implementations: 1 repo
