Approximation Rates in Besov Norms and Sample-Complexity of Kolmogorov-Arnold Networks with Residual Connections
This work offers theoretical guarantees for KANs in approximating and learning smooth functions, which is incremental but important for understanding their capabilities in deep learning frameworks.
The paper proves that residual Kolmogorov-Arnold Networks (KANs) can optimally approximate any Besov function at the best possible rate in weaker Besov norms, and provides a dimension-free sample complexity bound for learning such functions from noiseless data.
Inspired by the Kolmogorov-Arnold superposition theorem, Kolmogorov-Arnold Networks (KANs) have recently emerged as an improved backbone for most deep learning frameworks, promising more adaptivity than their multilayer perceptron (MLP) predecessor by allowing for trainable spline-based activation functions. In this paper, we probe the theoretical foundations of the KAN architecture by showing that it can optimally approximate any Besov function in $B^{s}_{p,q}(\mathcal{X})$ on a bounded open, or even fractal, domain $\mathcal{X}$ in $\mathbb{R}^d$ at the optimal approximation rate with respect to any weaker Besov norm $B^α_{p,q}(\mathcal{X})$; where $α< s$. We complement our approximation result with a statistical guarantee by bounding the pseudodimension of the relevant class of Res-KANs. As an application of the latter, we directly deduce a dimension-free estimate on the sample complexity of a residual KAN model when learning a function of Besov regularity from $N$ i.i.d. noiseless samples, showing that KANs can learn the smooth maps which they can approximate.