LG · AI · NA · Dec 22, 2025

Sprecher Networks: A Parameter-Efficient Kolmogorov-Arnold Architecture

Christian Hägg, Kathlén Kohn, Giovanni Luca Marchetti, Boris Shapiro

arXiv:2512.19367

Originality: Incremental advance

AI Analysis

This work targets machine learning practitioners who need more efficient neural networks, offering a parameter-efficient alternative to existing architectures such as MLPs and KANs. It appears incremental, since it builds on classical constructions and extends them with new components.

The paper tackles parameter and memory inefficiency in neural networks by introducing Sprecher Networks (SNs), a family of architectures inspired by the Kolmogorov-Arnold-Sprecher construction. SNs scale as $O(LN + LG)$ in parameters, and a sequential evaluation strategy reduces peak forward-intermediate memory from $O(N^2)$ to $O(N)$, enabling wider architectures under memory constraints.
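To make the parameter scaling concrete, here is a back-of-the-envelope count in Python. The MLP count assumes $L$ dense $N \to N$ layers with biases. The SN per-block breakdown ($N$ mixing weights, one shift, two shared splines with $G$ knot values each, and optionally $N$ lateral-mixing scalars) is a hypothetical assumption chosen only to illustrate $O(N + G)$ cost per block; it is not the paper's exact accounting, and the function names are illustrative.

```python
def mlp_params(L: int, N: int) -> int:
    """Parameters of L dense layers mapping width N to width N (weights + biases)."""
    return L * (N * N + N)


def sn_params(L: int, N: int, G: int, lateral: bool = True) -> int:
    """Hypothetical SN count: O(N + G) per block, hence O(LN + LG) overall.

    Assumed per-block breakdown (illustrative only, not the paper's count):
    N mixing weights, one shift, two shared splines with G knot values each,
    and optionally N lateral-mixing scalars.
    """
    per_block = N + 1 + 2 * G + (N if lateral else 0)
    return L * per_block


# Compare growth as the width N doubles (depth L = 4, G = 32 knots).
for N in (256, 512, 1024):
    print(f"N={N:5d}  MLP={mlp_params(4, N):>9,d}  SN={sn_params(4, N, 32):>6,d}")
```

Under these assumptions, doubling $N$ roughly quadruples the MLP count but only roughly doubles the SN count, because the spline cost $G$ is shared rather than tied to the number of edges.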

We present Sprecher Networks (SNs), a family of trainable neural architectures inspired by the classical Kolmogorov-Arnold-Sprecher (KAS) construction for approximating multivariate continuous functions. Unlike Multi-Layer Perceptrons (MLPs), which use fixed node activations, and Kolmogorov-Arnold Networks (KANs), which learn edge activations, SNs use shared, learnable splines (monotonic and general) within structured blocks that include explicit shift parameters and mixing weights. Our approach directly realizes Sprecher's 1965 sum-of-shifted-splines formula in its single-layer variant and extends it to deeper, multi-layer compositions. We further enhance the architecture with optional lateral mixing connections that enable intra-block communication between output dimensions, providing a parameter-efficient alternative to full attention mechanisms. Beyond parameter efficiency, with $O(LN + LG)$ scaling for depth $L$ and width $N$ (where $G$ is the knot count of the shared splines) versus $O(LN^2)$ for MLPs, SNs admit a sequential evaluation strategy that reduces peak forward-intermediate memory from $O(N^2)$ to $O(N)$ (treating batch size as constant), making much wider architectures feasible under memory constraints. We demonstrate empirically that composing these blocks into deep networks yields highly parameter- and memory-efficient models, discuss theoretical motivations, and compare SNs with related architectures (MLPs, KANs, and networks with learnable node activations).
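The abstract does not spell out the block equations, so the following is only a minimal NumPy sketch under stated assumptions. It assumes a block computes $y_q = \Phi\big(\sum_i \lambda_i \phi(x_i + \eta q)\big)$ for outputs $q = 0, \dots, N_{\text{out}} - 1$, with a shared monotone spline $\phi$, a shared general spline $\Phi$, mixing weights $\lambda_i$ and a single shift $\eta$. The piecewise-linear splines built on `np.interp`, the helper names, and the cyclic-neighbour lateral term are hypothetical choices, not the authors' implementation. The sketch contrasts a naive evaluation that materializes the full $N_{\text{out}} \times N_{\text{in}}$ array with a sequential loop whose largest temporary has length $N_{\text{in}}$, which illustrates the idea behind the $O(N^2) \to O(N)$ memory claim.

```python
import numpy as np


def monotone_spline(knots_x, increments):
    """Piecewise-linear spline whose knot values are a cumulative sum of
    non-negative increments, so it is non-decreasing (assumed parameterization)."""
    knots_y = np.concatenate([[0.0], np.cumsum(np.abs(increments))])
    return lambda t: np.interp(t, knots_x, knots_y)


def general_spline(knots_x, values):
    """Piecewise-linear spline with unconstrained knot values."""
    return lambda t: np.interp(t, knots_x, values)


def block_naive(x, lam, eta, phi, Phi, n_out):
    """Vectorized evaluation: builds the full (n_out, n_in) array, O(N^2) memory."""
    q = np.arange(n_out)[:, None]            # output index as a column
    inner = phi(x[None, :] + eta * q)        # phi(x_i + eta * q) for every (q, i)
    return Phi(inner @ lam)                  # mix over i, then apply outer spline


def block_sequential(x, lam, eta, phi, Phi, n_out, omega=None):
    """One output at a time: the largest temporary has length n_in, O(N) memory."""
    s = np.empty(n_out)
    for q in range(n_out):
        s[q] = np.dot(lam, phi(x + eta * q))
    if omega is not None:
        # Hypothetical lateral mixing: output q also receives omega[q] * s[q+1] (cyclic).
        s = s + omega * np.roll(s, -1)
    return Phi(s)


rng = np.random.default_rng(0)
n_in, n_out, G = 8, 16, 10
phi = monotone_spline(np.linspace(-1.0, 3.0, G), rng.random(G - 1))
Phi = general_spline(np.linspace(-10.0, 10.0, G), rng.standard_normal(G))
x = rng.random(n_in)
lam = rng.standard_normal(n_in)
eta = 0.1

y_naive = block_naive(x, lam, eta, phi, Phi, n_out)
y_seq = block_sequential(x, lam, eta, phi, Phi, n_out)
print(np.max(np.abs(y_naive - y_seq)))   # the two agree up to floating-point round-off
```

Both evaluations compute the same outputs; the sequential loop gives up some vectorized throughput in exchange for memory, which presumably matters most for very wide layers.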
