LG · NA · Nov 26, 2025

SUPN: Shallow Universal Polynomial Networks

Zachary Morrow, Michael Penwarden, Brian Chen, Aurya Javeed, Akil Narayan, John D. Jakeman

arXiv:2511.21414v1 · 4.1 · h-index: 6

Originality: Incremental advance

AI Analysis

The work addresses high parameter counts and training instability, a concern for machine-learning researchers and practitioners. It appears incremental, however, because it builds on existing methods such as DNNs and polynomial approximation.

The paper tackles overparameterization in deep neural networks and Kolmogorov-Arnold networks by proposing shallow universal polynomial networks (SUPNs). For a given number of trainable parameters, SUPNs often achieve approximation errors and variability an order of magnitude lower than DNNs and KANs, as shown across more than 13,000 trained models.

Deep neural networks (DNNs) and Kolmogorov-Arnold networks (KANs) are popular methods for function approximation due to their flexibility and expressivity. However, they typically require a large number of trainable parameters to produce a suitable approximation. Beyond making the resulting network less transparent, overparameterization creates a large optimization space, likely producing local minima in training that have quite different generalization errors. In this case, network initialization can have an outsize impact on the model's out-of-sample accuracy. For these reasons, we propose shallow universal polynomial networks (SUPNs). These networks replace all but the last hidden layer with a single layer of polynomials with learnable coefficients, leveraging the strengths of DNNs and polynomials to achieve sufficient expressivity with far fewer parameters. We prove that SUPNs converge at the same rate as the best polynomial approximation of the same degree, and we derive explicit formulas for quasi-optimal SUPN parameters. We complement theory with an extensive suite of numerical experiments involving SUPNs, DNNs, KANs, and polynomial projection in one, two, and ten dimensions, consisting of over 13,000 trained models. On the target functions we numerically studied, for a given number of trainable parameters, the approximation error and variability are often lower for SUPNs than for DNNs and KANs by an order of magnitude. In our examples, SUPNs even outperform polynomial projection on non-smooth functions.
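To make the architecture described in the abstract concrete, here is a minimal sketch under assumptions the abstract does not state: a one-dimensional input on [-1, 1], a Legendre basis for the learnable polynomials, a tanh activation on the remaining hidden layer, and a linear output layer. The function names (supn_forward, supn_param_count, mlp_param_count) are hypothetical; this is not the authors' implementation, and it does not reproduce their quasi-optimal parameter formulas. The parameter-count helpers assume a total-degree polynomial space in d dimensions, whose dimension is C(n + d, d).

```python
import numpy as np
from math import comb
from numpy.polynomial import legendre as leg


def supn_forward(x, C, a, b, activation=np.tanh):
    """Hypothetical 1-D SUPN-style forward pass (assumed structure, not the paper's code).

    x: (m,) inputs in [-1, 1]
    C: (N, n+1) learnable Legendre coefficients, one polynomial p_j per hidden unit
    a: (N,) output weights; b: scalar output bias
    Returns u(x) = sum_j a_j * activation(p_j(x)) + b, shape (m,).
    """
    degree = C.shape[1] - 1
    Phi = leg.legvander(x, degree)   # (m, n+1): P_0(x), ..., P_n(x)
    hidden = activation(Phi @ C.T)   # (m, N): activation applied to each polynomial
    return hidden @ a + b            # linear output layer


def supn_param_count(d, degree, width):
    """Trainable parameters, assuming `width` polynomials of total degree <= `degree`
    in d variables (C(degree + d, d) coefficients each) plus a linear output layer."""
    n_basis = comb(degree + d, d)
    return width * n_basis + width + 1


def mlp_param_count(d, width, depth):
    """Trainable parameters of a fully connected net with `depth` hidden layers
    of equal `width` and a scalar output (weights plus biases)."""
    count = d * width + width                      # input -> first hidden layer
    count += (depth - 1) * (width * width + width)  # hidden -> hidden layers
    count += width + 1                             # last hidden layer -> output
    return count


# Usage: evaluate an untrained 1-D sketch and count parameters for a few settings.
x = np.linspace(-1.0, 1.0, 5)
C = np.zeros((4, 3))   # 4 hidden units, polynomials of degree 2
a = np.ones(4)
out = supn_forward(x, C, a, 0.5)
print(out)                          # tanh(0) = 0, so every entry equals the bias 0.5
print(supn_param_count(1, 10, 10))  # 10 * 11 + 10 + 1
print(mlp_param_count(1, 20, 3))
```

Because the constant Legendre term P_0 = 1 is part of each polynomial, it plays the role of the hidden-layer bias in this sketch. The counting helpers only illustrate how parameter budgets are tallied; whether a SUPN uses fewer parameters than a given DNN depends on the input dimension, polynomial degree, and widths chosen.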

