MLCCLGNov 25, 2025

Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets

arXiv:2511.20888v12 citations
Originality Highly original
AI Analysis

This provides a theoretical explanation for the success of deep learning over traditional methods, though it is incremental in linking ResNets to circuit complexity.

The paper argues that deep neural networks, particularly ResNets, implement a computational Occam's razor by finding the simplest algorithm that fits data, and shows that minimizing a ResNet norm is equivalent to finding a circuit that fits data with an almost minimal number of nodes, within a power of 2 of optimal.

This paper argues that DNNs implement a computational Occam's razor -- finding the `simplest' algorithm that fits the data -- and that this could explain their incredible and wide-ranging success over more traditional statistical methods. We start with the discovery that the set of real-valued function $f$ that can be $ε$-approximated with a binary circuit of size at most $cε^{-γ}$ becomes convex in the `Harder than Monte Carlo' (HTMC) regime, when $γ>2$, allowing for the definition of a HTMC norm on functions. In parallel one can define a complexity measure on the parameters of a ResNets (a weighted $\ell_1$ norm of the parameters), which induce a `ResNet norm' on functions. The HTMC and ResNet norms can then be related by an almost matching sandwich bound. Thus minimizing this ResNet norm is equivalent to finding a circuit that fits the data with an almost minimal number of nodes (within a power of 2 of being optimal). ResNets thus appear as an alternative model for computation of real functions, better adapted to the HTMC regime and its convexity.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes