The Proof of Kolmogorov-Arnold May Illuminate Neural Network Learning
This is an incremental theoretical insight for neural network researchers, linking foundational mathematics to modern deep learning mechanisms.
The paper interprets the structure of Kolmogorov and Arnold's proof as suggesting sparsity in the higher exterior powers of the Jacobians of neural network inter-layer maps. It argues that this sparsity may explain the emergence of higher-order concepts in deep learning, and it proposes experiments to test this hypothesis.
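For reference, here is one standard statement of the Kolmogorov-Arnold representation theorem. The notation below is an assumption and the paper's own may differ. Every continuous function f on the unit cube [0,1]^n can be written as

```latex
f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \psi_{q,p}(x_p) \right),
\qquad \psi_{q,p}\colon [0,1]\to\mathbb{R},\quad \Phi_q\colon \mathbb{R}\to\mathbb{R} \text{ continuous}.
```

The inner functions \(\psi_{q,p}\) do not depend on f. The first map, \(x \mapsto \bigl(\sum_{p} \psi_{q,p}(x_p)\bigr)_{q=0}^{2n}\), therefore sends every input into a hidden layer of width \(2n+1\) in the same way for every f, and only the outer functions \(\Phi_q\) depend on f. This is the two-step split that the abstract describes.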
Kolmogorov and Arnold, in answering Hilbert's 13th problem (in the context of continuous functions), laid the foundations for the modern theory of neural networks (NNs). Their proof divides the representation of a multivariate function into two steps. The first (non-linear) inter-layer map gives a universal embedding of the data manifold into a single hidden layer. The image of this embedding is patterned in such a way that a subsequent dynamic can then be defined to solve for the second inter-layer map. I interpret this pattern as "minor concentration" of the Jacobians of the inter-layer map, which are defined almost everywhere. Because the entries of the k-th exterior power of a Jacobian are exactly its k×k minors, minor concentration amounts to sparsity of the higher exterior powers of the Jacobians. I present a conceptual argument for how such sparsity may set the stage for the emergence of successively higher-order concepts in today's deep NNs, and I suggest two classes of experiments to test this hypothesis.
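To make "minor concentration" concrete, the following is a minimal NumPy sketch that assumes a single hypothetical layer y = tanh(Wx + b). It computes the layer's Jacobian at one input, builds the k-th exterior power (the compound matrix whose entries are the k×k minors), and reports how much of the squared mass sits in the largest entries. The helper names (`layer_jacobian`, `exterior_power`, `top_mass_fraction`) and the top-mass statistic are illustrative assumptions. They are not the paper's definition of minor concentration and not its proposed experiments.

```python
import itertools

import numpy as np


def layer_jacobian(W, b, x):
    """Jacobian of a hypothetical layer y = tanh(W @ x + b) at input x.

    dy_i/dx_j = (1 - tanh(z_i)**2) * W[i, j], where z = W @ x + b.
    """
    z = W @ x + b
    return (1.0 - np.tanh(z) ** 2)[:, None] * W


def exterior_power(J, k):
    """k-th exterior power (k-th compound matrix) of J.

    Entry (I, K) is the minor det(J[I, K]), where I and K run over the
    k-element row and column index sets in lexicographic order.
    """
    m, n = J.shape
    rows = list(itertools.combinations(range(m), k))
    cols = list(itertools.combinations(range(n), k))
    out = np.empty((len(rows), len(cols)))
    for a, I in enumerate(rows):
        for c, K in enumerate(cols):
            out[a, c] = np.linalg.det(J[np.ix_(I, K)])
    return out


def top_mass_fraction(M, frac=0.1):
    """Share of the squared mass of M carried by its largest `frac` of entries.

    Values near 1 mean the mass sits on few index sets (sparse); a matrix
    with all entries equal gives roughly `frac`.
    """
    sq = np.sort((M ** 2).ravel())[::-1]
    n_top = max(1, int(np.ceil(frac * sq.size)))
    return sq[:n_top].sum() / sq.sum()


if __name__ == "__main__":
    # Random weights stand in for a trained layer (assumption for illustration).
    rng = np.random.default_rng(0)
    W = rng.normal(size=(6, 4))
    b = rng.normal(size=6)
    x = rng.normal(size=4)
    J = layer_jacobian(W, b, x)
    for k in (1, 2, 3):
        print(k, top_mass_fraction(exterior_power(J, k)))
```

The number of minors grows as C(m, k)·C(n, k), so this brute-force version is only practical for small layers. Comparing the statistic across k, or between trained and untrained weights, is one possible way to probe sparsity in higher exterior powers.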