LG AG, Jan 10, 2025

Geometry and Optimization of Shallow Polynomial Networks

Yossi Arjevani, Joan Bruna, Joe Kileel, Elzbieta Polak, Matthew Trager

arXiv:2501.06074v1, 16.97 citations, h-index: 14

Originality: Incremental advance

AI Analysis

This work provides theoretical insights into the optimization behavior of polynomial networks, which is incremental but useful for researchers in machine learning theory.

The paper tackles the optimization landscape of shallow neural networks with polynomial activations, particularly focusing on quadratic networks, and presents a variation of the Eckart-Young Theorem to characterize critical points and Hessian signatures for teacher-student problems with Gaussian data.

We study shallow neural networks with polynomial activations. The function space for these models can be identified with a set of symmetric tensors with bounded rank. We describe general features of these networks, focusing on the relationship between width and optimization. We then consider teacher-student problems, which can be viewed as problems of low-rank tensor approximation with respect to a non-standard inner product induced by the data distribution. In this setting, we introduce a teacher-metric discriminant, which encodes the qualitative behavior of the optimization as a function of the training data distribution. Finally, we focus on networks with quadratic activations, presenting an in-depth analysis of the optimization landscape. In particular, we present a variation of the Eckart-Young Theorem characterizing all critical points and their Hessian signatures for teacher-student problems with quadratic networks and Gaussian training data.
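To make the identification with bounded-rank symmetric tensors concrete in the quadratic case, the sketch below checks numerically that a width-r network of the form sum_i alpha_i (w_i . x)^2 computes the same function as the quadratic form x^T M x with M = sum_i alpha_i w_i w_i^T, a symmetric matrix of rank at most r. The parameterization with output weights alpha_i, the function names, and the dimensions are illustrative assumptions, not taken from the paper.

```python
import random

def network(alphas, weights, x):
    """Width-r quadratic network (assumed form): sum_i alpha_i * (w_i . x)^2."""
    return sum(a * sum(wj * xj for wj, xj in zip(w, x)) ** 2
               for a, w in zip(alphas, weights))

def symmetric_matrix(alphas, weights):
    """M = sum_i alpha_i * w_i w_i^T, symmetric with rank at most r."""
    n = len(weights[0])
    return [[sum(a * w[j] * w[k] for a, w in zip(alphas, weights))
             for k in range(n)] for j in range(n)]

def quadratic_form(M, x):
    """Evaluate x^T M x."""
    n = len(x)
    return sum(x[j] * M[j][k] * x[k] for j in range(n) for k in range(n))

rng = random.Random(0)
n, r = 4, 2  # input dimension and width (illustrative values)
alphas = [rng.uniform(-1, 1) for _ in range(r)]
weights = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(r)]
x = [rng.gauss(0, 1) for _ in range(n)]

M = symmetric_matrix(alphas, weights)
# The two evaluations should agree up to floating-point rounding.
gap = abs(network(alphas, weights, x) - quadratic_form(M, x))
print(gap < 1e-9)
```

Because the network's function depends on the parameters only through M, fitting a quadratic student to a quadratic teacher amounts to approximating the teacher's matrix by one of rank at most r, which is where a result of Eckart-Young type becomes relevant.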
