Geometry and Optimization of Shallow Polynomial Networks
This work provides theoretical insights into the optimization behavior of polynomial networks. The contribution is incremental but useful for researchers in machine learning theory.
The paper studies the optimization landscape of shallow neural networks with polynomial activations, focusing on quadratic networks, and presents a variation of the Eckart-Young Theorem that characterizes critical points and their Hessian signatures in teacher-student problems with Gaussian data.
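As background for the Eckart-Young variation mentioned above, the following minimal NumPy sketch implements the classical theorem in the symmetric-matrix case that is relevant to quadratic networks: the best rank-r approximation in Frobenius norm keeps the r eigenpairs with the largest absolute eigenvalues. This is the textbook statement, not the paper's variation, which replaces the Frobenius inner product with one induced by the data distribution. The function name `eckart_young_symmetric` is hypothetical.

```python
import numpy as np


def eckart_young_symmetric(M: np.ndarray, r: int) -> np.ndarray:
    """Best rank-r approximation of a symmetric matrix M in Frobenius norm.

    Classical Eckart-Young (not the paper's variation): for symmetric M the
    singular values are the absolute eigenvalues, so truncation keeps the r
    eigenpairs with the largest |eigenvalue|.
    """
    eigvals, eigvecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    keep = np.argsort(-np.abs(eigvals))[:r]     # indices of the r largest |lambda|
    V = eigvecs[:, keep]
    return (V * eigvals[keep]) @ V.T            # V diag(lambda_keep) V^T


M = np.diag([3.0, -2.0, 0.5])
print(eckart_young_symmetric(M, 2))
```

The Frobenius error of the truncation is the square root of the sum of the squared discarded eigenvalues; when several eigenvalues share the same absolute value at the cutoff, the best approximation is not unique.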
We study shallow neural networks with polynomial activations. The function space for these models can be identified with a set of symmetric tensors with bounded rank. We describe general features of these networks, focusing on the relationship between width and optimization. We then consider teacher-student problems, which can be viewed as low-rank tensor approximation with respect to a non-standard inner product induced by the data distribution. In this setting, we introduce a teacher-metric discriminant which encodes the qualitative behavior of the optimization as a function of the training data distribution. Finally, we focus on networks with quadratic activations, presenting an in-depth analysis of the optimization landscape. In particular, we present a variation of the Eckart-Young Theorem characterizing all critical points and their Hessian signatures for teacher-student problems with quadratic networks and Gaussian training data.
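For quadratic activations, the identification of the function space with bounded-rank symmetric tensors reduces to symmetric matrices: f(x) = sum_i a_i (w_i . x)^2 = x^T A x with A = sum_i a_i w_i w_i^T, so rank(A) is at most the width. The minimal NumPy sketch below checks this and illustrates the non-standard inner product. It assumes bias-free networks and standard Gaussian inputs x ~ N(0, I) (the paper's Gaussian setting may be more general); under that assumption the teacher-student squared loss equals 2 tr(M^2) + tr(M)^2 for M = A_teacher - A_student, rather than the Frobenius quantity tr(M^2). The helpers `quadratic_network`, `network_to_matrix`, and `gaussian_inner_product` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


# Illustrative helpers (hypothetical names, not from the paper).
def quadratic_network(x, W, a):
    """f(x) = sum_i a_i (w_i . x)^2 for inputs x of shape (n, d).

    W has shape (width, d) (one row per hidden neuron), a has shape (width,).
    No biases are used (an assumption of this sketch).
    """
    return ((x @ W.T) ** 2) @ a


def network_to_matrix(W, a):
    """Symmetric A = sum_i a_i w_i w_i^T, so f(x) = x^T A x and rank(A) <= width."""
    return (W.T * a) @ W


def gaussian_inner_product(A, B):
    """E[(x^T A x)(x^T B x)] for x ~ N(0, I) and symmetric A, B (Isserlis' theorem).

    Equals 2 tr(AB) + tr(A) tr(B); the trace term makes it differ
    from the Frobenius inner product tr(AB).
    """
    return 2.0 * np.trace(A @ B) + np.trace(A) * np.trace(B)


rng = np.random.default_rng(0)
d, width, n = 4, 2, 200_000

# Teacher and student networks of the same (small) width.
W_t, a_t = rng.standard_normal((width, d)), rng.standard_normal(width)
W_s, a_s = rng.standard_normal((width, d)), rng.standard_normal(width)
A_t, A_s = network_to_matrix(W_t, a_t), network_to_matrix(W_s, a_s)

# Standard Gaussian training inputs (assumed identity covariance).
x = rng.standard_normal((n, d))

# The network and its matrix represent the same function.
assert np.allclose(quadratic_network(x[:10], W_t, a_t),
                   np.einsum("ni,ij,nj->n", x[:10], A_t, x[:10]))

# Monte Carlo squared loss vs. the closed-form Gaussian-induced norm of A_t - A_s.
mc_loss = np.mean((quadratic_network(x, W_t, a_t) - quadratic_network(x, W_s, a_s)) ** 2)
exact_loss = gaussian_inner_product(A_t - A_s, A_t - A_s)
print(f"rank(A_t) = {np.linalg.matrix_rank(A_t)}, "
      f"Monte Carlo loss = {mc_loss:.3f}, exact = {exact_loss:.3f}")
```

Because of the tr(M)^2 term, a low-rank approximation that is optimal in Frobenius norm need not be optimal for this loss. For example, approximating A = I_2 by c e_1 e_1^T gives loss 3 at the Frobenius choice c = 1 but 8/3 at c = 4/3. This suggests why a modified Eckart-Young statement is needed in the teacher-student setting.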