Function Space and Critical Points of Linear Convolutional Networks
This paper offers theoretical insight into optimization properties that is relevant to researchers in deep learning theory, but it is incremental, since it builds on prior work on network geometry.
The paper studies the geometry of linear convolutional networks by analyzing their function spaces as semi-algebraic families of polynomials with sparse factorizations. It proves that, for architectures in which all strides are larger than one and for generic data, the non-zero critical points of training with the squared error loss are smooth interior points of the function space, which is not the case for dense linear networks or for linear convolutional networks with stride one.
We study the geometry of linear networks with one-dimensional convolutional layers. The function spaces of these networks can be identified with semi-algebraic families of polynomials admitting sparse factorizations. We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points. We also describe the critical points of the network's parameterization map. Furthermore, we study the optimization problem of training a network with the squared error loss. We prove that for architectures where all strides are larger than one and generic data, the non-zero critical points of that optimization problem are smooth interior points of the function space. This property is known to be false for dense linear networks and linear convolutional networks with stride one.
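The identification of function spaces with sparsely factorized polynomials can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes a layer acts as a strided cross-correlation without padding, y[i] = sum_j w[j] * x[i*stride + j], and the helper names (conv1d, dilate, end_to_end_filter) are hypothetical. Under that assumed convention, composing layers gives a single convolution whose filter is the product of the first layer's polynomial with the later layers' polynomials evaluated at powers of x given by the accumulated strides, which is where the sparsity of the factors comes from.

```python
import numpy as np

def conv1d(w, x, stride):
    """Strided 1D cross-correlation, no padding: y[i] = sum_j w[j] * x[i*stride + j]."""
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    return np.array([np.dot(w, x[i * stride : i * stride + k]) for i in range(n_out)])

def dilate(w, factor):
    """Insert factor-1 zeros between entries, i.e. coefficients of p(x**factor)."""
    out = np.zeros((len(w) - 1) * factor + 1)
    out[::factor] = w
    return out

def end_to_end_filter(filters, strides):
    """Filter and stride of the single convolution equal to the composed layers.

    Layer l contributes its polynomial evaluated at x**t, where t is the
    product of the strides of all earlier layers (assumed convention).
    """
    v = np.array([1.0])
    t = 1
    for w, s in zip(filters, strides):
        v = np.convolve(v, dilate(np.asarray(w, dtype=float), t))
        t *= s
    return v, t

# Two layers with stride 2 each (illustrative values only).
w1, w2 = [1.0, 2.0], [3.0, -1.0, 4.0]
strides = [2, 2]

# (1 + 2x) * (3 - x^2 + 4x^4): the second factor is sparse, a polynomial in x^2.
v, t = end_to_end_filter([w1, w2], strides)

rng = np.random.default_rng(0)
x = rng.standard_normal(20)

z_layers = conv1d(w2, conv1d(w1, x, strides[0]), strides[1])  # layer by layer
z_direct = conv1d(v, x, t)                                    # one convolution

print(v)  # [ 3.  6. -1. -2.  4.  8.]
print(t)  # 4
print(np.allclose(z_layers, z_direct))  # True
```

With stride one in every layer the dilation factor stays at 1 and the factors are ordinary dense polynomials; strides larger than one force the later factors to be polynomials in a power of x, which is the sparse factorization structure the abstract refers to.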