Exploring the loss landscape of regularized neural networks via convex duality
The paper provides theoretical insight into optimization challenges in neural networks; the contribution is incremental but useful for researchers in machine learning theory.
The paper studies the loss landscape of regularized neural networks, using convex duality to characterize the stationary points, the connectivity of optimal solutions, and their nonuniqueness. It shows that the topology of the global optima undergoes a phase transition as the network width changes, and it constructs counterexamples in which the problem can have a continuum of optimal solutions.
We discuss several aspects of the loss landscape of regularized neural networks: the structure of stationary points, the connectivity of optimal solutions, the existence of paths with nonincreasing loss to an arbitrary global optimum, and the nonuniqueness of optimal solutions. We do so by casting the problem into an equivalent convex problem and considering its dual. Starting from two-layer neural networks with scalar output, we first characterize the solution set of the convex problem using its dual and then characterize all stationary points. With this characterization, we show that the topology of the global optima goes through a phase transition as the width of the network changes, and we construct counterexamples where the problem may have a continuum of optimal solutions. Finally, we show that the solution set characterization and connectivity results can be extended to different architectures, including two-layer vector-valued neural networks and parallel three-layer neural networks.
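For concreteness, the convex reformulation and its dual can be sketched for the two-layer, scalar-output case. The abstract does not state the activation, loss, or regularizer, so the block below assumes ReLU activation, squared loss, and weight decay of strength \beta > 0, and follows the standard convex reformulation of such networks from the literature. The symbols X, y, u_j, \alpha_j, D_i, v_i, w_i, \lambda, m, and P are notation introduced here for illustration and may differ from the paper's own.

```latex
% Assumed setting (not stated in the abstract): data X \in R^{n x d}, labels y \in R^n,
% width m, ReLU (z)_+ = max(z, 0), weight decay \beta > 0.
\begin{align*}
% Nonconvex training problem
p^\star &= \min_{\{u_j,\alpha_j\}_{j=1}^{m}}
  \frac{1}{2}\Big\| \sum_{j=1}^{m} (X u_j)_+ \alpha_j - y \Big\|_2^2
  + \frac{\beta}{2} \sum_{j=1}^{m} \big( \|u_j\|_2^2 + \alpha_j^2 \big), \\[4pt]
% Convex reformulation over the activation patterns
% D_i = \mathrm{diag}(\mathbf{1}[X u \ge 0]), i = 1, \dots, P, realized by some u \in R^d
p^\star_{\mathrm{cvx}} &= \min_{\{v_i, w_i\}_{i=1}^{P}}
  \frac{1}{2}\Big\| \sum_{i=1}^{P} D_i X (v_i - w_i) - y \Big\|_2^2
  + \beta \sum_{i=1}^{P} \big( \|v_i\|_2 + \|w_i\|_2 \big) \\
&\qquad \text{s.t.}\quad (2 D_i - I_n) X v_i \ge 0, \quad (2 D_i - I_n) X w_i \ge 0,
  \quad i = 1, \dots, P, \\[4pt]
% Dual problem, with dual variable \lambda \in R^n
d^\star &= \max_{\lambda \in \mathbb{R}^n}
  \; -\frac{1}{2}\|\lambda - y\|_2^2 + \frac{1}{2}\|y\|_2^2
  \quad \text{s.t.}\quad \max_{\|u\|_2 \le 1} \big| \lambda^\top (X u)_+ \big| \le \beta .
\end{align*}
```

Under these assumptions, the nonconvex and convex problems are known to share the same optimal value once the width m is at least the number of nonzero blocks (v_i, w_i) in some optimal solution of the convex program. This gives one sense in which the network width enters the analysis.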