Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets
The work addresses a fundamental question about the geometry of deep learning loss landscapes, relevant to both researchers and practitioners, and offers insight into generalization properties. The contribution is incremental in that it builds on prior work on mode connectivity.
The paper explains why low-cost solutions of multilayer neural networks are connected in the loss landscape: optima found by gradient-based methods are linked by simple paths along which the loss is nearly constant, and these paths can often be chosen to be piece-wise linear with as few as two segments. It gives mathematical explanations that assume generic properties of well-trained nets, such as dropout stability and noise stability, and verifies the theory experimentally.
Mode connectivity is a surprising phenomenon in the loss landscape of deep nets. Optima -- at least those discovered by gradient-based optimization -- turn out to be connected by simple paths on which the loss function is almost constant. Often, these paths can be chosen to be piece-wise linear, with as few as two segments. We give mathematical explanations for this phenomenon, assuming generic properties (such as dropout stability and noise stability) of well-trained deep nets, which have previously been identified as part of understanding the generalization properties of deep nets. Our explanation holds for realistic multilayer nets, and experiments are presented to verify the theory.
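To make the idea of a two-segment piece-wise linear path concrete, here is a minimal Python sketch. It is not the paper's construction and involves no neural network: `toy_loss`, `lerp`, `two_segment_path`, `max_loss`, and the chosen points are all hypothetical names and values introduced for illustration. The toy loss is zero on both coordinate axes, so two zero-loss solutions on different axes are separated by a barrier along the straight line between them, yet they are joined at constant loss by a path that bends at the origin.

```python
def toy_loss(theta):
    """Hypothetical 2-D loss (an assumption, not from the paper); it is zero on both axes."""
    x, y = theta
    return (x * y) ** 2


def lerp(p, q, s):
    """Point at fraction s in [0, 1] along the straight segment from p to q."""
    return tuple((1.0 - s) * pi + s * qi for pi, qi in zip(p, q))


def two_segment_path(theta_a, bend, theta_b, t):
    """Piece-wise linear path: theta_a -> bend on [0, 0.5], then bend -> theta_b on [0.5, 1]."""
    if t <= 0.5:
        return lerp(theta_a, bend, 2.0 * t)
    return lerp(bend, theta_b, 2.0 * t - 1.0)


def max_loss(path, loss, steps=100):
    """Largest loss over steps + 1 evenly spaced points of a path parameterized on [0, 1]."""
    return max(loss(path(i / steps)) for i in range(steps + 1))


# Two zero-loss solutions and a bend point, all chosen by hand for this toy example.
theta_a, theta_b, bend = (1.0, 0.0), (0.0, 1.0), (0.0, 0.0)

# Barrier along the direct straight line versus along the two-segment path.
direct_barrier = max_loss(lambda t: lerp(theta_a, theta_b, t), toy_loss)
bent_barrier = max_loss(lambda t: two_segment_path(theta_a, bend, theta_b, t), toy_loss)

print("direct:", direct_barrier, "two-segment:", bent_barrier)
```

In this toy setting the direct path peaks at a loss of 0.0625 at its midpoint, while the two-segment path stays at zero throughout. For real networks the bend point is not known in advance and has to be found, for example by optimization or by a construction like the one the paper analyzes.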