A Note on Connectivity of Sublevel Sets in Deep Learning
The paper addresses the theoretical understanding of optimization landscapes in deep learning, but the contribution is incremental because it builds on prior connectivity results.
The paper proves that for deep neural networks, a single wide layer of width $N+1$, where $N$ is the number of training samples, ensures connectivity of the sublevel sets of the training loss function, whereas in two-layer networks, width $N$ can lead to disconnected sublevel sets.
It is shown that for deep neural networks, a single wide layer of width $N+1$ ($N$ being the number of training samples) suffices to guarantee the connectivity of sublevel sets of the training loss function. In the two-layer setting, the same property may fail to hold with just one neuron fewer (i.e., width $N$ can lead to disconnected sublevel sets).
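For reference, the property in question can be written out as follows. This is only a sketch in assumed notation that does not appear in the text above: $\theta$ denotes the network parameters, $\Phi(\theta)$ the training loss, and $\alpha$ a loss level. Whether the inequality is strict or non-strict is also an assumption made here.

$$
L_\alpha = \{\, \theta : \Phi(\theta) \le \alpha \,\}
$$

Under this convention, a sufficient condition for $L_\alpha$ to be connected is that any two parameter vectors $\theta_0, \theta_1 \in L_\alpha$ can be joined by a continuous path $\gamma : [0,1] \to L_\alpha$ with $\gamma(0) = \theta_0$ and $\gamma(1) = \theta_1$, so that the loss never exceeds $\alpha$ along the path.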