Low-loss connection of weight vectors: distribution-based approaches
This work addresses a theoretical challenge in understanding the optimization landscapes of neural networks, but it appears incremental: it compares existing connection methods rather than introducing a fundamentally new approach.
The paper tackles the problem of connecting two low-loss points on the loss surface of an overparameterized neural network by a low-loss curve. It compares several methods, most of them based on distributional assumptions, and finds that accuracy generally correlates with a method's complexity and with its sensitivity to the details of the endpoints.
Recent research shows that sublevel sets of the loss surfaces of overparameterized networks are connected, exactly or approximately. We describe and experimentally compare a panel of methods for connecting two low-loss points by a low-loss curve on this surface. Our methods vary in accuracy and complexity. Most of them are based on "macroscopic" distributional assumptions, and some are insensitive to the detailed properties of the points to be connected. Some methods require prior training of a "global connection model," which can then be applied to any pair of points. A method's accuracy generally correlates with its complexity and with its sensitivity to the details of the endpoints.
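To make "connecting two low-loss points by a low-loss curve" concrete, here is a minimal toy sketch. It is not one of the paper's distribution-based methods. Instead it uses a quadratic Bézier curve with a single free control point, a common generic curve-finding baseline, on a hypothetical 2-D loss whose minimum set is the unit circle (connected but not convex). The names `toy_loss`, `bezier_path`, and `fit_control_point` are illustrative assumptions, and the grid search over the control point stands in for the gradient-based training one would use in a real weight space. The sketch compares the "loss barrier" (worst loss along the path) of the straight line with that of the fitted curve.

```python
import numpy as np

def toy_loss(w):
    """Hypothetical toy loss: zero exactly on the unit circle, a connected non-convex set."""
    return (np.sum(w ** 2, axis=-1) - 1.0) ** 2

def linear_path(a, b, ts):
    """Straight-line interpolation between endpoints a and b at parameters ts."""
    return (1 - ts)[:, None] * a + ts[:, None] * b

def bezier_path(a, b, theta, ts):
    """Quadratic Bezier curve from a (t=0) to b (t=1) with free control point theta."""
    t = ts[:, None]
    return (1 - t) ** 2 * a + 2 * t * (1 - t) * theta + t ** 2 * b

def max_loss(path):
    """Loss barrier: the worst loss encountered along a discretized path."""
    return float(np.max(toy_loss(path)))

def fit_control_point(a, b, ts, candidates):
    """Pick the candidate control point whose curve has the smallest barrier.

    A grid search is used here only for simplicity; it is a stand-in for
    optimizing the curve parameters by gradient descent.
    """
    best = min(candidates, key=lambda th: max_loss(bezier_path(a, b, th, ts)))
    return best, max_loss(bezier_path(a, b, best, ts))

# Two low-loss endpoints on opposite sides of the circle.
a, b = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
ts = np.linspace(0.0, 1.0, 101)
candidates = [np.array([0.0, c]) for c in np.linspace(0.0, 3.0, 301)]

theta, bez_barrier = fit_control_point(a, b, ts, candidates)
lin_barrier = max_loss(linear_path(a, b, ts))  # the straight line passes through the origin, where the loss is 1
print(f"linear barrier {lin_barrier:.3f}, Bezier barrier {bez_barrier:.3f}")
```

The straight line between the endpoints crosses a high-loss region, while a curve with one trained degree of freedom stays close to the minimum set. This loosely mirrors the abstract's observation that more flexible (and more complex) connection methods tend to achieve lower loss along the path.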