LGJan 3, 2022

Stochastic Weight Averaging Revisited

arXiv:2201.00519v437 citationsHas Code
AI Analysis

This work provides incremental insights into optimization methods for deep neural networks, benefiting researchers in machine learning by clarifying the mechanisms of weight averaging.

The paper investigates Stochastic Weight Averaging (SWA) by disentangling the effects of weight averaging and learning rate schedules, finding that weight averaging reduces variance but does not always lead to wider optima, and proposes Periodic SWA (PSWA) to exploit global geometric structures in the loss landscape, outperforming backbone SGD.

Averaging neural network weights sampled by a backbone stochastic gradient descent (SGD) is a simple yet effective approach to assist the backbone SGD in finding better optima, in terms of generalization. From a statistical perspective, weight averaging (WA) contributes to variance reduction. Recently, a well-established stochastic weight averaging (SWA) method is proposed, which is featured by the application of a cyclical or high constant (CHC) learning rate schedule (LRS) in generating weight samples for WA. Then a new insight on WA appears, which states that WA helps to discover wider optima and then leads to better generalization. We conduct extensive experimental studies for SWA, involving a dozen modern DNN model structures and a dozen benchmark open-source image, graph, and text datasets. We disentangle contributions of the WA operation and the CHC LRS for SWA, showing that the WA operation in SWA still contributes to variance reduction but does not always lead to wide optima. The experimental results indicate that there are global scale geometric structures in the DNN loss landscape. We then present an algorithm termed periodic SWA (PSWA) which makes use of a series of WA operations to discover the global geometric structures. PSWA outperforms its backbone SGD remarkably, providing experimental evidences for the existence of global geometric structures. Codes for reproducing the experimental results are available at https://github.com/ZJLAB-AMMI/PSWA.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes