LG, AI, CV, ML | Jun 14, 2018

There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average

arXiv:1806.05594v3 | 261 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient and stable training in semi-supervised learning for computer vision tasks, offering incremental improvements over existing methods.

The paper studies semi-supervised learning with consistency regularization and shows that SGD struggles to converge on the consistency loss, which leaves predictions unstable. It proposes training with Stochastic Weight Averaging (SWA) and fast-SWA to achieve state-of-the-art results, such as 5.0% error on CIFAR-10 with 4000 labels compared to the previous best of 6.3%.

Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. To understand consistency regularization, we conceptually explore how loss geometry interacts with training procedures. The consistency loss dramatically improves generalization performance over supervised-only training; however, we show that SGD struggles to converge on the consistency loss and continues to make large steps that lead to changes in predictions on the test data. Motivated by these observations, we propose to train consistency-based methods with Stochastic Weight Averaging (SWA), a recent approach which averages weights along the trajectory of SGD with a modified learning rate schedule. We also propose fast-SWA, which further accelerates convergence by averaging multiple points within each cycle of a cyclical learning rate schedule. With weight averaging, we achieve the best known semi-supervised results on CIFAR-10 and CIFAR-100, over many different quantities of labeled training data. For example, we achieve 5.0% error on CIFAR-10 with only 4000 labels, compared to the previous best result in the literature of 6.3%.
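The weight-averaging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear shape of the cyclical schedule, the flat-list weight representation, and the names `cyclic_lr`, `WeightAverager`, and `collection_epochs` are all assumptions made for illustration. It shows only the difference in when weights are collected: once per learning-rate cycle (SWA-style) versus several times within each cycle (fast-SWA-style).

```python
def cyclic_lr(epoch, cycle_len, lr_max, lr_min):
    """Learning rate for `epoch`, counted from the start of averaging.

    Assumed shape: linear decay from lr_max to lr_min inside each cycle,
    then a jump back to lr_max at the start of the next cycle.
    """
    pos = (epoch % cycle_len) / max(cycle_len - 1, 1)
    return lr_max - (lr_max - lr_min) * pos


class WeightAverager:
    """Running equal-weight average of weight vectors (plain lists of floats)."""

    def __init__(self):
        self.avg = None
        self.count = 0

    def update(self, weights):
        # Incremental mean: avg <- avg + (w - avg) / n
        self.count += 1
        if self.avg is None:
            self.avg = [float(w) for w in weights]
        else:
            self.avg = [a + (w - a) / self.count
                        for a, w in zip(self.avg, weights)]


def collection_epochs(start, total, cycle_len, every=None):
    """Epochs at which the current weights are added to the average.

    every=None -> once per cycle, at the cycle's last epoch (SWA-style).
    every=k    -> every k epochs after `start` (fast-SWA-style, k < cycle_len).
    """
    step = cycle_len if every is None else every
    return [e for e in range(start, total) if (e - start + 1) % step == 0]
```

In a training loop, one would call `WeightAverager.update` with the model's current weights at each epoch returned by `collection_epochs`, and evaluate the averaged weights rather than the final SGD iterate. With the same cycle length, the fast-SWA-style setting contributes more points to the average per cycle, which is the mechanism the abstract credits for faster convergence.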

Code Implementations: 2 repos