LG · AI · CV · ML · Feb 17, 2022

General Cyclical Training of Neural Networks

arXiv:2202.08835v2 · 8 citations · Has Code
AI Analysis

This work proposes a novel training paradigm for machine learning practitioners, but it appears incremental as it builds on existing cyclical methods without showing broad SOTA gains.

The paper aims to improve neural network training through the principle of General Cyclical Training, in which training starts and ends with easy training and the hard training happens during the middle epochs. It demonstrates that cyclical weight decay, cyclical softmax temperature, and cyclical gradient clipping improve test accuracy.

This paper describes the principle of "General Cyclical Training" in machine learning, where training starts and ends with "easy training" and the "hard training" happens during the middle epochs. We propose several manifestations for training neural networks, including algorithmic examples (via hyper-parameters and loss functions), data-based examples, and model-based examples. Specifically, we introduce several novel techniques: cyclical weight decay, cyclical batch size, cyclical focal loss, cyclical softmax temperature, cyclical data augmentation, cyclical gradient clipping, and cyclical semi-supervised learning. In addition, we demonstrate that cyclical weight decay, cyclical softmax temperature, and cyclical gradient clipping (as three examples of this principle) are beneficial in the test accuracy performance of a trained model. Furthermore, we discuss model-based examples (such as pretraining and knowledge distillation) from the perspective of general cyclical training and recommend some changes to the typical training methodology. In summary, this paper defines the general cyclical training concept and discusses several specific ways in which this concept can be applied to training neural networks. In the spirit of reproducibility, the code used in our experiments is available at https://github.com/lnsmith54/CFL.
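The "easy, then hard, then easy" shape described in the abstract can be illustrated with a small schedule function. This is only a sketch under stated assumptions: the function name `cyclical_value`, the linear (triangular) ramp, and the example values are hypothetical and are not taken from the paper or its repository. Which end of a hyper-parameter's range counts as "easy" depends on the technique, so the caller supplies both values.

```python
def cyclical_value(step: int, total_steps: int, easy: float, hard: float) -> float:
    """Hypothetical triangular schedule: `easy` at the first and last step,
    `hard` at the midpoint, with linear interpolation in between."""
    if total_steps < 2:
        return easy
    # Position of this step across the whole run, in [0, 1].
    position = step / (total_steps - 1)
    # Hardness in [0, 1]: 0 at both ends of training, 1 at the midpoint.
    hardness = 1.0 - abs(2.0 * position - 1.0)
    return easy + (hard - easy) * hardness


if __name__ == "__main__":
    # Illustrative values only (assumption): a per-epoch weight decay
    # that ramps from 1e-4 up to 5e-4 and back over five epochs.
    epochs = 5
    schedule = [cyclical_value(e, epochs, easy=1e-4, hard=5e-4) for e in range(epochs)]
    print(schedule)
```

The same function could, under the same assumptions, drive any of the listed hyper-parameters (softmax temperature, gradient clipping threshold, batch size) by recomputing the value once per epoch and passing it to the training loop.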

Code Implementations: 1 repo