QLABGrad: a Hyperparameter-Free and Convergence-Guaranteed Scheme for Deep Learning
This work addresses the need for automated hyperparameter tuning in deep learning, reducing reliance on empirical trial and error, though the contribution is incremental because it builds on existing gradient descent methods.
The paper tackles the problem of manually tuning learning rates in deep learning by proposing QLABGrad, a hyperparameter-free scheme that automatically determines learning rates using a quadratic loss approximation, requiring only one extra forward propagation. It theoretically proves convergence under smooth Lipschitz conditions and experimentally shows it outperforms competing schemes on datasets like MNIST, CIFAR10, and ImageNet with architectures such as MLP, CNN, and ResNet.
The learning rate is a critical hyperparameter for deep learning tasks since it determines the extent to which the model parameters are updated during training. However, the choice of learning rate typically depends on empirical judgment, which may not yield satisfactory outcomes without intensive trial-and-error experiments. In this study, we propose a novel learning rate adaptation scheme called QLABGrad. Without any user-specified hyperparameter, QLABGrad automatically determines the learning rate by optimizing the Quadratic Loss Approximation-Based (QLAB) function for a given gradient descent direction, where only one extra forward propagation is required. We theoretically prove the convergence of QLABGrad under a smooth Lipschitz condition on the loss function. Experimental results on multiple architectures, including MLP, CNN, and ResNet, on the MNIST, CIFAR10, and ImageNet datasets demonstrate that QLABGrad outperforms various competing schemes for deep learning.
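The abstract does not spell out the QLAB function, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the loss along the descent direction, f(α) = L(θ − αg), is approximated by a quadratic fitted to three quantities: the current loss L(θ), the slope f′(0) = −‖g‖², and the loss at one probe point L(θ − α₀g), which costs the single extra forward pass. The names `qlab_step_size`, `probe_lr`, and the fallback for non-positive curvature are hypothetical choices made for illustration.

```python
import numpy as np


def qlab_step_size(loss_fn, theta, grad, probe_lr, loss_at_theta=None):
    """Minimize a quadratic fit of f(a) = loss_fn(theta - a * grad).

    The fit q(a) = c2 * a**2 - ||grad||^2 * a + L(theta) matches the loss
    at a = 0, the slope at a = 0, and the loss at a = probe_lr.
    All names here are illustrative assumptions, not the paper's API.
    """
    g2 = float(np.dot(grad, grad))
    l0 = loss_fn(theta) if loss_at_theta is None else loss_at_theta
    # The one extra forward pass: loss at the probe point.
    l1 = loss_fn(theta - probe_lr * grad)
    # curvature equals c2 * probe_lr**2 in the fitted quadratic.
    curvature = l1 - l0 + probe_lr * g2
    if curvature <= 0.0:
        # Assumed fallback: the fit has no minimizer, keep the probe rate.
        return probe_lr
    return probe_lr ** 2 * g2 / (2.0 * curvature)


# Toy problem: L(theta) = 0.5 * theta^T A theta, with gradient A @ theta.
A = np.diag([1.0, 10.0])


def quad_loss(theta):
    return 0.5 * float(theta @ A @ theta)


theta0 = np.array([1.0, 1.0])
grad0 = A @ theta0
alpha = qlab_step_size(quad_loss, theta0, grad0, probe_lr=0.01)
theta1 = theta0 - alpha * grad0
```

On this toy loss the quadratic fit is exact, so the returned rate coincides with the exact line-search step gᵀg / gᵀAg regardless of the probe rate. For a real network the loss is not quadratic along the gradient, and the fit is only a local approximation whose quality depends on the probe point.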