Adaptive First- and Second-Order Algorithms for Large-Scale Machine Learning
This work addresses continuous optimization in large-scale machine learning, offering incremental improvements to first- and second-order training algorithms for deep learning practitioners.
The paper proposes adaptive first- and second-order algorithms for continuous optimization in machine learning: a novel first-order method with adaptive sampling and adaptive step size, and a stochastic damped L-BFGS method. Both show promising performance on well-known deep learning datasets.
In this paper, we consider both first- and second-order techniques to address continuous optimization problems arising in machine learning. In the first-order case, we propose a framework for transitioning from deterministic or semi-deterministic to stochastic quadratic regularization methods. We leverage the two-phase nature of stochastic optimization to propose a novel first-order algorithm with adaptive sampling and adaptive step size. In the second-order case, we propose a novel stochastic damped L-BFGS method that improves on previous algorithms in the highly nonconvex context of deep learning. Both algorithms are evaluated on well-known deep learning datasets and exhibit promising performance.
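To make the notion of damping in a quasi-Newton update concrete, the following is a minimal sketch of the classical Powell damping rule, which replaces the gradient difference y by a vector r so that the curvature condition s^T r > 0 holds even on nonconvex problems. It is an illustration only and is not claimed to be the stochastic damped L-BFGS method proposed here. The function names, the threshold c = 0.2, and the assumption that the product Bs with the current Hessian approximation is available are all choices made for this example.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def powell_damp(s, y, Bs, c=0.2):
    """Classical Powell damping (illustrative; names and c are assumptions).

    s  : step, x_new - x_old
    y  : gradient difference, g_new - g_old
    Bs : product of the current Hessian approximation B with s
         (assumes B is positive definite, so s.Bs > 0)
    Returns r = theta * y + (1 - theta) * Bs with s.r >= c * s.Bs.
    """
    sBs = dot(s, Bs)
    sy = dot(s, y)
    if sy >= c * sBs:
        theta = 1.0  # enough positive curvature: keep y unchanged
    else:
        theta = (1.0 - c) * sBs / (sBs - sy)  # blend y toward Bs
    return [theta * yi + (1.0 - theta) * bi for yi, bi in zip(y, Bs)]


# Negative curvature along s (s.y < 0), with B taken as the identity.
s = [1.0, 0.0]
y = [-1.0, 0.5]
r = powell_damp(s, y, Bs=s)
print(dot(s, y), dot(s, r))
```

In this example the raw pair (s, y) violates the curvature condition, while the damped pair (s, r) satisfies s^T r = c s^T B s up to rounding, so a BFGS or L-BFGS update built from (s, r) keeps the Hessian approximation positive definite.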