LG | Aug 5, 2025

Accelerating SGDM via Learning Rate and Batch Size Schedules: A Lyapunov-Based Analysis

arXiv:2508.03105v2 | h-index: 4

Originality: Incremental advance

AI Analysis

This work targets optimization efficiency for deep learning practitioners, offering theoretical and empirical insight into hyperparameter scheduling. It is incremental in that it builds on existing SGDM frameworks.

The paper analyzes the convergence of stochastic gradient descent with momentum (SGDM) under dynamic learning-rate and batch-size schedules. It shows that an increasing batch size guarantees convergence of the expected gradient norm, and that increasing the batch size and the learning rate together yields provably faster decay. Experiments show scheduled SGDM converging significantly faster than fixed-hyperparameter SGDM.

We analyze the convergence behavior of stochastic gradient descent with momentum (SGDM) under dynamic learning-rate and batch-size schedules by introducing a novel and simpler Lyapunov function. We extend the existing theoretical framework to cover three practical scheduling strategies commonly used in deep learning: a constant batch size with a decaying learning rate, an increasing batch size with a decaying learning rate, and an increasing batch size with an increasing learning rate. Our results reveal a clear hierarchy in convergence: a constant batch size does not guarantee convergence of the expected gradient norm, whereas an increasing batch size does, and simultaneously increasing both the batch size and learning rate achieves a provably faster decay. Empirical results validate our theory, showing that dynamically scheduled SGDM significantly outperforms its fixed-hyperparameter counterpart in convergence speed. We also evaluated a warm-up schedule in experiments, which empirically outperformed all other strategies in convergence behavior.
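The three scheduling strategies named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names `schedule` and `sgdm`, the strategy labels, the growth factors, the two-epoch stage length, the heavy-ball form of the momentum update, and the toy quadratic objective are all assumptions chosen for readability.

```python
import random


def schedule(epoch, strategy, lr0=0.1, b0=8, lr_factor=1.2,
             batch_factor=2.0, period=2):
    """Return (learning_rate, batch_size) for the given epoch.

    All constants are illustrative assumptions, not values from the paper.
    The schedule changes once every `period` epochs.
    """
    stage = epoch // period
    if strategy == "const_batch_decay_lr":
        return lr0 / lr_factor ** stage, b0
    if strategy == "inc_batch_decay_lr":
        return lr0 / lr_factor ** stage, int(b0 * batch_factor ** stage)
    if strategy == "inc_batch_inc_lr":
        return lr0 * lr_factor ** stage, int(b0 * batch_factor ** stage)
    raise ValueError(f"unknown strategy: {strategy}")


def sgdm(data, strategy, epochs=8, beta=0.9, seed=0):
    """Run heavy-ball SGDM on f(x) = mean((x - a_i)^2) / 2 over `data`.

    The minimizer is the mean of `data`, so the full gradient at x is
    x - mean(data). Heavy-ball momentum is one common SGDM variant and
    may differ from the form analyzed in the paper.
    """
    rng = random.Random(seed)
    x, m = 0.0, 0.0
    for epoch in range(epochs):
        lr, batch_size = schedule(epoch, strategy)
        batch_size = min(batch_size, len(data))
        # One epoch is roughly one pass over the data.
        for _ in range(len(data) // batch_size):
            batch = rng.sample(data, batch_size)
            grad = x - sum(batch) / batch_size  # mini-batch gradient
            m = beta * m + grad                 # momentum buffer
            x -= lr * m                         # parameter step
    return x


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(3.0, 1.0) for _ in range(256)]
    target = sum(data) / len(data)
    for name in ("const_batch_decay_lr", "inc_batch_decay_lr",
                 "inc_batch_inc_lr"):
        x = sgdm(data, name)
        print(name, "full-gradient norm:", abs(x - target))
```

Running the script prints the full-gradient norm reached by each strategy on the toy problem. A single seeded run on a one-dimensional quadratic is only a way to see the schedules in action; it does not reproduce or verify the convergence hierarchy the abstract describes.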

