LG · OC · Sep 3, 2023

Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments

arXiv:2309.01248v1 · Has Code
Originality: Incremental advance
AI Analysis

This work targets SGD performance in machine learning optimization; the advance is incremental because it builds on existing step size methods.

This paper proposes a modified step size with a logarithmic term for stochastic gradient descent (SGD). It achieves a convergence rate of $O(\frac{\ln T}{\sqrt{T}})$ for smooth non-convex functions and accuracy improvements of 0.5% and 1.4% on the FashionMNIST and CIFAR10 datasets, respectively, compared to the traditional $\frac{1}{\sqrt{t}}$ step size.

This paper introduces a novel approach to enhance the performance of the stochastic gradient descent (SGD) algorithm by incorporating a modified decay step size based on $\frac{1}{\sqrt{t}}$. The proposed step size integrates a logarithmic term, which yields smaller step sizes in the final iterations. Our analysis establishes a convergence rate of $O(\frac{\ln T}{\sqrt{T}})$ for smooth non-convex functions without the Polyak-Łojasiewicz condition. To evaluate the effectiveness of our approach, we conducted numerical experiments on image classification tasks using the FashionMNIST and CIFAR10 datasets. The results demonstrate significant improvements in accuracy, with enhancements of $0.5\%$ and $1.4\%$, respectively, compared to the traditional $\frac{1}{\sqrt{t}}$ step size. The source code can be found at https://github.com/Shamaeem/LNSQRTStepSize.
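The abstract does not spell out the modified step size, so the following is only a minimal sketch of the idea it describes: a $\frac{1}{\sqrt{t}}$ decay multiplied by a logarithmic factor that shrinks toward the end of training. The function names, the parameter `eta0`, and in particular the factor `1 - ln(t) / ln(T + 1)` are assumptions chosen for illustration; they are not taken from the paper or its repository.

```python
import math


def sqrt_step(eta0: float, t: int) -> float:
    """Baseline decay eta0 / sqrt(t), for iteration t >= 1."""
    return eta0 / math.sqrt(t)


def log_sqrt_step(eta0: float, t: int, T: int) -> float:
    """Hypothetical log-modified decay (assumed form, not the paper's formula).

    Multiplies the 1/sqrt(t) baseline by 1 - ln(t) / ln(T + 1). The factor is
    1 at t = 1 and decreases as t approaches T, so late iterations get a
    smaller step than the baseline. Using ln(T + 1) keeps the step positive
    at t = T.
    """
    return sqrt_step(eta0, t) * (1.0 - math.log(t) / math.log(T + 1))


if __name__ == "__main__":
    eta0, T = 0.1, 1000
    # Compare the two schedules at a few iterations.
    for t in (1, 10, 100, 1000):
        print(t, sqrt_step(eta0, t), log_sqrt_step(eta0, t, T))
```

Under this assumed form both schedules start at `eta0`, and the modified one stays below the baseline for every `t > 1`, which matches the abstract's description of smaller values in the final iterations.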

Code Implementations: 1 repo
