Stochastic Gradient Sampling for Enhancing Neural Networks Training
The work offers a more efficient optimization method for deep learning practitioners working with large-scale models and datasets, though as an extension of Adam it appears incremental.
The paper tackles the computational inefficiency of neural network training by introducing StochGradAdam, an optimizer that selectively samples gradients to reduce computational cost while maintaining performance comparable to or better than Adam on image classification and segmentation tasks.
In this paper, we introduce StochGradAdam, a novel optimizer designed as an extension of the Adam algorithm that incorporates stochastic gradient sampling to improve computational efficiency while maintaining robust performance. StochGradAdam selectively samples a subset of gradients during training, reducing the computational cost while preserving the adaptive learning rates and bias corrections found in Adam. Our experiments on image classification and segmentation tasks demonstrate that StochGradAdam can achieve comparable or superior performance to Adam, even when using fewer gradient updates per iteration. By focusing on key gradient updates, StochGradAdam offers stable convergence and enhanced exploration of the loss landscape while mitigating the impact of noisy gradients. The results suggest that this approach is particularly effective for large-scale models and datasets, providing a promising alternative to traditional optimization techniques for deep learning applications.
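The abstract does not spell out how gradients are sampled, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each gradient entry is kept independently with probability `sampling_rate` (a hypothetical parameter name) and zeroed otherwise, after which the usual Adam moment estimates and bias corrections are applied. The function names `stoch_grad_adam_step` and `minimize_quadratic`, and the toy quadratic objective, are illustrative assumptions as well.

```python
import numpy as np


def stoch_grad_adam_step(param, grad, m, v, t, rng, lr=1e-3, beta1=0.9,
                         beta2=0.999, eps=1e-8, sampling_rate=0.8):
    """One Adam-style update using a randomly sampled subset of gradient entries.

    Assumption (not from the paper): entries are kept by an independent
    Bernoulli mask with probability `sampling_rate`; dropped entries are
    treated as zero gradient for this step.
    """
    mask = rng.random(grad.shape) < sampling_rate
    sampled = np.where(mask, grad, 0.0)

    # Standard Adam moment estimates, computed on the sampled gradient.
    m = beta1 * m + (1.0 - beta1) * sampled
    v = beta2 * v + (1.0 - beta2) * sampled ** 2

    # Bias corrections for the zero-initialized moments (t starts at 1).
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


def minimize_quadratic(steps=500, sampling_rate=0.8, seed=0):
    """Toy run: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself."""
    rng = np.random.default_rng(seed)
    x = np.ones(10)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        x, m, v = stoch_grad_adam_step(
            x, x, m, v, t, rng, lr=0.05, sampling_rate=sampling_rate
        )
    return 0.5 * float(x @ x)
```

Under this sketch, setting `sampling_rate=1.0` keeps every entry, so the step reduces to a standard Adam update; lower values make each step use fewer gradient entries, which is the trade-off the abstract describes.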