OC · LG · ML · Sep 22, 2021

On the equivalence of different adaptive batch size selection strategies for stochastic gradient descent methods

arXiv:2109.10933v2
AI Analysis

This work provides theoretical insights for researchers optimizing SGD methods, but it is incremental as it builds on existing strategies without introducing new algorithms.

The paper shows that two adaptive batch size selection strategies for Stochastic Gradient Descent, the norm test and the inner product/orthogonality test, yield equivalent convergence rates when $ε^2=θ^2+ν^2$. It also shows that, under this condition, the inner product/orthogonality test costs the same as the norm test in the best case, when $θ$ and $ν$ are optimally selected, and is never cheaper.

In this study, we demonstrate that the norm test and inner product/orthogonality test presented in \cite{Bol18} are equivalent in terms of the convergence rates associated with Stochastic Gradient Descent (SGD) methods if $ε^2=θ^2+ν^2$ with specific choices of $θ$ and $ν$. Here, $ε$ controls the relative statistical error of the norm of the gradient, while $θ$ and $ν$ control the relative statistical error of the gradient in the direction of the gradient and in the direction orthogonal to the gradient, respectively. Furthermore, we demonstrate that the inner product/orthogonality test can be as inexpensive as the norm test in the best-case scenario if $θ$ and $ν$ are optimally selected, but the inner product/orthogonality test will never be more computationally affordable than the norm test if $ε^2=θ^2+ν^2$. Finally, we present two stochastic optimization problems to illustrate our results.
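
The cost comparison can be illustrated numerically. The following is a minimal sketch, not the paper's implementation. It assumes the commonly used sample-variance forms of the tests: the norm test bounds the total gradient variance by $ε^2$ times the squared gradient norm, the inner product test bounds the variance along the gradient by $θ^2$ times it, and the orthogonality test bounds the variance orthogonal to the gradient by $ν^2$ times it. The function name `required_batch_sizes`, the synthetic Gaussian per-sample gradients, and the use of the sample mean in place of the true gradient are all assumptions made for illustration.

```python
import numpy as np


def required_batch_sizes(grads, eps, theta, nu):
    """Smallest (real-valued) batch sizes that satisfy each test.

    grads: (n, d) array of per-sample gradients (hypothetical data).
    The sample mean stands in for the true gradient (an assumption).
    """
    grads = np.asarray(grads, dtype=float)
    n = grads.shape[0]
    g = grads.mean(axis=0)
    g_sq = float(g @ g)
    unit = g / np.sqrt(g_sq)

    # Variance of the gradient component along the mean gradient.
    along = grads @ unit
    v_par = float(along.var(ddof=1))

    # Variance of the component orthogonal to the mean gradient
    # (its sample mean is zero by construction).
    orth = grads - np.outer(along, unit)
    v_orth = float((orth ** 2).sum() / (n - 1))

    # Norm test: (v_par + v_orth) / b <= eps^2 * ||g||^2
    b_norm = (v_par + v_orth) / (eps ** 2 * g_sq)
    # Inner product test: v_par / b <= theta^2 * ||g||^2
    b_ip = v_par / (theta ** 2 * g_sq)
    # Orthogonality test: v_orth / b <= nu^2 * ||g||^2
    b_orth = v_orth / (nu ** 2 * g_sq)
    # Both of the latter tests must hold, so take the larger batch size.
    return b_norm, max(b_ip, b_orth), v_par, v_orth


rng = np.random.default_rng(0)
grads = rng.normal(loc=1.0, scale=2.0, size=(500, 10))  # synthetic gradients

eps = 0.5
theta = nu = eps / np.sqrt(2.0)  # an arbitrary split with eps^2 = theta^2 + nu^2
b_norm, b_ipo, v_par, v_orth = required_batch_sizes(grads, eps, theta, nu)
print("norm test:", b_norm, "inner product/orthogonality test:", b_ipo)

# Balanced split: theta^2 and nu^2 proportional to the two variances.
v_tot = v_par + v_orth
theta_opt = eps * np.sqrt(v_par / v_tot)
nu_opt = eps * np.sqrt(v_orth / v_tot)
b_norm_opt, b_ipo_opt, _, _ = required_batch_sizes(grads, eps, theta_opt, nu_opt)
print("balanced split:", b_norm_opt, b_ipo_opt)
```

In this sketch, the inequality $\max(a/θ^2, b/ν^2) \ge (a+b)/(θ^2+ν^2)$ guarantees that the combined inner product/orthogonality batch size is never smaller than the norm test batch size when $ε^2=θ^2+ν^2$. The two coincide only when $θ^2$ and $ν^2$ are split in proportion to the parallel and orthogonal variances.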

