Never Go Full Batch (in Stochastic Convex Optimization)
This provides a theoretical separation result for optimization algorithms, highlighting limitations of full-batch methods in machine learning applications.
The paper tackles the generalization performance of full-batch optimization algorithms in stochastic convex optimization, showing that they require at least Ω(1/ε^4) iterations or have dimension-dependent sample complexity to achieve ε accuracy, whereas stochastic gradient descent can do so in O(1/ε^2) iterations.
We study the generalization performance of $\text{full-batch}$ optimization algorithms for stochastic convex optimization: these are first-order methods that only access the exact gradient of the empirical risk (rather than gradients with respect to individual data points), that include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants. We provide a new separation result showing that, while algorithms such as stochastic gradient descent can generalize and optimize the population risk to within $ε$ after $O(1/ε^2)$ iterations, full-batch methods either need at least $Ω(1/ε^4)$ iterations or exhibit a dimension-dependent sample complexity.