Stochastic Difference-of-Convex Optimization with Momentum
This addresses a practical limitation in machine learning applications where small batch sizes are common, though it appears incremental as it builds on existing momentum techniques.
The paper tackles the problem of stochastic difference-of-convex optimization under small batch sizes, where existing methods require large batches or strong noise assumptions. It shows that momentum enables convergence under standard assumptions for any batch size, with provable convergence and strong empirical performance.
Stochastic difference-of-convex (DC) optimization is prevalent in numerous machine learning applications, yet its convergence properties under small batch sizes remain poorly understood. Existing methods typically require large batches or strong noise assumptions, which limit their practical use. In this work, we show that momentum enables convergence under standard smoothness and bounded variance assumptions (of the concave part) for any batch size. We prove that without momentum, convergence may fail regardless of stepsize, highlighting its necessity. Our momentum-based algorithm achieves provable convergence and demonstrates strong empirical performance.