On The Convergence of First Order Methods for Quasar-Convex Optimization
This work addresses the optimization challenge for machine learning practitioners by identifying a structured property that enables efficient algorithms, though it is incremental as it builds on known concepts of structured non-convexity.
The paper tackles the gap between theoretical worst-case complexities and practical performance in non-convex optimization by studying quasar-convex functions, proving that first-order methods achieve convergence rates similar to convex functions and better than state-of-the-art non-convex rates.
In recent years, the success of deep learning has inspired many researchers to study the optimization of general smooth non-convex functions. However, recent works have established pessimistic worst-case complexities for this class functions, which is in stark contrast with their superior performance in real-world applications (e.g. training deep neural networks). On the other hand, it is found that many popular non-convex optimization problems enjoy certain structured properties which bear some similarities to convexity. In this paper, we study the class of \textit{quasar-convex functions} to close the gap between theory and practice. We study the convergence of first order methods in a variety of different settings and under different optimality criterions. We prove complexity upper bounds that are similar to standard results established for convex functions and much better that state-of-the-art convergence rates of non-convex functions. Overall, this paper suggests that \textit{quasar-convexity} allows efficient optimization procedures, and we are looking forward to seeing more problems that demonstrate similar properties in practice.