Stochastic Primal-Dual Algorithms with Faster Convergence than $O(1/\sqrt{T})$ for Problems without Bilinear Structure
This work addresses a bottleneck in optimization for machine learning by enabling faster convergence in a broader class of min-max problems, though it is incremental as it builds on existing stochastic methods.
The paper tackles the limitation of stochastic primal-dual algorithms that require bilinear structure for faster convergence, proposing new algorithms that achieve a convergence rate of O(1/T) for convex-concave problems without this assumption, such as in robust model learning and AUC maximization.
Previous studies on stochastic primal-dual algorithms for solving min-max problems with faster convergence heavily rely on the bilinear structure of the problem, which restricts their applicability to a narrowed range of problems. The main contribution of this paper is the design and analysis of new stochastic primal-dual algorithms that use a mixture of stochastic gradient updates and a logarithmic number of deterministic dual updates for solving a family of convex-concave problems with no bilinear structure assumed. Faster convergence rates than $O(1/\sqrt{T})$ with $T$ being the number of stochastic gradient updates are established under some mild conditions of involved functions on the primal and the dual variable. For example, for a family of problems that enjoy a weak strong convexity in terms of the primal variable and has a strongly concave function of the dual variable, the convergence rate of the proposed algorithm is $O(1/T)$. We also investigate the effectiveness of the proposed algorithms for learning robust models and empirical AUC maximization.