OC, LG, ML · Jan 19, 2022

Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax Optimization

arXiv:2201.07427v1 · 33 citations
Originality: Highly original
AI Analysis

This closes a fundamental gap in optimization theory, giving researchers and practitioners in machine learning an optimal algorithm for a common problem structure.

The paper tackles the bilinearly coupled minimax optimization problem by developing the Lifted Primal-Dual (LPD) method, which achieves the first optimal iteration complexity matching the lower bound, with numerical experiments showing fast convergence.

We study the bilinearly coupled minimax problem $\min_{x} \max_{y} f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions and admit first-order gradient oracles. Surprisingly, no known first-order algorithm has hitherto achieved the lower complexity bound of $\Omega\big((\sqrt{\frac{L_x}{\mu_x}} + \frac{\|A\|}{\sqrt{\mu_x \mu_y}} + \sqrt{\frac{L_y}{\mu_y}}) \log(\frac{1}{\varepsilon})\big)$ for solving this problem up to an $\varepsilon$ primal-dual gap in the general parameter regime, where $L_x, L_y, \mu_x, \mu_y$ are the corresponding smoothness and strong convexity constants. We close this gap by devising the first optimal algorithm, the Lifted Primal-Dual (LPD) method. Our method lifts the objective into an extended form that allows both the smooth terms and the bilinear term to be handled optimally and seamlessly within the same primal-dual framework. Besides optimality, our method yields a desirably simple single-loop algorithm that uses only one gradient oracle call per iteration. Moreover, when $f$ is just convex, the same algorithm applied to a smoothed objective achieves the nearly optimal iteration complexity. We also provide a direct single-loop algorithm, using the LPD method, that achieves the iteration complexity of $O\big(\sqrt{\frac{L_x}{\varepsilon}} + \frac{\|A\|}{\sqrt{\mu_y \varepsilon}} + \sqrt{\frac{L_y}{\varepsilon}}\big)$. Numerical experiments on quadratic minimax problems and policy evaluation problems further demonstrate the fast convergence of our algorithm in practice.
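To make the objects in the abstract concrete, here is a minimal sketch assuming NumPy and a hypothetical random quadratic instance $f(x) = \frac12 x^\top Q_x x - b_x^\top x$, $h(y) = \frac12 y^\top Q_y y - b_y^\top y$. It computes the lower-bound factor, the exact saddle point, and the primal-dual gap, using the standard conjugate identity $\max_y \mathcal{L}(\hat x, y) - \min_x \mathcal{L}(x, \hat y) = f(\hat x) + h^*(A\hat x) + f^*(-A^\top \hat y) + h(\hat y)$. It is not the LPD method: the solver is plain simultaneous gradient descent-ascent (GDA), swapped in as a simple baseline. All function names (`make_instance`, `saddle_point`, `duality_gap`, `gda`) are illustrative and do not come from the paper.

```python
import numpy as np


def lower_bound_factor(L_x, L_y, mu_x, mu_y, norm_A):
    # Factor multiplying log(1/eps) in the Omega(...) lower bound.
    return np.sqrt(L_x / mu_x) + norm_A / np.sqrt(mu_x * mu_y) + np.sqrt(L_y / mu_y)


def make_instance(n, m, mu_x, L_x, mu_y, L_y, seed=0):
    # Hypothetical quadratic instance (illustration only):
    #   f(x) = 0.5 x^T Qx x - bx^T x,   h(y) = 0.5 y^T Qy y - by^T y,
    # with eigenvalues of Qx spread over [mu_x, L_x] and of Qy over [mu_y, L_y].
    rng = np.random.default_rng(seed)

    def spd(d, mu, L):
        # Random orthogonal basis times a diagonal with eigenvalues in [mu, L].
        U, _ = np.linalg.qr(rng.standard_normal((d, d)))
        return U @ np.diag(np.linspace(mu, L, d)) @ U.T

    Qx, Qy = spd(n, mu_x, L_x), spd(m, mu_y, L_y)
    A = rng.standard_normal((m, n)) / np.sqrt(n)
    bx, by = rng.standard_normal(n), rng.standard_normal(m)
    return Qx, Qy, A, bx, by


def saddle_point(Qx, Qy, A, bx, by):
    # Stationarity: Qx x - bx + A^T y = 0  and  A x - (Qy y - by) = 0.
    n = Qx.shape[0]
    K = np.block([[Qx, A.T], [A, -Qy]])
    z = np.linalg.solve(K, np.concatenate([bx, -by]))
    return z[:n], z[n:]


def duality_gap(Qx, Qy, A, bx, by, x, y):
    # f(x) + h*(A x) + f*(-A^T y) + h(y), with closed-form quadratic conjugates
    #   h*(u) = 0.5 (u + by)^T Qy^{-1} (u + by),  f*(v) = 0.5 (v + bx)^T Qx^{-1} (v + bx).
    f = 0.5 * x @ Qx @ x - bx @ x
    h = 0.5 * y @ Qy @ y - by @ y
    u = A @ x + by
    v = -A.T @ y + bx
    h_conj = 0.5 * u @ np.linalg.solve(Qy, u)
    f_conj = 0.5 * v @ np.linalg.solve(Qx, v)
    return f + h_conj + f_conj + h


def gda(Qx, Qy, A, bx, by, step, iters):
    # Baseline simultaneous gradient descent-ascent (NOT the LPD method).
    x, y = np.zeros(Qx.shape[0]), np.zeros(Qy.shape[0])
    for _ in range(iters):
        gx = Qx @ x - bx + A.T @ y  # gradient in x of f(x) + y^T A x - h(y)
        gy = A @ x - (Qy @ y - by)  # gradient in y of the same objective
        x, y = x - step * gx, y + step * gy
    return x, y


if __name__ == "__main__":
    mu_x = mu_y = 1.0
    L_x = L_y = 10.0
    Qx, Qy, A, bx, by = make_instance(5, 4, mu_x, L_x, mu_y, L_y)
    norm_A = np.linalg.norm(A, 2)  # spectral norm ||A||
    # max(L_x, L_y) + ||A|| upper-bounds the Lipschitz constant of the gradient field.
    step = min(mu_x, mu_y) / (max(L_x, L_y) + norm_A) ** 2
    x, y = gda(Qx, Qy, A, bx, by, step, iters=5000)
    xs, ys = saddle_point(Qx, Qy, A, bx, by)
    print("lower-bound factor:", lower_bound_factor(L_x, L_y, mu_x, mu_y, norm_A))
    print("primal-dual gap after GDA:", duality_gap(Qx, Qy, A, bx, by, x, y))
    print("distance to saddle point:", np.linalg.norm(np.concatenate([x - xs, y - ys])))
```

With this step size, the GDA iteration count scales roughly with $(\max(L_x, L_y) + \|A\|)^2 / \min(\mu_x, \mu_y)^2$, well above the lower-bound factor. That difference is what an optimal method such as LPD is designed to remove; the sketch only illustrates the problem and the gap measure.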

