LGGTSYMLSep 30, 2020

Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation

arXiv:2009.14820v116 citations
Originality Incremental advance
AI Analysis

This work addresses convergence issues in adversarial training for machine learning, providing theoretical guarantees that bridge a gap between existing edge cases, though it is incremental as it builds on prior analyses.

The paper tackles the problem of convergence in two-player non-convex, non-concave zero-sum games using gradient descent-ascent, showing that a finite timescale separation parameter ensures convergence to strict local minmax equilibria, with explicit construction for computing this parameter and empirical validation on datasets like CIFAR-10 and CelebA.

We study the role that a finite timescale separation parameter $τ$ has on gradient descent-ascent in two-player non-convex, non-concave zero-sum games where the learning rate of player 1 is denoted by $γ_1$ and the learning rate of player 2 is defined to be $γ_2=τγ_1$. Existing work analyzing the role of timescale separation in gradient descent-ascent has primarily focused on the edge cases of players sharing a learning rate ($τ=1$) and the maximizing player approximately converging between each update of the minimizing player ($τ\rightarrow \infty$). For the parameter choice of $τ=1$, it is known that the learning dynamics are not guaranteed to converge to a game-theoretically meaningful equilibria in general. In contrast, Jin et al. (2020) showed that the stable critical points of gradient descent-ascent coincide with the set of strict local minmax equilibria as $τ\rightarrow\infty$. In this work, we bridge the gap between past work by showing there exists a finite timescale separation parameter $τ^{\ast}$ such that $x^{\ast}$ is a stable critical point of gradient descent-ascent for all $τ\in (τ^{\ast}, \infty)$ if and only if it is a strict local minmax equilibrium. Moreover, we provide an explicit construction for computing $τ^{\ast}$ along with corresponding convergence rates and results under deterministic and stochastic gradient feedback. The convergence results we present are complemented by a non-convergence result: given a critical point $x^{\ast}$ that is not a strict local minmax equilibrium, then there exists a finite timescale separation $τ_0$ such that $x^{\ast}$ is unstable for all $τ\in (τ_0, \infty)$. Finally, we empirically demonstrate on the CIFAR-10 and CelebA datasets the significant impact timescale separation has on training performance.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes