LGOCMar 30, 2022

A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and Nonsmooth Bi-level Optimization

arXiv:2203.16615v212 citations
AI Analysis

This work addresses a bottleneck in machine learning for applications like hyper-parameter optimization by providing a faster and convergent algorithm for nonconvex and nonsmooth bi-level problems, though it is incremental as it builds on existing methods like AID.

The paper tackles the problem of regularized nonconvex and nonsmooth bi-level optimization in machine learning, which existing gradient-based algorithms cannot handle efficiently, by proposing a proximal gradient-type algorithm with approximate implicit differentiation and Nesterov's momentum, achieving an improved computation complexity of O(κ^{3.5}ε^{-2}) and demonstrating effectiveness in hyper-parameter optimization experiments.

Many important machine learning applications involve regularized nonconvex bi-level optimization. However, the existing gradient-based bi-level optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and they suffer from a high computation complexity in nonconvex bi-level optimization. In this work, we study a proximal gradient-type algorithm that adopts the approximate implicit differentiation (AID) scheme for nonconvex bi-level optimization with possibly nonconvex and nonsmooth regularizers. In particular, the algorithm applies the Nesterov's momentum to accelerate the computation of the implicit gradient involved in AID. We provide a comprehensive analysis of the global convergence properties of this algorithm through identifying its intrinsic potential function. In particular, we formally establish the convergence of the model parameters to a critical point of the bi-level problem, and obtain an improved computation complexity $\mathcal{O}(κ^{3.5}ε^{-2})$ over the state-of-the-art result. Moreover, we analyze the asymptotic convergence rates of this algorithm under a class of local nonconvex geometries characterized by a Łojasiewicz-type gradient inequality. Experiment on hyper-parameter optimization demonstrates the effectiveness of our algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes