Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noises
This work addresses a limitation of decentralized optimization for real-world ML models, where assumptions such as strong convexity and finite variance often fail. The contribution is incremental, however, in that it extends existing bilevel optimization to heavy-tailed noise settings.
The paper tackles decentralized stochastic bilevel optimization under heavy-tailed noises, a setting in which existing methods assume a strongly convex lower-level loss and finite-variance gradient noise. It develops a normalized stochastic variance-reduced bilevel gradient descent algorithm, establishes its convergence rate with rigorous theoretical guarantees, and confirms its effectiveness through experiments.
Existing decentralized stochastic optimization methods assume the lower-level loss function is strongly convex and the stochastic gradient noise has finite variance. These strong assumptions typically are not satisfied in real-world machine learning models. To address these limitations, we develop a novel decentralized stochastic bilevel optimization algorithm for the nonconvex bilevel optimization problem under heavy-tailed noises. Specifically, we develop a normalized stochastic variance-reduced bilevel gradient descent algorithm, which does not rely on any clipping operation. Moreover, we establish its convergence rate by innovatively bounding interdependent gradient sequences under heavy-tailed noises for nonconvex decentralized bilevel optimization problems. As far as we know, this is the first decentralized bilevel optimization algorithm with rigorous theoretical guarantees under heavy-tailed noises. The extensive experimental results confirm the effectiveness of our algorithm in handling heavy-tailed noises.
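To make the idea of a normalized, variance-reduced update without clipping more concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: it is single-level and single-node (no bilevel structure, no communication between workers), and it uses a STORM-style recursive estimator as a stand-in, since the abstract does not specify the variance-reduction scheme. The names normalized_vr_sgd and noisy_quadratic_grad, the step size, the momentum parameter beta, and the Student-t noise with 1.5 degrees of freedom (finite mean, infinite variance) are all assumptions chosen for illustration.

```python
import numpy as np


def normalized_vr_sgd(grad_fn, x0, steps=2000, lr=0.01, beta=0.1, seed=0):
    """Toy sketch: normalized SGD with a STORM-style gradient estimator.

    Hypothetical helper for illustration only; not the paper's method.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    # Start the estimator from a single noisy gradient sample.
    d = grad_fn(x, rng.standard_t(df=1.5, size=x.shape))
    for _ in range(steps):
        # Normalized step: the update has length lr no matter how large
        # the noise in d is, so no clipping threshold is needed.
        x_new = x - lr * d / (np.linalg.norm(d) + 1e-12)
        # Evaluate the same noise sample at both iterates and apply the
        # recursive (STORM-style) correction to the previous estimator.
        xi = rng.standard_t(df=1.5, size=x.shape)
        d = grad_fn(x_new, xi) + (1.0 - beta) * (d - grad_fn(x, xi))
        x = x_new
    return x


def noisy_quadratic_grad(x, xi):
    # Gradient of 0.5 * ||x||^2 plus additive heavy-tailed noise (assumed model).
    return x + xi


start = np.array([3.0, 4.0])
final = normalized_vr_sgd(noisy_quadratic_grad, start)
print("initial distance to minimizer:", np.linalg.norm(start))
print("final distance to minimizer:", np.linalg.norm(final))
```

Because each step is divided by the norm of the estimator, a single extreme noise sample can change the direction of a step but not its length, which is the intuition behind avoiding a clipping operation. The decentralized bilevel setting of the paper additionally involves lower-level variables and communication across workers, which this sketch leaves out.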