ML LG OCDec 29, 2024

Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces

arXiv:2412.20556v112.06 citationsh-index: 2

Originality Highly original

AI Analysis

This work addresses a theoretical gap in distributionally robust optimization for machine learning applications, providing convergence guarantees for algorithms handling continuous probability spaces.

The paper tackles the computational challenges of distributionally robust optimization with continuous worst-case distributions by presenting an iterative algorithm that achieves global convergence under mild assumptions, leveraging tools from vector space minimax optimization and convex analysis.

We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous, leading to significant computational challenges due to the infinite-dimensional nature of the optimization problem. Recent research has explored learning the worst-case distribution using neural network-based generative models to address these computational challenges but lacks algorithmic convergence guarantees. This paper bridges this theoretical gap by presenting an iterative algorithm to solve such a minimax problem, achieving global convergence under mild assumptions and leveraging technical tools from vector space minimax optimization and convex analysis in the space of continuous probability densities. In particular, leveraging Brenier's theorem, we represent the worst-case distribution as a transport map applied to a continuous reference measure and reformulate the regularized discrepancy-based DRO as a minimax problem in the Wasserstein space. Furthermore, we demonstrate that the worst-case distribution can be efficiently computed using a modified Jordan-Kinderlehrer-Otto (JKO) scheme with sufficiently large regularization parameters for commonly used discrepancy functions, linked to the radius of the ambiguity set. Additionally, we derive the global convergence rate and quantify the total number of subgradient and inexact modified JKO iterations required to obtain approximate stationary points. These results are potentially applicable to nonconvex and nonsmooth scenarios, with broad relevance to modern machine learning applications.

View on arXiv PDF

Similar