LG ME ML · Oct 30, 2023

Flow-based Distributionally Robust Optimization

Chen Xu, Jonghyeok Lee, Xiuyuan Cheng, Yao Xie

Georgia Tech

arXiv:2310.19253v4 · 12.312 citations · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses scalability and generalization issues in distributionally robust optimization (DRO) and is aimed at machine learning researchers and practitioners. It is an incremental advance, since it builds on existing flow-based and DRO methods.

The authors address the computational challenge of solving DRO problems with Wasserstein uncertainty sets. Their framework, FlowDRO, combines flow-based models with a Wasserstein proximal gradient flow algorithm to find continuous worst-case distributions efficiently. It shows strong empirical performance on high-dimensional real data in applications such as adversarial learning and differential privacy.

We present a computationally efficient framework, called $\texttt{FlowDRO}$, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets, while aiming to find a continuous worst-case distribution (also called the Least Favorable Distribution, LFD) and sample from it. We require the LFD to be continuous so that the algorithm scales to problems with larger sample sizes and the induced robust algorithms generalize better. To tackle the computationally challenging infinite-dimensional optimization problem, we leverage flow-based models and continuous-time invertible transport maps between the data distribution and the target distribution, and develop a Wasserstein proximal gradient flow type algorithm. In theory, we establish the equivalence of the solution by optimal transport map to the original formulation, as well as the dual form of the problem, through Wasserstein calculus and Brenier's theorem. In practice, we parameterize the transport maps by a sequence of neural networks progressively trained in blocks by gradient descent. We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy, where the proposed method gives strong empirical performance on high-dimensional real data.
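The idea of moving data samples toward a worst-case distribution under a transport penalty can be illustrated with a toy sketch. This is not the authors' implementation: it replaces the neural-network flow with direct per-sample gradient ascent, and it assumes a penalized objective of the form r(T(x)) - (T(x) - x)^2 / (2 * gamma), which the abstract does not spell out. The toy risk r(x) = 0.5 * x^2, the function names, and the parameters `gamma`, `step`, and `n_steps` are all hypothetical choices made for illustration.

```python
import random


def risk_grad(t):
    # Gradient of the toy risk r(t) = 0.5 * t**2 (an assumed stand-in for a real loss).
    return t


def perturb_samples(xs, gamma=0.5, step=0.1, n_steps=500):
    """Per-sample gradient ascent on r(t) - (t - x)**2 / (2 * gamma).

    Each sample x is pushed toward higher risk, while the quadratic
    penalty keeps the moved point t close to its starting point x.
    Requires gamma < 1 here so the toy objective stays concave.
    """
    ts = list(xs)
    for _ in range(n_steps):
        ts = [t + step * (risk_grad(t) - (t - x) / gamma) for t, x in zip(ts, xs)]
    return ts


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(5)]
worst = perturb_samples(data)
```

For this toy risk the maximizer has the closed form t = x / (1 - gamma), so with `gamma=0.5` each sample should end up at twice its original value. A smaller `gamma` penalizes movement more strongly and keeps the perturbed samples closer to the data.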
