Nested Stochastic Algorithm for Generalized Sinkhorn distance-Regularized Distributionally Robust Optimization
This work addresses robust model training against data distribution shifts for machine learning applications, presenting an incremental improvement in algorithm design for large-scale DRO problems.
The paper tackles regularized nonconvex distributionally robust optimization (DRO) problems with generalized Sinkhorn distance uncertainty sets, deriving a dual formulation as nested stochastic optimization and proposing a nested stochastic gradient descent algorithm with polynomial iteration and sample complexities independent of data size and parameter dimension.
Distributionally robust optimization (DRO) is a powerful technique to train robust models against data distribution shift. This paper aims to solve regularized nonconvex DRO problems, where the uncertainty set is modeled by a so-called generalized Sinkhorn distance and the loss function is nonconvex and possibly unbounded. Such a distance allows to model uncertainty of distributions with different probability supports and divergence functions. For this class of regularized DRO problems, we derive a novel dual formulation taking the form of nested stochastic optimization, where the dual variable depends on the data sample. To solve the dual problem, we provide theoretical evidence to design a nested stochastic gradient descent (SGD) algorithm, which leverages stochastic approximation to estimate the nested stochastic gradients. We study the convergence rate of nested SGD and establish polynomial iteration and sample complexities that are independent of the data size and parameter dimension, indicating its potential for solving large-scale DRO problems. We conduct numerical experiments to demonstrate the efficiency and robustness of the proposed algorithm.