Learning Distributionally Robust Models at Scale via Composite Optimization
This work addresses the scalability challenges of DRO for practitioners working with large datasets, although it is incremental in that it builds on existing DRO frameworks.
The paper tackles the problem of training machine learning models that are robust to distribution shifts by addressing the scalability of distributionally robust optimization (DRO). It shows that variants of DRO can be formulated as finite-sum composite optimization problems and provides scalable methods that enable learning from very large datasets.
To train machine learning models that are robust to distribution shifts in the data, distributionally robust optimization (DRO) has proven very effective. However, existing approaches to learning a distributionally robust model either require solving complex optimization problems such as semidefinite programs or rely on a first-order method whose convergence scales linearly with the number of data samples, which hinders their scalability to large datasets. In this paper, we show that different variants of DRO are simply instances of finite-sum composite optimization, for which we provide scalable methods. We also provide empirical results demonstrating the effectiveness of our proposed algorithm relative to prior art for learning robust models from very large datasets.
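To make the finite-sum structure concrete, here is a minimal sketch built on CVaR-DRO. CVaR-DRO is one standard DRO variant, and it was chosen here purely as an assumption. The abstract does not say which variants the paper covers or how it writes its composite objective.

For empirical losses l_i, the worst-case average loss over reweightings whose weights are at most 1/(alpha*n) equals the minimum over eta of (1/n) * sum_i [eta + max(l_i - eta, 0) / alpha]. This is the Rockafellar-Uryasev formula, and it is a finite sum in which each term depends on a single sample.

The helpers `cvar`, `minibatch_subgradient`, and `train_cvar_sgd` are hypothetical. The plain minibatch subgradient loop is a generic baseline, not the paper's algorithm.

```python
import numpy as np

# Assumption for illustration: CVaR-DRO. The worst case of sum_i q_i * loss_i over
# weights q in the simplex with q_i <= 1/(alpha*n) equals
#   min_eta (1/n) * sum_i [ eta + max(loss_i - eta, 0) / alpha ],
# a finite sum whose i-th term touches only sample i.


def cvar_objective(eta, losses, alpha):
    """Finite-sum (Rockafellar-Uryasev) CVaR objective at a fixed threshold eta."""
    return float(np.mean(eta + np.maximum(losses - eta, 0.0) / alpha))


def cvar(losses, alpha):
    """Exact empirical CVaR_alpha of a loss vector.

    The objective is convex and piecewise linear in eta with kinks at the
    observed losses, so its minimum is attained at one of them.
    """
    return min(cvar_objective(eta, losses, alpha) for eta in losses)


def squared_losses(theta, X, y):
    """Per-sample squared loss of a linear model."""
    return 0.5 * (X @ theta - y) ** 2


def minibatch_subgradient(theta, eta, X, y, idx, alpha):
    """Hypothetical helper: stochastic subgradient w.r.t. (theta, eta) of the
    finite-sum CVaR objective, using only the rows listed in idx."""
    Xb, yb = X[idx], y[idx]
    resid = Xb @ theta - yb
    active = (0.5 * resid ** 2 > eta).astype(float)  # samples above the threshold
    g_theta = (active * resid) @ Xb / (alpha * len(idx))
    g_eta = 1.0 - active.mean() / alpha
    return g_theta, g_eta


def train_cvar_sgd(X, y, alpha=0.1, lr=0.01, steps=2000, batch=32, seed=0):
    """Plain minibatch stochastic subgradient descent on (theta, eta).

    Each step reads `batch` rows, so its cost is O(batch * d), independent of n.
    """
    rng = np.random.default_rng(seed)
    theta, eta = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        idx = rng.integers(0, X.shape[0], size=batch)
        g_theta, g_eta = minibatch_subgradient(theta, eta, X, y, idx, alpha)
        theta -= lr * g_theta
        eta -= lr * g_eta
    return theta, eta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((2000, 5))
    y = X @ np.ones(5) + 0.1 * rng.standard_normal(2000)
    before = cvar(squared_losses(np.zeros(5), X, y), alpha=0.1)
    theta, _ = train_cvar_sgd(X, y)
    after = cvar(squared_losses(theta, X, y), alpha=0.1)
    print(f"CVaR_0.1 of training loss: {before:.3f} -> {after:.3f}")
```

Running the script prints the training CVaR before and after optimization on synthetic data. The design point is that every update reads only a minibatch, so the per-step cost does not grow with the dataset size. The sketch does not reproduce the paper's convergence guarantees or compare against them.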