LGCVMLJun 17, 2020

An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives

arXiv:2006.10138v556 citations
Originality Incremental advance
AI Analysis

This work addresses efficiency and scalability issues in robust machine learning for applications like neural networks, though it is incremental as it builds on existing DRO methods.

The paper tackles the problem of slow and inefficient training for distributionally robust optimization (DRO) with non-convex objectives, particularly in online settings, by proposing a duality-free online stochastic method that speeds up training by more than 2 times and saves days on large-scale datasets with ~265K images.

In this paper, we propose a practical online method for solving a class of distributionally robust optimization (DRO) with non-convex objectives, which has important applications in machine learning for improving the robustness of neural networks. In the literature, most methods for solving DRO are based on stochastic primal-dual methods. However, primal-dual methods for DRO suffer from several drawbacks: (1) manipulating a high-dimensional dual variable corresponding to the size of data is time expensive; (2) they are not friendly to online learning where data is coming sequentially. To address these issues, we consider a class of DRO with an KL divergence regularization on the dual variables, transform the min-max problem into a compositional minimization problem, and propose practical duality-free online stochastic methods without requiring a large mini-batch size. We establish the state-of-the-art complexities of the proposed methods with and without a Polyak-Łojasiewicz (PL) condition of the objective. Empirical studies on large-scale deep learning tasks (i) demonstrate that our method can speed up the training by more than 2 times than baseline methods and save days of training time on a large-scale dataset with $\sim$ 265K images, and (ii) verify the supreme performance of DRO over Empirical Risk Minimization (ERM) on imbalanced datasets. Of independent interest, the proposed method can be also used for solving a family of stochastic compositional problems with state-of-the-art complexities.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes