MLLGOCJul 9, 2019

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

arXiv:1907.04371v575 citations
Originality Incremental advance
AI Analysis

This work addresses optimization efficiency for machine learning practitioners by offering an incremental improvement over existing stochastic gradient descent methods.

The authors tackled empirical risk minimization in machine learning by proposing a new stochastic optimization framework that biases gradient estimators toward observations with higher current losses, resulting in consistent improvements in test errors across SVM, logistic regression, and deep learning models compared to standard mini-batch SGD.

We propose a new stochastic optimization framework for empirical risk minimization problems such as those that arise in machine learning. The traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an unbiased gradient estimator of the empirical average loss. In contrast, we develop a computationally efficient method to construct a gradient estimator that is purposely biased toward those observations with higher current losses. On the theory side, we show that the proposed method minimizes a new ordered modification of the empirical average loss, and is guaranteed to converge at a sublinear rate to a global optimum for convex loss and to a critical point for weakly convex (non-convex) loss. Furthermore, we prove a new generalization bound for the proposed algorithm. On the empirical side, the numerical experiments show that our proposed method consistently improves the test errors compared with the standard mini-batch SGD in various models including SVM, logistic regression, and deep learning problems.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes