MLLGOCJun 14, 2018

Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors

arXiv:1806.05438v410 citations
Originality Incremental advance
AI Analysis

This work addresses a key bottleneck in binary classification by improving convergence rates for SGD, which is incremental as it extends prior results from squared loss to a broader class of loss functions.

The paper tackles the problem of slow convergence rates for expected classification errors in stochastic gradient descent (SGD) for binary classification, showing that exponential convergence can be achieved for a wide class of differentiable convex loss functions under low-noise conditions, with experiments verifying this on L2-regularized logistic regression.

We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In the traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk even when assuming a low-noise condition on the conditional label probabilities. Consequently, the resulting rate is sublinear. Therefore, it is important to consider whether much faster convergence of the expected classification error can be achieved. In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition but provided theoretical analysis was limited to the squared loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show an exponential convergence of the expected classification error in the final phase of the stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. As for the averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training. In experiments, we verify our analyses on the $L_2$-regularized logistic regression.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes