ML · LG · ST · Jun 18, 2020

Stochastic Gradient Descent in Hilbert Scales: Smoothness, Preconditioning and Earlier Stopping

arXiv:2006.10840v1 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses theoretical gaps in SGD for kernel-based learning, with implications for optimization in machine learning, though it appears incremental as it extends existing analysis.

The authors analyze Stochastic Gradient Descent (SGD) in Hilbert scales for least squares learning in RKHSs. They show that violating the traditional smoothness assumption significantly affects learning rates, and that preconditioning in a Hilbert scale reduces the number of iterations needed for misspecified models, enabling earlier stopping.

Stochastic Gradient Descent (SGD) has become the method of choice for solving a broad range of machine learning problems. However, some of its learning properties are still not fully understood. We consider least squares learning in reproducing kernel Hilbert spaces (RKHSs) and extend the classical SGD analysis to a learning setting in Hilbert scales, including Sobolev spaces and diffusion spaces on compact Riemannian manifolds. We show that even for well-specified models, violation of a traditional benchmark smoothness assumption has a tremendous effect on the learning rate. In addition, we show that for misspecified models, preconditioning in an appropriate Hilbert scale helps to reduce the number of iterations, i.e., it allows for "earlier stopping".
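
To make the setting concrete, here is a minimal sketch of plain (unpreconditioned) SGD for least squares in an RKHS, where the number of iterations acts as the stopping time. It is an illustration under stated assumptions, not the paper's algorithm: the Gaussian kernel, the constant step size, uniform sampling with replacement, and the names `gaussian_kernel` and `kernel_sgd` are all choices made here, and the Hilbert-scale preconditioning discussed in the abstract is not implemented.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.1):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def kernel_sgd(X, y, step=0.5, n_iters=None, bandwidth=0.1, seed=0):
    """SGD for least squares in an RKHS (illustrative sketch).

    The iterate is represented as f_t = sum_i alpha[i] * K(X[i], .).
    Each step samples one index i and applies
        f_{t+1} = f_t - step * (f_t(x_i) - y_i) * K(x_i, .),
    which changes only the coefficient alpha[i].
    n_iters plays the role of the stopping time.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_iters = n if n_iters is None else n_iters
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.zeros(n)
    for _ in range(n_iters):
        i = rng.integers(n)             # sample one training point
        residual = K[i] @ alpha - y[i]  # f_t(x_i) - y_i
        alpha[i] -= step * residual     # stochastic gradient step
    return alpha


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(0.0, 1.0, size=(100, 1))
    y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(100)
    alpha = kernel_sgd(X, y, n_iters=2000)
    pred = gaussian_kernel(X, X) @ alpha
    print("training MSE:", np.mean((pred - y) ** 2))
```

Stopping after fewer iterations regularizes the estimate more strongly. The abstract's claim is that, for misspecified models, a suitable preconditioner lets a comparable estimate be reached with a smaller `n_iters`.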

