LG, NE, OC, ML · Feb 9, 2016

Poor starting points in machine learning

arXiv:1602.02823v11 citations
Originality: Incremental advance
AI Analysis

This work targets optimization inefficiencies faced by practitioners who use stochastic methods. It is incremental because it builds on known acceleration techniques.

The paper studies poor starting points in machine-learning optimization. It shows that Nesterov acceleration can speed up convergence during the initial iterations relative to stochastic gradient descent, especially when training uses minibatches.

Poor (even random) starting points for learning/training/optimization are common in machine learning. In many settings, the method of Robbins and Monro (online stochastic gradient descent) is known to be optimal for good starting points, but may not be optimal for poor starting points -- indeed, for poor starting points Nesterov acceleration can help during the initial iterations, even though Nesterov methods not designed for stochastic approximation could hurt during later iterations. The common practice of training with nontrivial minibatches enhances the advantage of Nesterov acceleration.
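To make the comparison in the abstract concrete, here is a minimal sketch that runs constant-step minibatch SGD and Nesterov momentum from the same far-away starting point on a toy problem. Everything in it is an assumption for illustration and is not the paper's experiment: the synthetic two-feature least-squares problem with Hessian diag(1.0, 0.01), the starting point `w0 = (10, 10)`, the step size `lr = 0.5`, the momentum `0.9`, the batch size `32`, and the hypothetical helper names (`make_problem`, `run`, and so on). The Nesterov update uses the common "look-ahead gradient" form, and the SGD baseline uses a constant step size rather than the decaying schedule of the original Robbins-Monro method.

```python
import random


def make_problem(seed=0):
    """Hypothetical synthetic least-squares problem (an assumption, not from the paper).

    y = x . w_star + noise; the two features are independent Gaussians with
    standard deviations 1.0 and 0.1, so the Hessian is diag(1.0, 0.01).
    """
    rng = random.Random(seed)
    scales = [1.0, 0.1]
    w_star = [1.0, -1.0]

    def sample_batch(batch_size):
        batch = []
        for _ in range(batch_size):
            x = [rng.gauss(0.0, s) for s in scales]
            y = sum(xi * wi for xi, wi in zip(x, w_star)) + rng.gauss(0.0, 0.1)
            batch.append((x, y))
        return batch

    return scales, w_star, sample_batch


def minibatch_grad(w, batch):
    """Gradient of 0.5 * mean((x . w - y)^2) over the minibatch."""
    g = [0.0] * len(w)
    for x, y in batch:
        r = sum(xi * wi for xi, wi in zip(x, w)) - y
        for j, xj in enumerate(x):
            g[j] += r * xj
    return [gj / len(batch) for gj in g]


def excess_risk(w, scales, w_star):
    """Population risk minus its minimum: 0.5 * sum_j s_j^2 * (w_j - w*_j)^2."""
    return 0.5 * sum(s * s * (wj - ws) ** 2 for s, wj, ws in zip(scales, w, w_star))


def run(method, steps=30, lr=0.5, momentum=0.9, batch_size=32, seed=0, w0=(10.0, 10.0)):
    """Run `steps` minibatch iterations from the poor starting point w0."""
    scales, w_star, sample_batch = make_problem(seed)  # same data stream for a given seed
    w = list(w0)
    v = [0.0] * len(w)
    for _ in range(steps):
        batch = sample_batch(batch_size)
        if method == "sgd":
            # Constant-step minibatch SGD: w <- w - lr * g(w)
            g = minibatch_grad(w, batch)
            w = [wj - lr * gj for wj, gj in zip(w, g)]
        elif method == "nesterov":
            # Nesterov momentum: gradient evaluated at the look-ahead point w + momentum * v
            look = [wj + momentum * vj for wj, vj in zip(w, v)]
            g = minibatch_grad(look, batch)
            v = [momentum * vj - lr * gj for vj, gj in zip(v, g)]
            w = [wj + vj for wj, vj in zip(w, v)]
        else:
            raise ValueError(f"unknown method: {method}")
    return excess_risk(w, scales, w_star)


if __name__ == "__main__":
    for method in ("sgd", "nesterov"):
        print(method, run(method))
```

In this toy setup the gap comes mostly from the slow direction (curvature 0.01). There, plain SGD shrinks the error by only about 0.5% per step, while momentum accumulates velocity and covers the distance from the poor start much faster. Increasing `steps` substantially lets one probe the later-iteration regime, where the abstract cautions that acceleration not designed for stochastic approximation can hurt. This sketch is not guaranteed to reproduce that effect.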
