ML | LG | Mar 20, 2019

Traversing the noise of dynamic mini-batch sub-sampled loss functions: A visual guide

arXiv:1903.08552v2 | 3 citations
Originality: Incremental advance
AI Analysis

This work addresses a fundamental challenge in automating neural network training for practitioners dealing with large datasets and limited computational resources, though it appears incremental as it builds on existing optimization concepts.

The paper tackles the noise and discontinuities that mini-batch sub-sampling introduces into neural network training, which hinder optimization methods such as line searches. It proposes stochastic non-negative associated gradient projection points (SNN-GPPs) as a more robust optimality criterion and demonstrates that SNN-GPPs approximate true optima more closely, especially with smooth activation functions.

Mini-batch sub-sampling in neural network training is unavoidable, due to growing data demands, memory-limited computational resources such as graphics processing units (GPUs), and the dynamics of on-line learning. In this study we specifically distinguish between static mini-batch sub-sampled loss functions, where mini-batches are intermittently fixed during training, resulting in smooth but biased loss functions; and the dynamic sub-sampling equivalent, where new mini-batches are sampled at every loss evaluation, trading bias for variance in sampling-induced discontinuities. These discontinuities render automated optimization strategies such as minimization line searches ineffective, since critical points may not exist and function minimizers find spurious, discontinuity-induced minima. This paper suggests recasting the optimization problem to find stochastic non-negative associated gradient projection points (SNN-GPPs). We demonstrate that the SNN-GPP optimality criterion is less susceptible to sub-sampling-induced discontinuities than critical points or minimizers. We conduct a visual investigation, comparing local minimum and SNN-GPP optimality criteria in the loss functions of a simple neural network training problem for a variety of popular activation functions. Since SNN-GPPs better approximate the location of true optima, particularly when using smooth activation functions with high curvature characteristics, we postulate that line searches locating SNN-GPPs can contribute significantly to automating neural network training.
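
The distinction between static and dynamic sub-sampling can be illustrated with a small sketch. The code below is not the paper's experiment: it assumes a hypothetical one-parameter least-squares problem, and it approximates an SNN-GPP along a one-dimensional search direction as a point where the sampled directional derivative changes sign from negative to positive. All names (`scan`, `count_local_minima`, `sign_changes`, the data `X` and `Y`) are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy problem (not the paper's network): fit y = w * x.
rng_data = np.random.default_rng(0)
X = rng_data.normal(size=200)
Y = 2.0 * X + 0.5 * rng_data.normal(size=200)  # true slope is 2.0


def loss_and_derivative(w, x, y):
    """Mean squared-error loss and its derivative w.r.t. the scalar w."""
    r = w * x - y
    return 0.5 * np.mean(r ** 2), np.mean(r * x)


def scan(ws, mode, batch_size=20, seed=1):
    """Evaluate the sub-sampled loss and derivative at each w in ws.

    mode="static":  one mini-batch is fixed for the whole scan.
    mode="dynamic": a new mini-batch is drawn at every evaluation.
    """
    rng = np.random.default_rng(seed)
    fixed = rng.choice(len(X), size=batch_size, replace=False)
    losses, derivs = [], []
    for w in ws:
        if mode == "static":
            idx = fixed
        else:
            idx = rng.choice(len(X), size=batch_size, replace=False)
        f, g = loss_and_derivative(w, X[idx], Y[idx])
        losses.append(f)
        derivs.append(g)
    return np.array(losses), np.array(derivs)


def count_local_minima(f):
    """Count grid points strictly lower than both neighbours."""
    return int(np.sum((f[1:-1] < f[:-2]) & (f[1:-1] < f[2:])))


def sign_changes(ws, g):
    """Grid locations where the derivative goes from negative to non-negative.

    Assumption: this stands in for SNN-GPP candidates along a 1-D direction.
    """
    mask = (g[:-1] < 0) & (g[1:] >= 0)
    return ws[1:][mask]


WS = np.linspace(0.0, 4.0, 201)
for mode in ("static", "dynamic"):
    f, g = scan(WS, mode)
    sc = sign_changes(WS, g)
    print(mode, "local minima:", count_local_minima(f),
          "sign changes:", len(sc), "at", np.round(sc, 2))
```

In this sketch the static scan yields a single smooth minimum, although its location depends on which mini-batch was fixed. The dynamic scan scatters spurious local minima across the whole interval, while the negative-to-positive sign changes stay confined to a band around the true slope, which is the behaviour the abstract attributes to the SNN-GPP criterion.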

