LG · Dec 4, 2018

Parameter Re-Initialization through Cyclical Batch Size Schedules

arXiv:1812.01216v1 · 8 citations
Originality: Incremental advance
AI Analysis

This paper addresses suboptimal training efficiency and performance for neural network practitioners, though the advance appears incremental because it builds on existing initialization and scheduling techniques.

The paper tackles poor neural network weight initialization by proposing weight re-initialization through cyclical batch size schedules. Reported improvements include a perplexity reduction of up to 7.91 in language modeling and up to 61% fewer training iterations.

Optimal parameter initialization remains a crucial problem for neural network training. A poor weight initialization may take longer to train and/or converge to sub-optimal solutions. Here, we propose a method of weight re-initialization by repeated annealing and injection of noise in the training process. We implement this through a cyclical batch size schedule motivated by a Bayesian perspective of neural network training. We evaluate our methods through extensive experiments on tasks in language modeling, natural language inference, and image classification. We demonstrate the ability of our method to improve language modeling performance by up to 7.91 perplexity and reduce training iterations by up to $61\%$, in addition to its flexibility in enabling snapshot ensembling and use with adversarial training.
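As a rough illustration of what a cyclical batch size schedule might look like, the sketch below ramps the batch size linearly from a minimum to a maximum within each cycle and then resets it. The abstract does not specify the shape of the schedule, so the sawtooth form, the function name `cyclical_batch_size`, and all default values are assumptions made for illustration, not the authors' implementation. The intuition assumed here is that smaller batches give noisier gradient estimates, so the ramp acts as annealing and each reset re-injects noise.

```python
def cyclical_batch_size(step, min_batch=32, max_batch=256, cycle_len=100):
    """Return the batch size for a training step under a sawtooth cycle.

    Hypothetical schedule: within each cycle the batch size grows linearly
    from min_batch to max_batch (less gradient noise), then drops back to
    min_batch at the start of the next cycle (more gradient noise).
    """
    if cycle_len < 2:
        raise ValueError("cycle_len must be at least 2")
    if min_batch > max_batch:
        raise ValueError("min_batch must not exceed max_batch")
    position = step % cycle_len            # index of the step inside its cycle
    fraction = position / (cycle_len - 1)  # 0.0 at cycle start, 1.0 at cycle end
    return round(min_batch + (max_batch - min_batch) * fraction)


if __name__ == "__main__":
    # Print the batch size at a few points across two cycles.
    for step in (0, 25, 50, 75, 99, 100, 150, 199):
        print(step, cyclical_batch_size(step))
```

In a training loop, the returned value would decide how many examples to draw for the current step. Other cycle shapes, such as a triangular ramp up and down, would fit the same interface.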

