LG · Jun 10, 2017

Critical Hyper-Parameters: No Random, No Cry

arXiv:1706.03200v1 · 45 citations
Originality: Incremental advance
AI Analysis

This paper addresses the inefficiency of grid and random search for hyperparameter optimization in deep learning and offers a simple drop-in replacement that can reduce computational costs. It is an incremental advance, since it builds on existing quasi-random methods.

The paper tackles hyperparameter selection in deep learning by proposing a quasi-random method based on Low Discrepancy Sequences. The method yields suitable hyperparameter values with far fewer runs than random search, as demonstrated on state-of-the-art LSTM language models and image classification models.
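A minimal sketch of how an LDS can stand in for random sampling in a one-shot search follows. It assumes the Halton sequence (radical inverses in coprime prime bases), a standard LDS chosen here purely for illustration; it is not necessarily the specific method the authors propose. The search space, its bounds, and the helper names (`halton`, `to_config`, `lds_configs`) are hypothetical.

```python
import math

# First ten primes: one coprime base per dimension of the Halton sequence.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def radical_inverse(index: int, base: int) -> float:
    """Mirror the base-`base` digits of `index` about the radix point.

    Example: 6 in base 2 is 110, so its radical inverse is 0.011 (binary) = 0.375.
    """
    result = 0.0
    scale = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result


def halton(n: int, dim: int) -> list[list[float]]:
    """First n points of the dim-dimensional Halton sequence in (0, 1)^dim.

    Index 0 maps to the origin and is skipped.
    """
    if dim > len(PRIMES):
        raise ValueError("extend PRIMES to support more dimensions")
    return [[radical_inverse(i, PRIMES[d]) for d in range(dim)]
            for i in range(1, n + 1)]


# Hypothetical search space: (name, low, high, sample_on_log_scale).
SEARCH_SPACE = [
    ("learning_rate", 1e-4, 1e-1, True),
    ("dropout", 0.0, 0.5, False),
    ("weight_decay", 1e-6, 1e-3, True),
]


def to_config(u: list[float], space) -> dict:
    """Map a point of the unit cube onto concrete hyperparameter values."""
    config = {}
    for x, (name, low, high, log_scale) in zip(u, space):
        if log_scale:
            lo, hi = math.log(low), math.log(high)
            config[name] = math.exp(lo + x * (hi - lo))
        else:
            config[name] = low + x * (high - low)
    return config


def lds_configs(n: int, space) -> list[dict]:
    """One-shot batch of n configurations, ready to be trained in parallel."""
    return [to_config(u, space) for u in halton(n, len(space))]


if __name__ == "__main__":
    for cfg in lds_configs(8, SEARCH_SPACE):
        print(cfg)
```

Replacing the `halton(...)` call with uniform random draws turns this back into plain random search, which is why an LDS can act as a drop-in replacement. Plain Halton points are known to show correlated projections in higher dimensions; scrambled variants or Sobol sequences are common alternatives.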

The selection of hyper-parameters is critical in Deep Learning. Because of the long training time of complex models and the availability of compute resources in the cloud, "one-shot" optimization schemes, in which the sets of hyper-parameters are selected in advance (e.g., on a grid or at random) and training is executed in parallel, are commonly used. It is known that grid search is sub-optimal, especially when only a few critical parameters matter, and random search has been suggested instead. Yet random search can be "unlucky" and produce sets of values that leave parts of the domain unexplored. Quasi-random methods, such as Low Discrepancy Sequences (LDS), avoid these issues. We show that such methods have theoretical properties that make them appealing for hyperparameter search, and demonstrate that, when applied to the selection of hyperparameters of complex Deep Learning models (such as state-of-the-art LSTM language models and image classification models), they yield suitable hyperparameter values with far fewer runs than random search. We propose a particularly simple LDS method that can be used as a drop-in replacement for grid or random search in any Deep Learning pipeline, either as a fully one-shot hyperparameter search or as an initializer in iterative batch optimization.
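The "few critical parameters" and "unlucky" arguments can be checked numerically. The sketch below assumes that the unit square stands in for two rescaled hyperparameters, of which only the first (axis 0) matters. With a budget of 16 runs, it compares how a 4x4 grid, 16 uniform random points, and the first 16 points of a 2D Halton sequence (bases 2 and 3, used here as an illustrative LDS) cover that one axis. The function names are hypothetical.

```python
import random


def radical_inverse(index: int, base: int) -> float:
    """Mirror the base-`base` digits of `index` about the radix point."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result


def grid_points(per_axis: int) -> list[tuple[float, float]]:
    """Cell-centred per_axis x per_axis grid on the unit square."""
    ticks = [(k + 0.5) / per_axis for k in range(per_axis)]
    return [(a, b) for a in ticks for b in ticks]


def halton_points(n: int) -> list[tuple[float, float]]:
    """First n points of the 2D Halton sequence (bases 2 and 3)."""
    return [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(1, n + 1)]


def random_points(n: int, seed: int) -> list[tuple[float, float]]:
    """n i.i.d. uniform points, i.e. plain random search."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def projection_stats(points, axis: int = 0) -> tuple[int, float]:
    """Distinct values tried on one axis and the widest unexplored interval."""
    xs = sorted({p[axis] for p in points})
    edges = [0.0] + xs + [1.0]
    widest = max(b - a for a, b in zip(edges, edges[1:]))
    return len(xs), widest


if __name__ == "__main__":
    n = 16
    print("grid  ", projection_stats(grid_points(4)))
    print("halton", projection_stats(halton_points(n)))
    gaps = [projection_stats(random_points(n, seed))[1] for seed in range(1000)]
    print("random mean widest gap:", sum(gaps) / len(gaps))
```

Under these assumptions the grid tries only 4 values of the critical parameter and leaves gaps of width 0.25, while the Halton points try 16 distinct values with no gap wider than 1/16. For n independent uniform points, the expected widest gap is H(n+1)/(n+1), where H is the harmonic number; this is roughly 0.20 for n = 16. On this measure, random search therefore sits much closer to the grid than to the LDS.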

