LG · ML · Nov 6, 2018

Fast Hyperparameter Optimization of Deep Neural Networks via Ensembling Multiple Surrogates

arXiv:1811.02319v3
Originality: Incremental advance
AI Analysis

This work addresses the efficiency bottleneck that deep learning practitioners face in hyperparameter tuning, though the advance is incremental because it builds on existing Bayesian optimization methods.

The paper tackles the shortage of evaluation data in hyperparameter optimization for deep neural networks. It proposes HOIST, a method that uses both complete and intermediate evaluation data, and reports that HOIST outperforms state-of-the-art approaches across several DNN types.

The performance of deep neural networks (DNNs) depends crucially on good hyperparameter configurations. Bayesian optimization is a powerful framework for optimizing the hyperparameters of DNNs. These methods need sufficient evaluation data to approximate and minimize the validation error as a function of the hyperparameters. However, because evaluating a DNN is expensive, only a small amount of evaluation data can be collected within a limited time, which greatly reduces the efficiency of Bayesian optimization. Moreover, previous research focuses on using complete evaluation data to conduct Bayesian optimization and ignores the intermediate evaluation data generated by early stopping methods. To alleviate the problem of insufficient evaluation data, we propose a fast hyperparameter optimization method, HOIST, that utilizes both complete and intermediate evaluation data to accelerate the hyperparameter optimization of DNNs. Specifically, we train multiple basic surrogates to gather information from the mixed evaluation data, and then combine all basic surrogates using weighted bagging to provide an accurate ensemble surrogate. Our empirical studies show that HOIST outperforms state-of-the-art approaches on a wide range of DNNs, including feed-forward neural networks, convolutional neural networks, recurrent neural networks, and variational autoencoders.
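The idea of combining several basic surrogates into one weighted ensemble can be illustrated with a small sketch. This is not the HOIST implementation: the abstract does not specify the base model or the weighting rule, so the polynomial ridge regressor, the pairwise-ranking score used as a weight, and all names (`fit_ridge`, `concordance`, `build_ensemble`) are assumptions chosen for illustration. Plain weighted averaging also stands in for the paper's weighted bagging.

```python
import numpy as np

def fit_ridge(X, y, degree=2, lam=1e-3):
    """Fit a small polynomial ridge regressor (assumed base surrogate)."""
    def features(Xq):
        Xq = np.asarray(Xq, dtype=float).reshape(-1, 1)
        return np.hstack([Xq ** d for d in range(degree + 1)])
    Phi = features(X)
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ np.asarray(y))
    return lambda Xq: features(Xq) @ w

def concordance(pred, target):
    """Fraction of pairs that pred and target order the same way."""
    agree = total = 0
    for i in range(len(target)):
        for j in range(i + 1, len(target)):
            total += 1
            if (pred[i] - pred[j]) * (target[i] - target[j]) > 0:
                agree += 1
    return agree / total if total else 0.0

def build_ensemble(datasets, X_full, y_full):
    """Train one surrogate per data source and weight each by how well it
    ranks the complete evaluations (an assumed weighting rule)."""
    surrogates = [fit_ridge(X, y) for X, y in datasets]
    scores = np.array([concordance(s(X_full), y_full) for s in surrogates])
    if scores.sum() > 0:
        weights = scores / scores.sum()
    else:
        weights = np.full(len(scores), 1.0 / len(scores))
    def predict(Xq):
        preds = np.stack([s(Xq) for s in surrogates])  # (n_surrogates, n_points)
        return weights @ preds
    return predict, weights

# Synthetic demo: one hyperparameter in [0, 1], true validation error (x - 0.3)^2.
rng = np.random.default_rng(0)
X_early = rng.uniform(0, 1, 30)   # many cheap, early-stopped evaluations
y_early = (X_early - 0.3) ** 2 + 0.5 + rng.normal(0, 0.05, 30)  # biased and noisy
X_full = rng.uniform(0, 1, 6)     # few expensive, complete evaluations
y_full = (X_full - 0.3) ** 2

predict, weights = build_ensemble(
    [(X_early, y_early), (X_full, y_full)], X_full, y_full
)
candidates = np.linspace(0, 1, 101)
best = candidates[np.argmin(predict(candidates))]
print("weights:", weights, "suggested hyperparameter:", best)
```

In this sketch the early-stopped data is plentiful but biased, while the complete data is accurate but scarce. Weighting each surrogate by how well it orders the complete evaluations lets the ensemble draw on both sources, without letting the biased surrogate set the predicted error scale on its own.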

