Asynchronous Decentralized Bayesian Optimization for Large Scale Hyperparameter Optimization
This work addresses inefficient hyperparameter optimization on large-scale computing resources, a known bottleneck for researchers and practitioners, and proposes a novel method rather than an incremental improvement.
The paper tackles the scalability limitations of centralized Bayesian optimization for hyperparameter tuning of deep neural networks by proposing an asynchronous-decentralized method. The method achieves over 95% worker utilization on 1,920 parallel workers and shows improved model accuracy and faster convergence on the CANDLE benchmark.
Bayesian optimization (BO) is a promising approach for hyperparameter optimization of deep neural networks (DNNs), where each model training can take minutes to hours. In BO, a computationally cheap surrogate model is employed to learn the relationship between parameter configurations and their performance, such as accuracy. Parallel BO methods often adopt single-manager/multiple-worker strategies to evaluate multiple hyperparameter configurations simultaneously. Even though each hyperparameter evaluation takes significant time, the overhead of such centralized schemes prevents these methods from scaling to a large number of workers. We present an asynchronous-decentralized BO, wherein each worker runs a sequential BO and asynchronously communicates its results through shared storage. We scale our method to 1,920 parallel workers (the full production queue of the Polaris supercomputer) without loss of computational efficiency, maintaining worker utilization above 95%, and demonstrate improvement in model accuracy as well as faster convergence on the CANDLE benchmark from the Exascale Computing Project.
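The asynchronous-decentralized pattern described above (each worker runs its own sequential loop and exchanges results only through shared storage, with no manager) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `SharedStorage`, `suggest`, `worker`, and `run` are hypothetical, the shared storage is an in-memory list guarded by a lock rather than a file system or database, the objective is a toy one-dimensional function rather than a DNN training run, and the surrogate is a simple nearest-neighbour predictor with a distance-based exploration bonus standing in for whatever surrogate model a real BO implementation would use.

```python
import random
import threading


def objective(x):
    # Hypothetical stand-in for a validation score obtained by training a DNN.
    # Maximum value is 0.0, reached at x = 0.3.
    return -(x - 0.3) ** 2


class SharedStorage:
    """In-memory stand-in for shared storage (assumption, not the paper's backend)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = []

    def append(self, x, y):
        with self._lock:
            self._results.append((x, y))

    def snapshot(self):
        # Return a copy so a worker can read without holding the lock.
        with self._lock:
            return list(self._results)


def suggest(history, rng, n_candidates=64, kappa=0.5):
    """Pick the next point using a toy nearest-neighbour surrogate.

    Each random candidate is scored by the value of its nearest evaluated
    neighbour plus an exploration bonus proportional to the distance to it.
    """
    if not history:
        return rng.random()
    best_x, best_score = None, float("-inf")
    for _ in range(n_candidates):
        x = rng.random()
        dist, y = min((abs(x - hx), hy) for hx, hy in history)
        score = y + kappa * dist
        if score > best_score:
            best_x, best_score = x, score
    return best_x


def worker(storage, seed, n_iters):
    # Each worker runs its own sequential loop; there is no central manager.
    rng = random.Random(seed)
    for _ in range(n_iters):
        history = storage.snapshot()   # read whatever other workers have published
        x = suggest(history, rng)      # local surrogate-based decision
        y = objective(x)               # the expensive evaluation in a real setting
        storage.append(x, y)           # publish without waiting for anyone


def run(n_workers=4, n_iters=10):
    storage = SharedStorage()
    threads = [
        threading.Thread(target=worker, args=(storage, seed, n_iters))
        for seed in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return storage.snapshot()


if __name__ == "__main__":
    results = run()
    best_x, best_y = max(results, key=lambda r: r[1])
    print(f"evaluations: {len(results)}, best x: {best_x:.3f}, best score: {best_y:.5f}")
```

In this sketch no worker ever waits for another: a worker that finishes an evaluation publishes its result and immediately proposes its next configuration from whatever history is visible at that moment, which is the property that removes the central manager as a point of contention. Thread scheduling makes the exact sequence of evaluations vary between runs.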