LG · ML · Oct 13, 2018

A System for Massively Parallel Hyperparameter Tuning

arXiv:1810.05934v5 · 513 citations
Originality: Incremental advance
AI Analysis

This paper addresses efficient hyperparameter tuning for machine learning practitioners in production settings. It represents an incremental improvement with an emphasis on practical integration.

The paper tackles hyperparameter optimization in distributed computing settings by introducing ASHA, a simple and robust algorithm that exploits parallelism and aggressive early-stopping. The authors report that ASHA outperforms existing state-of-the-art methods, scales linearly with the number of workers, and is suitable for massive parallelism, as demonstrated on a task with 500 workers.

Modern learning models are characterized by large hyperparameter spaces and long training times. These properties, coupled with the rise of parallel computing and the growing demand to productionize machine learning workloads, motivate the need to develop mature hyperparameter optimization functionality in distributed computing settings. We address this challenge by first introducing a simple and robust hyperparameter optimization algorithm called ASHA, which exploits parallelism and aggressive early-stopping to tackle large-scale hyperparameter optimization problems. Our extensive empirical results show that ASHA outperforms existing state-of-the-art hyperparameter optimization methods; scales linearly with the number of workers in distributed settings; and is suitable for massive parallelism, as demonstrated on a task with 500 workers. We then describe several design decisions we encountered, along with our associated solutions, when integrating ASHA in Determined AI's end-to-end production-quality machine learning system that offers hyperparameter tuning as a service.
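The abstract names ASHA but does not spell out its mechanics. As a rough illustration only, the sketch below assumes the asynchronous successive halving rule: when a worker asks for a job, a configuration in the top 1/eta of some rung is promoted if one is available, and otherwise a new configuration is started at the bottom rung. The class name `ASHA`, the methods `get_job`, `report`, and `resource`, and the `sample_config` callback are hypothetical names chosen for this example; they are not the API of Determined AI's system or of any library.

```python
class ASHA:
    """Toy asynchronous successive halving scheduler (illustrative sketch)."""

    def __init__(self, sample_config, min_resource=1, max_resource=9, eta=3):
        self.sample_config = sample_config  # hypothetical callback returning a config
        self.min_resource = min_resource
        self.eta = eta
        # Highest rung index k such that min_resource * eta**k <= max_resource.
        self.max_rung = 0
        while min_resource * eta ** (self.max_rung + 1) <= max_resource:
            self.max_rung += 1
        # rungs[k] maps config id -> reported loss at rung k.
        self.rungs = [dict() for _ in range(self.max_rung + 1)]
        # promoted[k] holds ids already promoted out of rung k.
        self.promoted = [set() for _ in range(self.max_rung + 1)]
        self.configs = {}
        self.next_id = 0

    def resource(self, rung):
        """Training budget (e.g. epochs) allotted to a job at this rung."""
        return self.min_resource * self.eta ** rung

    def get_job(self):
        """Return (config_id, rung) for a free worker without waiting on others."""
        # Scan from the second-highest rung downward for a promotable config.
        for k in reversed(range(self.max_rung)):
            results = self.rungs[k]
            n_top = len(results) // self.eta
            best_first = sorted(results, key=results.get)  # lower loss is better
            for cid in best_first[:n_top]:
                if cid not in self.promoted[k]:
                    self.promoted[k].add(cid)
                    return cid, k + 1
        # Nothing is promotable: grow the bottom rung with a new configuration.
        cid = self.next_id
        self.next_id += 1
        self.configs[cid] = self.sample_config()
        return cid, 0

    def report(self, config_id, rung, loss):
        """Record the loss a worker observed after training at this rung."""
        self.rungs[rung][config_id] = loss
```

In use, each worker would loop on `get_job()`, train the returned configuration for `resource(rung)` units, and call `report(...)`. Because no call blocks until a rung is full, idle workers always receive work, which is the property the abstract's linear-scaling claim relies on.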

Code Implementations: 6 repos
