Successive Halving with Learning Curve Prediction via Latent Kronecker Gaussian Processes
This work addresses hyperparameter optimization for machine learning practitioners. Its contribution is incremental: it builds on existing methods and does not deliver a major breakthrough.
The paper tackles the problem of Successive Halving prematurely pruning slow-starting hyperparameter candidates by guiding the algorithm with learning curve predictions from Latent Kronecker Gaussian Processes. It finds that this predictive approach achieves competitive performance but is not Pareto optimal compared to investing more resources into the standard approach.
Successive Halving is a popular algorithm for hyperparameter optimization that allocates exponentially more resources to promising candidates. However, the algorithm typically relies on intermediate performance values to make resource allocation decisions, which can cause it to prematurely prune a slow starter that would eventually become the best candidate. We investigate whether guiding Successive Halving with learning curve predictions based on Latent Kronecker Gaussian Processes can overcome this limitation. In a large-scale empirical study involving different neural network architectures and a click prediction dataset, we compare this predictive approach to the standard approach based on current performance values. Our experiments show that, although the predictive approach achieves competitive performance, it is not Pareto optimal compared to investing more resources into the standard approach, because it requires fully observed learning curves as training data. However, this downside could be mitigated by leveraging existing learning curve data.
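To make the pruning behavior concrete, the following is a minimal sketch of Successive Halving on synthetic learning curves. It is not the implementation used in the study. The candidate names, the saturating curve shape, the budget values, and the halving factor `eta=2` are all assumptions chosen for illustration. In particular, `oracle_forecast` is a hypothetical stand-in that reads off the true late-stage value of each curve. It is not a Latent Kronecker Gaussian Process, which would instead have to learn its predictions from observed learning curves.

```python
import math

# Synthetic learning curves (an assumption for illustration): each hypothetical
# candidate maps to (final_accuracy, convergence_rate).
CANDIDATES = {
    "fast_a": (0.80, 1.00),
    "fast_b": (0.78, 0.90),
    "mid": (0.85, 0.30),
    "slow": (0.95, 0.05),  # slow starter with the best final accuracy
}
MAX_BUDGET = 100  # hypothetical budget at which curves are treated as converged


def accuracy(name, budget):
    """Saturating curve: accuracy after `budget` units of training."""
    final, rate = CANDIDATES[name]
    return final * (1.0 - math.exp(-rate * budget))


def successive_halving(names, score, min_budget=1, eta=2):
    """Keep the top 1/eta candidates each round and multiply the budget by eta."""
    survivors = list(names)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda n: score(n, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


# Standard rule: rank by the performance observed at the current budget.
standard_winner = successive_halving(CANDIDATES, accuracy)


# Predictive rule: rank by a forecast of final performance. An oracle stands in
# for the learning curve model here; it is not an LKGP.
def oracle_forecast(name, budget):
    return accuracy(name, MAX_BUDGET)


predictive_winner = successive_halving(CANDIDATES, oracle_forecast)

print(standard_winner, predictive_winner)
```

On these synthetic curves, the standard rule discards `slow` in the first round and returns `fast_a`, while the oracle-guided rule returns `slow`. The oracle gets its forecasts for free, which a real predictive model does not: as the abstract notes, the learning curve model needs fully observed curves as training data, and that cost is what the comparison against simply giving the standard approach more resources has to account for.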