Is One Hyperparameter Optimizer Enough?
This work addresses the problem of how software engineers should select a hyperparameter optimizer, revealing that current methods are incremental and that none is universally effective.
The paper investigated the effectiveness of various hyperparameter optimizers for defect prediction in software analytics, finding that no single optimizer consistently performed best and that hyperparameter optimization did not improve over default configurations in 50% of cases for F-measure.
Hyperparameter tuning is the black art of automatically finding a good combination of control parameters for a data miner. While such tuning is widely applied in empirical software engineering, there has been little discussion of which hyperparameter tuner is best for software analytics. To address this gap in the literature, this paper applied a range of hyperparameter optimizers (grid search, random search, differential evolution, and Bayesian optimization) to the defect prediction problem. Surprisingly, no hyperparameter optimizer was observed to be `best' and, for one of the two evaluation measures studied here (F-measure), hyperparameter optimization was, in 50\% of cases, no better than using default configurations. We conclude that hyperparameter optimization is more nuanced than previously believed. While such optimization can certainly lead to large improvements in the performance of classifiers used in software analytics, it remains to be seen which specific optimizers should be applied to a new dataset.
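To make the comparison concrete, the following minimal Python sketch contrasts three of the optimizers named above (grid search, random search, and differential evolution) against a default configuration. It is an illustration under stated assumptions, not the experimental setup of this paper: `toy_score` is a hypothetical stand-in for "train a learner and measure its F-measure", and the parameter names, bounds, defaults, and budgets are all invented for the example. Bayesian optimization is omitted to keep the sketch short.

```python
import itertools
import random

# Hypothetical stand-in for "fit a learner with these hyperparameters and
# return its F-measure". A real study would train and test a classifier here.
def toy_score(depth, rate):
    return 1.0 - ((depth - 7) / 10) ** 2 - (rate - 0.3) ** 2

BOUNDS = {"depth": (1, 20), "rate": (0.0, 1.0)}  # assumed search space
DEFAULT = {"depth": 3, "rate": 0.1}              # assumed default configuration


def evaluate(cfg):
    return toy_score(cfg["depth"], cfg["rate"])


def grid_search(steps=5):
    """Evaluate every combination of `steps` evenly spaced values per parameter."""
    axes = [
        [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
        for lo, hi in BOUNDS.values()
    ]
    configs = [dict(zip(BOUNDS, point)) for point in itertools.product(*axes)]
    return max(configs, key=evaluate)


def random_search(budget=25, seed=0):
    """Evaluate `budget` configurations drawn uniformly from the bounds."""
    rng = random.Random(seed)
    configs = [
        {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}
        for _ in range(budget)
    ]
    return max(configs, key=evaluate)


def differential_evolution(pop_size=5, generations=4, f=0.75, cr=0.3, seed=0):
    """Simplified DE/rand/1/bin: mutate with a + f * (b - c), keep non-worse trials."""
    rng = random.Random(seed)
    keys = list(BOUNDS)
    pop = [{k: rng.uniform(*BOUNDS[k]) for k in keys} for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than the parent.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            forced = rng.choice(keys)  # at least one parameter always changes
            trial = dict(pop[i])
            for k in keys:
                if k == forced or rng.random() < cr:
                    lo, hi = BOUNDS[k]
                    # Clip the mutated value back into the search space.
                    trial[k] = min(hi, max(lo, a[k] + f * (b[k] - c[k])))
            if evaluate(trial) >= evaluate(pop[i]):
                pop[i] = trial
    return max(pop, key=evaluate)


if __name__ == "__main__":
    print("default", round(evaluate(DEFAULT), 3))
    for name, search in [("grid", grid_search), ("random", random_search),
                         ("DE", differential_evolution)]:
        print(name, round(evaluate(search()), 3))
```

On a smooth toy objective like this one, any optimizer tends to beat the default easily. The point of the study above is that on real defect prediction data the ranking of optimizers is not stable, so a harness of this shape would need to be rerun per dataset and per evaluation measure.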