LG Aug 5, 2024

LMEMs for post-hoc analysis of HPO Benchmarking

Anton Geburek, Neeratyoy Mallik, Danny Stoll, Xavier Bouthillier, Frank Hutter

arXiv:2408.02533v14.62 citations, h-index: 72, Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of unclear method comparisons in HPO benchmarking. The contribution is incremental: it applies an existing statistical method to a new context.

The paper argues that current hyperparameter optimization (HPO) benchmarking practice obscures differences between methods. It applies Linear Mixed-Effect Models (LMEMs) for post-hoc analysis and demonstrates the approach on the PriorBand experiment data, surfacing insights not reported in the original work.

The importance of tuning hyperparameters in Machine Learning (ML) and Deep Learning (DL) is established through empirical research and applications, as is evident from the steady stream of new hyperparameter optimization (HPO) algorithms and benchmarks added by the community. However, current benchmarking practices that average performance across many datasets may obscure key differences between HPO methods, especially in pairwise comparisons. In this work, we apply significance testing based on Linear Mixed-Effect Models (LMEMs) for post-hoc analysis of HPO benchmarking runs. LMEMs allow flexible and expressive modeling of the entire experiment data, including information such as benchmark meta-features, and offer deeper insights than current analysis practices. We demonstrate this through a case study on the PriorBand paper's experiment data, finding insights not reported in the original work.
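As a rough illustration of the idea in the abstract, the sketch below fits an LMEM to synthetic benchmarking runs and compares two HPO methods with a likelihood ratio test. It is not the authors' code: the column names (`benchmark`, `algorithm`, `seed`, `loss`), the synthetic data, the random-intercept-per-benchmark structure, and the choice of `statsmodels` are all assumptions made for this example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats


def make_synthetic_runs(seed=0, n_benchmarks=12, n_seeds=10):
    """Synthetic stand-in for HPO benchmarking runs (not PriorBand data)."""
    rng = np.random.default_rng(seed)
    rows = []
    for b in range(n_benchmarks):
        # Each benchmark has its own difficulty level (the random effect).
        bench_offset = rng.normal(0.0, 1.0)
        # Hypothetical methods "A" and "B"; "B" is better by 0.3 on average.
        for algo, effect in (("A", 0.0), ("B", -0.3)):
            for s in range(n_seeds):
                loss = 2.0 + bench_offset + effect + rng.normal(0.0, 0.2)
                rows.append(
                    {"benchmark": f"bench_{b}", "algorithm": algo,
                     "seed": s, "loss": loss}
                )
    return pd.DataFrame(rows)


def compare_with_lmem(data):
    """Likelihood ratio test: does adding `algorithm` improve the fit?"""
    # Fit with maximum likelihood (reml=False) so that the log-likelihoods
    # of models with different fixed effects are comparable.
    null = smf.mixedlm("loss ~ 1", data, groups=data["benchmark"]).fit(reml=False)
    full = smf.mixedlm("loss ~ algorithm", data, groups=data["benchmark"]).fit(reml=False)
    lr_stat = 2.0 * (full.llf - null.llf)
    # The full model has one extra fixed-effect parameter (the A-vs-B contrast).
    p_value = stats.chi2.sf(lr_stat, 1)
    return full, lr_stat, p_value


runs = make_synthetic_runs()
full_model, lr_stat, p_value = compare_with_lmem(runs)

# Estimated loss difference of method B relative to method A.
effect_b = full_model.fe_params["algorithm[T.B]"]
print(f"effect of B vs A: {effect_b:.3f}, LR stat: {lr_stat:.1f}, p = {p_value:.3g}")
```

Because the benchmark offset is modeled as a random intercept, the large between-benchmark variance does not mask the smaller difference between the two methods, which is the failure mode of plain averaging that the abstract points to. Benchmark meta-features could be added as further fixed-effect terms in the formula under the same assumptions.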
