Is Hyper-Parameter Optimization Different for Software Analytics?
This work addresses the need for hyper-parameter optimizers tailored to software analytics, offering incremental improvements for SE researchers and practitioners.
The paper studies hyper-parameter optimization for software engineering (SE) data. It reports that loss functions on SE data tend to be smoother (their second derivatives are smaller in magnitude) and introduces SMOOTHIE, an optimizer that exploits this property. SMOOTHIE runs faster and predicts better on three SE tasks (GitHub issue lifetime prediction, false alarm detection for static code warnings, and defect prediction), while tying with a state-of-the-art AI optimizer on non-SE data.
Yes. SE data can have "smoother" boundaries between classes than traditional AI data sets. More precisely, the magnitude of the second derivative of the loss function found in SE data is typically much smaller. A new hyper-parameter optimizer, called SMOOTHIE, can exploit this idiosyncrasy of SE data. We compare SMOOTHIE and a state-of-the-art AI hyper-parameter optimizer on three tasks: (a) GitHub issue lifetime prediction; (b) detecting false alarms in static code warnings; (c) defect prediction. For completeness, we also show experiments on some standard AI datasets. SMOOTHIE runs faster and predicts better on the SE data, but ties with the AI tool on non-SE data. Hence we conclude that SE data can be different from other kinds of data, and those differences mean that we should use different kinds of algorithms for our data. To support open science and other researchers working in this area, all our scripts and datasets are available online at https://github.com/yrahul3910/smoothness-hpo/.
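
To make "magnitude of the second derivative of the loss function" concrete, here is a minimal sketch that estimates it by central finite differences for a one-weight logistic model. It is a toy illustration under stated assumptions, not SMOOTHIE and not the procedure in the linked repository: the helper names (`logistic_loss`, `second_derivative`, `max_curvature`), the synthetic data, and the use of feature scaling as a stand-in for a "sharper" data set are all hypothetical choices made for this example.

```python
import math
import random


def logistic_loss(w, xs, ys):
    """Mean logistic loss of a 1-D linear model with weight w (labels in {0, 1})."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = w * x
        # log(1 + exp(z)) - y*z, with log(1 + exp(z)) computed stably
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    return total / len(xs)


def second_derivative(f, w, h=1e-3):
    """Central finite-difference estimate of f''(w)."""
    return (f(w + h) - 2.0 * f(w) + f(w - h)) / (h * h)


def max_curvature(f, grid, h=1e-3):
    """Largest |f''(w)| over a grid of weights: a crude smoothness score."""
    return max(abs(second_derivative(f, w, h)) for w in grid)


# Synthetic data (an assumption for illustration, not data from the paper).
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
ys = [1 if x + random.gauss(0.0, 0.5) > 0 else 0 for x in xs]

# Hypothetical "sharper" variant: same labels, features scaled by 5.
xs_scaled = [5.0 * x for x in xs]

grid = [i / 10 for i in range(-20, 21)]  # weights from -2.0 to 2.0
smooth_curv = max_curvature(lambda w: logistic_loss(w, xs, ys), grid)
sharp_curv = max_curvature(lambda w: logistic_loss(w, xs_scaled, ys), grid)

print(f"max |loss''| on original features: {smooth_curv:.4f}")
print(f"max |loss''| on scaled features:   {sharp_curv:.4f}")
```

A lower score means a smoother loss in the sense used above. For this particular loss the second derivative depends only on the features and the weight, so scaling the features by 5 raises the score by roughly a factor of 25. Comparing real SE and non-SE data sets would need the models and measurements described in the paper itself.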