SE, LG | Jan 17, 2024

Is Hyper-Parameter Optimization Different for Software Analytics?

arXiv:2401.09622v4 | 2 citations | h-index: 6 | Has Code | IEEE Trans Softw Eng
Originality: Incremental advance
AI Analysis

This paper addresses the need for hyper-parameter optimizers tailored to software analytics, offering an incremental improvement for SE researchers and practitioners.

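The paper studies hyper-parameter optimization for software engineering (SE) data. It shows that SE data tends to have smoother boundaries between classes, measured as a smaller second derivative of the loss function, and introduces SMOOTHIE, an optimizer that exploits this property. SMOOTHIE runs faster and predicts better on three SE tasks (GitHub issue lifetime prediction, detection of false alarms in static code warnings, and defect prediction) and ties with a state-of-the-art AI optimizer on non-SE data.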

Yes. SE data can have "smoother" boundaries between classes (compared to traditional AI data sets). To be more precise, the magnitude of the second derivative of the loss function found in SE data is typically much smaller. A new hyper-parameter optimizer, called SMOOTHIE, can exploit this idiosyncrasy of SE data. We compare SMOOTHIE and a state-of-the-art AI hyper-parameter optimizer on three tasks: (a) GitHub issue lifetime prediction; (b) detecting false alarms in static code warnings; (c) defect prediction. For completeness, we also show experiments on some standard AI datasets. SMOOTHIE runs faster and predicts better on the SE data, but ties with the AI tool on non-SE data. Hence we conclude that SE data can be different from other kinds of data, and those differences mean that we should use different kinds of algorithms for our data. To support open science and other researchers working in this area, all our scripts and datasets are available online at https://github.com/yrahul3910/smoothness-hpo/.
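To make the smoothness notion in the abstract concrete, here is a minimal sketch that estimates the magnitude of a loss function's second derivative with central finite differences. It is not the authors' SMOOTHIE implementation: the function name `curvature_magnitude`, the two toy losses, and the evaluation point are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence


def curvature_magnitude(loss: Callable[[Sequence[float]], float],
                        w: Sequence[float], h: float = 1e-3) -> float:
    """Mean |d^2 loss / d w_i^2| at w, estimated by central finite differences.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    base = loss(w)
    total = 0.0
    for i in range(len(w)):
        up = list(w)
        up[i] += h
        down = list(w)
        down[i] -= h
        # Central second difference along coordinate i.
        total += abs(loss(up) - 2.0 * base + loss(down)) / (h * h)
    return total / len(w)


# Two toy losses (assumptions): quadratic bowls with different curvature.
def smooth_loss(w: Sequence[float]) -> float:
    return sum(0.5 * x * x for x in w)   # second derivative is 1 per coordinate


def sharp_loss(w: Sequence[float]) -> float:
    return sum(50.0 * x * x for x in w)  # second derivative is 100 per coordinate


point = [0.3, -0.2]
smooth_score = curvature_magnitude(smooth_loss, point)
sharp_score = curvature_magnitude(sharp_loss, point)
print(smooth_score < sharp_score)
```

In this toy setting the "smoother" loss gets the smaller score. A score of this kind could in principle be used to compare loss landscapes across data sets or candidate configurations, but how SMOOTHIE actually uses smoothness is described in the paper and repository, not here.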

Code Implementations: 1 repo
