SE IR Jun 20, 2018

The Impact of IR-based Classifier Configuration on the Performance and the Effort of Method-Level Bug Localization

Chakkrit Tantithamthavorn, Surafel Lemma Abebe, Ahmed E. Hassan, Akinori Ihara, Kenichi Matsumoto

arXiv:1806.07727v18.229 citations Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses the problem of optimizing bug localization for software developers, but it is incremental as it focuses on configuration analysis rather than introducing a new method.

The study investigated how different configurations of IR-based classifiers affect performance and effort in method-level bug localization. It found that, depending on the configuration, top-k performance ranges from 0.44% to 36% and required effort from 4,395 to 50,000 LOC, with VSM achieving the best results.

Context: IR-based bug localization is a classifier that assists developers in locating buggy source code entities (e.g., files and methods) based on the content of a bug report. Such IR-based classifiers have various parameters that can be configured differently (e.g., the choice of entity representation).

Objective: In this paper, we investigate the impact of the choice of the IR-based classifier configuration on the top-k performance and on the effort required to examine source code entities before locating a bug at the method level.

Method: We execute a large space of classifier configurations, 3,172 in total, on 5,266 bug reports of two software systems, i.e., Eclipse and Mozilla.

Results: We find that (1) the choice of classifier configuration impacts the top-k performance from 0.44% to 36% and the required effort from 4,395 to 50,000 LOC; (2) classifier configurations with similar top-k performance might require different efforts; (3) VSM achieves both the best top-k performance and the least required effort for method-level bug localization; (4) the likelihood of randomly picking a configuration that performs within 20% of the best top-k classifier configuration is on average 5.4%, and the likelihood of picking one within 20% of the least-effort configuration is on average 1%; (5) configurations related to the entity representation of the analyzed data have the most impact on both the top-k performance and the required effort; and (6) the most efficient classifier configuration obtained at the method level can also be used at the file level (and vice versa).

Conclusion: Our results lead us to conclude that configuration has a large impact on both the top-k performance and the required effort for method-level bug localization, suggesting that the IR-based configuration settings should be carefully selected and that the required effort metric should be included in future bug localization studies.
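The abstract does not define its two metrics, so the following is only a minimal sketch under assumed definitions: top-k performance is taken as the fraction of bug reports with at least one buggy method among the first k ranked methods, and required effort as the lines of code read in rank order up to and including the first buggy method. The function names, method names, and LOC values are hypothetical and are not taken from the paper or its code.

```python
from typing import Dict, List, Set


def top_k_performance(rankings: Dict[str, List[str]],
                      buggy: Dict[str, Set[str]],
                      k: int) -> float:
    """Assumed definition: share of bug reports with a buggy method in the top k."""
    hits = sum(
        1
        for report, ranked in rankings.items()
        if any(method in buggy[report] for method in ranked[:k])
    )
    return hits / len(rankings)


def required_effort(ranked: List[str],
                    buggy_methods: Set[str],
                    loc: Dict[str, int]) -> int:
    """Assumed definition: LOC read in rank order through the first buggy method."""
    total = 0
    for method in ranked:
        total += loc[method]
        if method in buggy_methods:
            return total
    return total  # no buggy method was ranked, so the whole list was read


# Hypothetical toy data: two bug reports ranked by one classifier configuration.
rankings = {
    "bug-1": ["A.foo", "B.bar", "C.baz"],
    "bug-2": ["C.baz", "A.foo", "B.bar"],
}
buggy = {"bug-1": {"B.bar"}, "bug-2": {"B.bar"}}
loc = {"A.foo": 10, "B.bar": 25, "C.baz": 40}

top2 = top_k_performance(rankings, buggy, k=2)
efforts = {r: required_effort(rankings[r], buggy[r], loc) for r in rankings}
print(top2, efforts)
```

In this toy data the same buggy method costs 35 LOC to reach for one report and 75 LOC for the other, purely because of rank order and method size. This is the sense in which effort can carry information that a top-k hit rate alone does not.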
