SEJun 8, 2021
Does class size matter? An in-depth assessment of the effect of class size in software defect predictionAmjed Tahir, Kwabena E. Bennin, Xun Xiao et al.
In the past 20 years, defect prediction studies have generally acknowledged the effect of class size on software prediction performance. To quantify the relationship between object-oriented (OO) metrics and defects, modelling has to take into account the direct, and potentially indirect, effects of class size on defects. However, some studies have shown that size cannot be simply controlled or ignored, when building prediction models. As such, there remains a question whether, and when, to control for class size. This study provides a new in-depth examination of the impact of class size on the relationship between OO metrics and software defects or defect-proneness. We assess the impact of class size on the number of defects and defect-proneness in software systems by employing a regression-based mediation (with bootstrapping) and moderation analysis to investigate the direct and indirect effect of class size in count and binary defect prediction. Our results show that the size effect is not always significant for all metrics. Of the seven OO metrics we investigated, size consistently has significant mediation impact only on the relationship between Coupling Between Objects (CBO) and defects/defect-proneness, and a potential moderation impact on the relationship between Fan-out and defects/defect-proneness. Based on our results we make three recommendations. One, we encourage researchers and practitioners to examine the impact of class size for the specific data they have in hand and through the use of the proposed statistical mediation/moderation procedures. Two, we encourage empirical studies to investigate the indirect effect of possible additional variables in their models when relevant. Three, the statistical procedures adopted in this study could be used in other empirical software engineering research to investigate the influence of potential mediators/moderators.
SEMay 29, 2021
Investigating the Significance of Bellwether Effect to Improve Software Effort EstimationSolomon Mensah, Jacky Keung, Stephen G. MacDonell et al.
Bellwether effect refers to the existence of exemplary projects (called the Bellwether) within a historical dataset to be used for improved prediction performance. Recent studies have shown an implicit assumption of using recently completed projects (referred to as moving window) for improved prediction accuracy. In this paper, we investigate the Bellwether effect on software effort estimation accuracy using moving windows. The existence of the Bellwether was empirically proven based on six postulations. We apply statistical stratification and Markov chain methodology to select the Bellwether moving window. The resulting Bellwether moving window is used to predict the software effort of a new project. Empirical results show that Bellwether effect exist in chronological datasets with a set of exemplary and recently completed projects representing the Bellwether moving window. Result from this study has shown that the use of Bellwether moving window with the Gaussian weighting function significantly improve the prediction accuracy.
SEApr 26, 2021
Revisiting the size effect in software fault prediction modelsAmjed Tahir, Kwabena E. Bennin, Stephen G. MacDonell et al.
BACKGROUND: In object oriented (OO) software systems, class size has been acknowledged as having an indirect effect on the relationship between certain artifact characteristics, captured via metrics, and faultproneness, and therefore it is recommended to control for size when designing fault prediction models. AIM: To use robust statistical methods to assess whether there is evidence of any true effect of class size on fault prediction models. METHOD: We examine the potential mediation and moderation effects of class size on the relationships between OO metrics and number of faults. We employ regression analysis and bootstrapping-based methods to investigate the mediation and moderation effects in two widely-used datasets comprising seventeen systems. RESULTS: We find no strong evidence of a significant mediation or moderation effect of class size on the relationships between OO metrics and faults. In particular, size appears to have a more significant mediation effect on CBO and Fan-out than other metrics, although the evidence is not consistent in all examined systems. On the other hand, size does appear to have a significant moderation effect on WMC and CBO in most of the systems examined. Again, the evidence provided is not consistent across all examined systems CONCLUSION: We are unable to confirm if class size has a significant mediation or moderation effect on the relationships between OO metrics and the number of faults. We contend that class size does not fully explain the relationships between OO metrics and the number of faults, and it does not always affect the strength/magnitude of these relationships. We recommend that researchers consider the potential mediation and moderation effect of class size when building their prediction models, but this should be examined independently for each system.