SE · Nov 12, 2020
A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits
Steffen Herbold, Alexander Trautsch, Benjamin Ledel et al.
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they study not only bugs but also other concerns that are irrelevant to the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowdsourcing approach to manually label each line in bug fixing commits, validating which changes contribute to the bug fix. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files, this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label, leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptical and assume that unvalidated data is likely very noisy until proven otherwise.
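The consensus rule described in the abstract above (four participants label each line; consensus when at least three agree on the same label) can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function names `consensus_label` and `consensus_rate` and the label strings are hypothetical and chosen only for demonstration.

```python
from collections import Counter
from typing import Optional, Sequence

# Rule from the abstract: four participants label each line, and consensus
# requires at least three of them to choose the same label.
PARTICIPANTS_PER_LINE = 4
CONSENSUS_THRESHOLD = 3


def consensus_label(labels: Sequence[str]) -> Optional[str]:
    """Return the consensus label for one line, or None if there is none."""
    if len(labels) != PARTICIPANTS_PER_LINE:
        raise ValueError(f"expected {PARTICIPANTS_PER_LINE} labels, got {len(labels)}")
    # With four votes and a threshold of three, at most one label can qualify.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= CONSENSUS_THRESHOLD else None


def consensus_rate(lines: Sequence[Sequence[str]]) -> float:
    """Fraction of lines on which the participants reached consensus."""
    if not lines:
        return 0.0
    agreed = sum(1 for labels in lines if consensus_label(labels) is not None)
    return agreed / len(lines)


if __name__ == "__main__":
    # Hypothetical label values, for illustration only.
    example = [
        ["bugfix", "bugfix", "bugfix", "test"],               # 3 of 4 agree
        ["bugfix", "refactoring", "bugfix", "refactoring"],   # 2-2 split
        ["documentation"] * 4,                                # unanimous
    ]
    for labels in example:
        print(labels, "->", consensus_label(labels))
    print("consensus rate:", consensus_rate(example))
```

A 2-2 split never reaches the threshold, which is one way lines end up in the "hard to label" group the abstract mentions.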
SE · Aug 15, 2019
How does Object-Oriented Code Refactoring Influence Software Quality? Research Landscape and Challenges
Satnam Kaur, Paramvir Singh
Context: Software refactoring aims to improve software quality and developer productivity. Numerous empirical studies investigating the impact of refactoring activities on software quality have been conducted over the last two decades. Objective: This study aims to perform a comprehensive systematic mapping study of existing empirical studies on the evaluation of the effect of object-oriented code refactoring activities on software quality attributes. Method: We followed a multi-stage scrutinizing process to select 142 primary studies published up to December 2017. The selected primary studies were further classified based on several aspects to answer the research questions defined for this work. In addition, we applied a vote-counting approach to combine the empirical results and their analysis reported in the primary studies. Results: The findings indicate that studies conducted in academic settings found a more positive impact of refactoring on software quality than studies performed in industry. In general, refactoring activities caused all quality attributes to improve or degrade, except for the cohesion, complexity, inheritance, fault-proneness, and power consumption attributes. Furthermore, individual refactoring activities have variable effects on most quality attributes explored in the primary studies, indicating that refactoring does not always improve all quality attributes. Conclusions: This study points out several open issues that require further investigation, e.g., lack of industrial validation, limited coverage of refactoring activities, and limited tool support.
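The abstract above mentions a vote-counting approach for combining results across primary studies. The sketch below shows a generic vote-counting tally under stated assumptions: each finding "votes" positive, negative, or neutral for a quality attribute, and the majority outcome wins, with ties reported as inconclusive. The outcome categories, the tie rule, and the names `vote_count` and `verdict` are hypothetical; the abstract does not specify the authors' exact procedure.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical outcome categories; each reported finding votes for one of them.
OUTCOMES = ("positive", "negative", "neutral")


def vote_count(results: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Tally reported outcomes per quality attribute.

    `results` is an iterable of (quality_attribute, outcome) pairs,
    one pair per finding reported in a primary study.
    """
    tally: Dict[str, Counter] = defaultdict(Counter)
    for attribute, outcome in results:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        tally[attribute][outcome] += 1
    return dict(tally)


def verdict(counts: Counter) -> str:
    """Majority outcome, or 'inconclusive' when empty or the top outcomes tie."""
    ranked = counts.most_common()
    if not ranked:
        return "inconclusive"
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "inconclusive"
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical findings, for illustration only (not data from the study).
    findings = [
        ("coupling", "positive"),
        ("coupling", "positive"),
        ("coupling", "negative"),
        ("cohesion", "positive"),
        ("cohesion", "negative"),
    ]
    for attribute, counts in vote_count(findings).items():
        print(attribute, dict(counts), "->", verdict(counts))
```

Vote counting ignores effect sizes and sample sizes, so it summarizes the direction of reported effects rather than their magnitude.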