SE | Mar 22, 2021 | Code
Automated Issue Assignment: Results and Insights from an Industrial Case
Ethem Utku Aktas, Cemal Yilmaz
Softtech, being a subsidiary of the largest private bank in Turkey, called IsBank, receives an average of 350 issue reports from the field every day. Manually assigning the reported issues to the software development teams is costly and cumbersome. We automate the issue assignments using data mining approaches and share our experience gained by deploying the resulting system at Softtech/IsBank. Automated issue assignment has been studied in the literature. However, most of these works report the results obtained on open source projects and the remaining few, although they use commercial, closed source projects, carry out the assignments in a retrospective manner. We, on the other hand, deploy the proposed approach, which has been making all the assignments since Jan 12, 2018. This presents us with an unprecedented opportunity to observe the practical effects of automated issue assignment in the field and to carry out user studies, which have not been done before in this context. We observe that it is not just about deploying a system for automated issue assignment, but also about designing/changing the assignment process around the system; the accuracy of the assignments does not have to be higher than that of manual assignments in order for the system to be useful; deploying such a system requires the development of additional functionalities, such as detecting deteriorations in assignment accuracies in an online manner and creating human-readable explanations for the assignments; stakeholders do not necessarily resist change; and gradual transition can help stakeholders build confidence.
SE | Nov 12, 2020 | Code
A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits
Steffen Herbold, Alexander Trautsch, Benjamin Ledel et al.
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.
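The consensus rule described in the Methods (four labels per line, consensus when at least three agree) can be illustrated with a minimal sketch. The label names, file names, and the helper `consensus_label` are hypothetical and chosen only for illustration; they are not taken from the authors' data set or tooling.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str], min_agree: int = 3) -> Optional[str]:
    """Return the label given by at least `min_agree` participants, else None."""
    if not labels:
        return None
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


# Hypothetical votes: four participants label each changed line.
line_labels = {
    "Foo.java:10": ["bugfix", "bugfix", "bugfix", "test"],
    "Foo.java:11": ["bugfix", "refactoring", "whitespace", "bugfix"],
    "FooTest.java:5": ["test", "test", "test", "test"],
}

# Lines mapped to None did not reach consensus.
consensus = {line: consensus_label(votes) for line, votes in line_labels.items()}

# Share of lines without consensus in this toy example.
no_consensus_share = sum(v is None for v in consensus.values()) / len(consensus)

for line, label in consensus.items():
    print(line, "->", label if label is not None else "no consensus")
```

With four votes and a threshold of three, at most one label can reach consensus, so ties need no special handling in this sketch.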