67.4SEMay 17
Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug ReportsMahmut Furkan Gon, Emre Dinc, Tevfik Emre Sungur et al.
Issues faced when using software are reported in the form of bug reports. However, many bug reports are invalid, meaning they do not require code changes, and are resolved with a no-code fix. Manually determining the root cause of the invalid bug reports and providing actionable resolutions by the customer support causes a serious waste of resources. Our goal is to introduce a standardized taxonomy for root-cause oriented invalid bug report subclassification, and perform experiments to test the accuracy of various approaches on invalid subclassification and no-code fix generation. We study how different configurations perform on a gold-standard benchmark we have created. Using a manually curated benchmark for higher quality analysis, we experimented with vanilla LLMs, Retrieval Augmented Generation, and agentic web search to identify invalid subclasses and generate no-code fixes. We evaluated the results against manually labeled ground truth data that includes the invalid subclass and no-code fixes from the original bug reports. We measured subclass detection performance with weighted F1-Score, and assessed no-code fix suggestions using BERTScore and Judge LLM success rates. For subclassification, retrieval augmented generation achieves the highest overall performance with 0.66 weighted F1, slightly outperforming vanilla LLMs at 0.65 and agentic web search at 0.64. At the subclass level, performance peaks at 0.85 F1 for Non-reproducibility and 0.79 for Feature Request and Question, while Wrong Version remains the most challenging with scores between 0.00 and 0.29. For no-code fix generation, agentic web search achieves the highest overall Judge LLM success rate at 68.9%, compared to 64.4% for RAG applications and 64.9% for vanilla LLMs, with subclass-level peaks of 87.4% for Working as Designed and 72.2% for Question.
SENov 12, 2020
A Fine-grained Data Set and Analysis of Tangling in Bug Fixing CommitsSteffen Herbold, Alexander Trautsch, Benjamin Ledel et al.
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.
CYMay 22, 2018
Are Computer Science and Engineering Graduates Ready for the Software Industry? Experiences from an Industrial Student Training ProgramEray Tuzun, Hakan Erdogmus, Izzet Gokhan Ozbilgin
It has been 50 years since the term software engineering was coined in 1968 at a NATO conference. The field should be relatively mature by now, with most established universities covering core software engineering topics in their Computer Science programs and others offering specialized degrees. However, still many practitioners lament a lack of skills in new software engineering hires. With the growing demand for software engineers from the industry, this apparent gap becomes more and more pronounced. One corporate strategy to address this gap is for the industry to develop supplementary training programs before the hiring process, which could also help them screen viable candidates. In this paper, we report on our experiences and lessons learned in conducting a summer school program aimed at screening new graduates, introducing them to core skills relevant to the organization and the industry, and assessing their attitudes toward mastering those skills before the hiring process begins. Our experience suggests that such initiatives can be mutually beneficial for new hires and companies. We support this insight with pre- and post-training data collected from the participants during the first edition of such a summer school and a follow-up questionnaire conducted after a year with the graduates, 50% of whom was hired by the company shortly after the summer school.